System Info
- transformers version: 4.46.2
- Platform: Linux-5.14.0-427.22.1.el9_4.x86_64-x86_64-with-glibc2.34
- Python version: 3.11.10
- Huggingface_hub version: 0.26.1
- Safetensors version: 0.4.5
- Accelerate version: 1.1.1
- Accelerate config: not found
- PyTorch version (GPU?): 2.5.1+cu124 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using distributed or parallel set-up in script?: no
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
(I reduced the code to the relevant parts)
train_args = TrainingArguments(
num_train_epochs=50,
eval_strategy="epoch",
logging_strategy="epoch",
save_strategy="epoch",
save_total_limit=3,
report_to="wandb",
run_name=name,
)
trainer = Trainer(
args=train_args,
train_dataset=train_dataset,
eval_dataset=test_dataset,
)
The issue: when reporting to WandB, the WandbCallback's on_train_end handler creates a fake trainer
fake_trainer = Trainer(args=args, model=model, processing_class=tokenizer)
using the same training arguments as the real run, but without passing it any datasets. Because my script sets eval_strategy to something other than "no", and WandB reporting is enabled, the Trainer constructor's validation fails and the following error is thrown at the end of training:
File "/home/mazuze/NLP/Hebrew-LLM-Eval/sentence_ordering/train_model.py", line 278, in main
    trainer.train()
File "/home/mazuze/.conda/envs/coherence/lib/python3.11/site-packages/transformers/trainer.py", line 2123, in train
    return inner_training_loop(
           ^^^^^^^^^^^^^^^^^^^^
File "/home/mazuze/.conda/envs/coherence/lib/python3.11/site-packages/transformers/trainer.py", line 2635, in _inner_training_loop
    self.control = self.callback_handler.on_train_end(args, self.state, self.control)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/mazuze/.conda/envs/coherence/lib/python3.11/site-packages/transformers/trainer_callback.py", line 471, in on_train_end
    return self.call_event("on_train_end", args, state, control)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/mazuze/.conda/envs/coherence/lib/python3.11/site-packages/transformers/trainer_callback.py", line 518, in call_event
    result = getattr(callback, event)(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/mazuze/.conda/envs/coherence/lib/python3.11/site-packages/transformers/integrations/integration_utils.py", line 919, in on_train_end
    fake_trainer = Trainer(args=args, model=model, processing_class=tokenizer)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/mazuze/.conda/envs/coherence/lib/python3.11/site-packages/transformers/utils/deprecation.py", line 165, in wrapped_func
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
File "/home/mazuze/.conda/envs/coherence/lib/python3.11/site-packages/transformers/trainer.py", line 418, in __init__
    raise ValueError(
ValueError: You have set `args.eval_strategy` to IntervalStrategy.EPOCH but you didn't pass an `eval_dataset` to `Trainer`. Either set `args.eval_strategy` to `no` or pass an `eval_dataset`.
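The root cause can be illustrated with a minimal self-contained sketch of the consistency check (simplified stand-in code, not the actual transformers source; MiniTrainer is a hypothetical name for illustration):

```python
# Simplified sketch of the check Trainer.__init__ performs: the fake trainer
# built by the WandB callback reuses the real run's args (eval_strategy="epoch")
# but never receives an eval_dataset, so the check raises.

class MiniTrainer:
    def __init__(self, eval_strategy="no", eval_dataset=None):
        # Mirrors the validation: a non-"no" eval strategy requires a dataset.
        if eval_strategy != "no" and eval_dataset is None:
            raise ValueError(
                f"You have set `args.eval_strategy` to {eval_strategy} "
                "but you didn't pass an `eval_dataset` to `Trainer`."
            )
        self.eval_dataset = eval_dataset

# The real training run passes an eval_dataset, so construction succeeds:
MiniTrainer(eval_strategy="epoch", eval_dataset=["dummy"])

# The fake trainer in on_train_end reuses the same args but omits the
# dataset, so the same check raises:
try:
    MiniTrainer(eval_strategy="epoch")
except ValueError as err:
    print(type(err).__name__)
```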
Expected behavior
The on_train_end callback should complete successfully without throwing an exception.