Description
System Info
transformers version: 4.43.3
Python 3.10.12
Ubuntu
The issue is in trainer.py: for PEFT models it does not check the base_model to get the label information, so evaluation does not report eval_loss (see the traceback below).
I have fixed this locally and am raising it so that the fix can be made in the branch.
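To illustrate the kind of lookup that seems to go wrong, here is a minimal sketch using only the standard library. It is not the actual trainer.py code: `BaseClassifier`, `AdapterWrapper`, `label_names_from_signature`, and `label_names_unwrapped` are hypothetical names, and the assumption is that label names are inferred from the parameters of the model's forward(), which a PEFT-style wrapper hides behind a generic `*args, **kwargs` signature.

```python
import inspect


class BaseClassifier:
    """Hypothetical stand-in for a base model whose forward() accepts labels."""

    def forward(self, pixel_values=None, labels=None):
        return {"loss": 0.0 if labels is not None else None}


class AdapterWrapper:
    """Hypothetical stand-in for a PEFT-style wrapper with a generic forward()."""

    def __init__(self, base_model):
        self.base_model = base_model

    def forward(self, *args, **kwargs):
        return self.base_model.forward(*args, **kwargs)


def label_names_from_signature(model):
    """Return the forward() parameter names that contain 'label'."""
    params = inspect.signature(model.forward).parameters
    return [name for name in params if "label" in name]


def label_names_unwrapped(model):
    """Same lookup, but inspect the wrapped base model when one is present."""
    target = getattr(model, "base_model", model)
    return label_names_from_signature(target)


wrapped = AdapterWrapper(BaseClassifier())
naive = label_names_from_signature(wrapped)  # inspects the wrapper only
fixed = label_names_unwrapped(wrapped)       # inspects the base model
print(naive, fixed)
```

In this sketch the wrapper exposes no label parameter, so the naive lookup finds nothing, while looking at the base model recovers `labels`. The local fix mentioned above follows the same idea, though the real change in trainer.py may look different.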
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
Run a vision transformer training with LoRA (PEFT), for example with run_image_classification.py.
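Expected behavior
Evaluation should report eval_loss so that the checkpoint can be saved. Instead, the run fails with: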
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/optimum/habana/transformers/trainer.py", line 1341, in _save_checkpoint
    metric_value = metrics[metric_to_check]
KeyError: 'eval_loss'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/vision/run_image_classification.py", line 510, in <module>
    main()
  File "/home/vision/run_image_classification.py", line 480, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/usr/local/lib/python3.10/dist-packages/optimum/habana/transformers/trainer.py", line 553, in train
    return inner_training_loop(
  File "/usr/local/lib/python3.10/dist-packages/optimum/habana/transformers/trainer.py", line 1052, in _inner_training_loop
    self._maybe_log_save_evaluate(tr_loss, _grad_norm, model, trial, epoch, ignore_keys_for_eval)
  File "/usr/local/lib/python3.10/dist-packages/optimum/habana/transformers/trainer.py", line 1269, in _maybe_log_save_evaluate
    self._save_checkpoint(model, trial, metrics=metrics)
  File "/usr/local/lib/python3.10/dist-packages/optimum/habana/transformers/trainer.py", line 1343, in _save_checkpoint
    raise KeyError(
KeyError: "The metric_for_best_model training argument is set to 'eval_loss', which is not found in the evaluation metrics. The available evaluation metrics are: ['eval_runtime', 'eval_samples_per_second', 'eval_steps_per_second', 'epoch', 'memory_allocated (GB)', 'max_memory_allocated (GB)', 'total_memory_available (GB)']. Consider changing the metric_for_best_model via the TrainingArguments."
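A possible workaround until this is fixed, not verified on the Habana trainer, is to pass the label names explicitly so they do not have to be inferred from the wrapped model. The sketch below only builds the keyword arguments; `label_names` and `metric_for_best_model` are existing TrainingArguments parameters, while the `output_dir` value and the assumption that the dataset uses a `labels` column are placeholders.

```python
# Keyword arguments intended for transformers.TrainingArguments.
# Assumption: the collated batches carry the targets under the key "labels".
training_kwargs = {
    "output_dir": "./outputs",              # hypothetical path
    "label_names": ["labels"],              # set explicitly instead of inferring
    "metric_for_best_model": "eval_loss",   # the metric named in the error above
}

# With transformers installed, these would be passed on as:
#   from transformers import TrainingArguments
#   args = TrainingArguments(**training_kwargs)
print(training_kwargs)
```

This does not fix the lookup in trainer.py itself; it only sidesteps it from the calling script.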