fix accelerator prepare during eval only mode #24014
Conversation
The documentation is not available anymore as the PR was closed or merged.
sgugger left a comment:
Unless I'm missing something, this changes the whole evaluation logic in the Trainer and should not be done.
src/transformers/trainer.py (outdated):

```python
model = self._wrap_model(self.model, training=False, dataloader=dataloader)

if len(self.accelerator._models) == 0 and model is self.model:
    model = self.accelerator.prepare(model)
```
No, we only want to do this for DeepSpeed, not all the time. Putting the model in DistributedDataParallel just for evaluation would waste memory.
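A minimal sketch of the gating this suggests, assuming a Trainer-like context with the `accelerator` and `is_deepspeed_enabled` attributes; the exact condition is illustrative, not necessarily the merged fix:

```python
# Sketch: at eval time, run the full accelerator.prepare only for DeepSpeed,
# so plain eval/predict runs never pay the DistributedDataParallel overhead.
if len(self.accelerator._models) == 0 and model is self.model:
    if self.is_deepspeed_enabled:
        model = self.accelerator.prepare(model)
```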
I do agree on the DDP case, and hence I didn't update it earlier, but as mentioned below we would be missing mixed-precision coverage for eval-only mode.
The thing is that mixed-precision application for eval-only mode won't work unless we prepare the model.
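A small standalone sketch of the point above: with accelerate, mixed precision is applied by wrapping the model's forward during preparation, so an unprepared model silently evaluates in full precision. This assumes a CUDA device (fp16 needs a GPU); `prepare_model(..., evaluation_mode=True)` is the accelerate#1540 API referenced below, and the toy linear model is illustrative:

```python
import torch
from accelerate import Accelerator

accelerator = Accelerator(mixed_precision="fp16")  # fp16 requires a GPU
model = torch.nn.Linear(8, 2)

# evaluation_mode=True applies device placement and the mixed-precision
# forward wrapper, but skips the DistributedDataParallel wrapping that a
# full prepare() would add in a multi-process run.
model = accelerator.prepare_model(model, evaluation_mode=True)

x = torch.randn(4, 8, device=accelerator.device)
with torch.no_grad():
    out = model(x)  # forward now runs under autocast
```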
* fix mixed precision prep during eval only mode
* update to address comments
* update to reflect the changes in accelerate
What does this PR do?
The `prepare` method is called only during the training loop. If the user directly runs `evaluate`/`predict` without the training loop, the model isn't prepared, leading to wrong behaviour. This PR aims to fix that. Related: the `evaluation_mode` argument in accelerate#1540.
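A hedged repro sketch of the failure mode this PR describes, assuming `model` and `eval_dataset` are already built (names illustrative):

```python
from transformers import Trainer, TrainingArguments

args = TrainingArguments(output_dir="out", fp16=True)
trainer = Trainer(model=model, args=args, eval_dataset=eval_dataset)

# No trainer.train() beforehand: prior to this PR, evaluate() never passed
# the model through accelerator.prepare, so fp16 was not applied.
metrics = trainer.evaluate()
```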