[Bug fix] Using loaded checkpoint with --do_predict (instead of random init) #3437
Conversation
Without this fix, I'm getting near-random validation performance for a trained model, and the validation performance differs per validation run. I think this happens because the `model` variable is never assigned the loaded checkpoint, so evaluation runs on a randomly initialized model. Looking at the model activations, they differ each time I run evaluation (but they don't with this fix).

Tagging @srush and @nateraw from the original Lightning GLUE PR to check I'm not missing something.
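For context, here is a minimal sketch of the kind of change involved (the names `model_class`, `args.output_dir`, and the checkpoint-selection logic are illustrative, not the exact example script): the evaluation path has to capture the module returned by `load_from_checkpoint` and hand that to `trainer.test`, otherwise the randomly initialized instance gets evaluated.

```python
# Minimal sketch of the fix, assuming a PyTorch Lightning setup similar to the
# example scripts; names here are illustrative rather than the actual code.
import glob
import os

import pytorch_lightning as pl


def evaluate_from_checkpoint(args, trainer: pl.Trainer, model_class):
    # Pick the most recent checkpoint written during training (illustrative logic).
    checkpoints = sorted(glob.glob(os.path.join(args.output_dir, "*.ckpt")))
    checkpoint_path = checkpoints[-1]

    # Bug: calling load_from_checkpoint without using its return value leaves
    # `model` pointing at the randomly initialized instance.
    # model_class.load_from_checkpoint(checkpoint_path)  # restored weights discarded

    # Fix: load_from_checkpoint is a classmethod that returns a new, fully
    # restored module, so its return value must be kept and passed to the trainer.
    model = model_class.load_from_checkpoint(checkpoint_path)
    trainer.test(model)
```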
Codecov Report

```
@@            Coverage Diff             @@
##           master    #3437      +/-   ##
==========================================
+ Coverage   77.56%   77.60%   +0.04%
==========================================
  Files         100      100
  Lines       16970    16967       -3
==========================================
+ Hits        13162    13167       +5
+ Misses       3808     3800       -8
```

Continue to review full report at Codecov.
I'll check this out later tonight! I'm on mobile so I've just looked at your commit quickly... looks like you're right. I know in the past I've instantiated the model then called …
That was fast 😄 Looks good to me!
Thanks for checking :) I'm still not able to reproduce my in-training validation performance with the `--do_predict` flag, though. Any ideas? I'm getting identical validation accuracy on different runs now, but the accuracy is still near random.
@ethanjperez I just checked the docs, and it looks like the way we were doing it originally was correct:

```python
model = MyLightingModule.load_from_checkpoint(PATH)
model.eval()
y_hat = model(x)
```

The way that I was explaining to do it would require you to use … I haven't had the chance to recreate the issue, so I'll have to take a look.
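As an aside, here is a hedged guess at the manual alternative hinted at above (instantiating the module yourself and restoring the weights into it); the `"state_dict"` key follows the usual Lightning checkpoint layout, and `hparams` stands in for whatever your module's constructor expects:

```python
# Hypothetical alternative to load_from_checkpoint: restore weights into an
# existing instance. Assumes a standard Lightning checkpoint dict with a
# "state_dict" key; MyLightingModule, hparams, PATH, and x mirror the docs
# snippet quoted above.
import torch

model = MyLightingModule(hparams)                  # randomly initialized instance
checkpoint = torch.load(PATH, map_location="cpu")  # raw checkpoint dictionary
model.load_state_dict(checkpoint["state_dict"])    # copy the trained weights in place
model.eval()
y_hat = model(x)
```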
Cool, thanks! Even with the original way, I was still not able to reproduce my in-training validation performance (just something to look out for when you try). In particular, I'm loading/running an already trained model with the …
sshleifer left a comment:
Great catch!
@nateraw @sshleifer Are you guys able to load a trained model successfully with the pytorch-lightning scripts? Even after this patch, I am having issues loading an already trained model, i.e., if I just use …
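One illustrative way to check whether a checkpoint is actually being restored (this mirrors the activation check mentioned in the PR description, not a command from the thread): load the same checkpoint twice and confirm the two instances agree, since two independent random initializations would not. `GLUETransformer` and `checkpoint_path` are stand-in names.

```python
# Illustrative check: two modules restored from the same checkpoint should have
# identical weights; two freshly initialized modules generally will not.
import torch

m1 = GLUETransformer.load_from_checkpoint(checkpoint_path)
m2 = GLUETransformer.load_from_checkpoint(checkpoint_path)

all_equal = all(
    torch.equal(p1, p2)
    for p1, p2 in zip(m1.state_dict().values(), m2.state_dict().values())
)
print("checkpoint restored deterministically:", all_equal)
```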
Sorry for taking so long. I will try to reproduce this today if there is no update on your end! Filing an issue with what you ran/expected would help :) @ethanjperez
@sshleifer Just seeing this. Were you able to reproduce the issue? I can't remember what exact command I ran, but it was a standard evaluation command (the same as the training command I used, but with a few flags tweaked, e.g. dropping the …).
This is fixed now.