Fix TypicalLogitsWarper tensor OOB indexing edge case #26579
Conversation
I'm going to be honest - I don't fully understand the code here! The array `last_ind` is the sum of a boolean array, which should be strictly nonnegative, because boolean arrays only contain 0 and 1 values. Therefore, I don't understand why the original line that resets negative values to zero is needed. Do you know why this is necessary in the first place?
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
@Rocketknight1 good point, I hadn't considered whether the existing check was valid/redundant, I guess it's always been there. I guess it would make more sense if the prior line was `last_ind = (cumulative_probs < self.mass).sum(dim=1) - 1`.
I guess this will actually mean a subtle change in behaviour, but I'm fairly sure it's what was originally intended. Not sure whether this is ok though w.r.t. transformers policies around this kind of thing... |
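(Editor's note: a minimal sketch of the two variants being discussed, assuming the line in question matches the expression quoted later in this thread; the surrounding warper code and the typicality sort are omitted.)

```python
import torch

mass = 0.9  # corresponds to typical_p
cumulative_probs = torch.tensor([[0.5, 0.8, 0.95, 1.0]])

# Existing line: counts tokens strictly below `mass`; this can equal the vocab
# size when every cumulative value stays below `mass`, which is the OOB case.
last_ind_old = (cumulative_probs < mass).sum(dim=1)

# Change proposed in this PR: shift the index down by one so it can never run
# past the end, and clamp so it cannot go below zero.
last_ind_new = (cumulative_probs < mass).sum(dim=1) - 1
last_ind_new.clamp_(min=0)
```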
cc @gante here - I think you might know better than me what the code is doing! |
Thank you for the PR @njhill!
This makes sense -- the last valid index of a sorted vector with N valid values is N-1
Since this is a bug fix, we don't need a deprecation cycle :)
Suggested change:
- last_ind.clamp_(0)
+ last_ind.clamp_(min=0)

nit: let's make it more clear that it is a minimum clamp
I still don't understand how negative values get in there, though!
@Rocketknight1 After the change, (cumulative_probs < self.mass).sum(dim=1) can be 0, which becomes negative once 1 is subtracted. The cumsum vector at index N already contains the value of the tensor at index N, so if the first member of cumulative_probs is larger than self.mass a negative value will come out of here :D
Before the change, it seemed to be a redundant op, yes 😂
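(Editor's note: a tiny numeric illustration of the case described above, with made-up values; it only exercises the lines quoted in this thread, not the full warper.)

```python
import torch

mass = 0.9
# The very first cumulative probability already exceeds `mass`, so no entry
# satisfies the comparison and the boolean sum is 0.
cumulative_probs = torch.tensor([[0.95, 0.99, 1.0]])

last_ind = (cumulative_probs < mass).sum(dim=1) - 1
print(last_ind)  # tensor([-1]) -> why the min clamp is needed after the change
```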
That makes sense, lol. Just making sure I wasn't missing something incredibly obvious!
This can be triggered fairly quickly with low precision, e.g. bfloat16 and typical_p = 0.99.
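(Editor's note: a rough sketch of how low precision can produce the original out-of-bounds index; the numbers are illustrative, and whether a given input actually trips the bound depends on bfloat16 rounding.)

```python
import torch

mass = 0.99  # typical_p
# A nearly flat distribution accumulated in bfloat16: rounding can keep every
# cumulative value below `mass`.
probs = torch.full((1, 1000), 1.0 / 1000, dtype=torch.bfloat16)
cumulative_probs = probs.cumsum(dim=-1)

last_ind = (cumulative_probs < mass).sum(dim=1)
# If last_ind reaches 1000 here, gathering at that index from a 1000-column
# tensor is out of bounds -- the failure this PR guards against.
print(cumulative_probs[0, -1].item(), last_ind.item())
```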
Force-pushed from 6ad5ee6 to 0094d0e
Thanks @gante @Rocketknight1, I've now rebased and added a commit with the explicit `min=0` clamp.
Thank you for the fix @njhill 💪
A recent PR #26579 fixed an edge case out-of-bounds tensor indexing error in TypicalLogitsWarper, and a related behaviour change was made that we thought fixed a long-standing bug w.r.t. the token inclusion cutoff. However after looking more closely, I am pretty certain that the original logic was correct and that the OOB fix should have been made differently. Specifically the docs state that it should include the "smallest set of tokens that add up to P or higher" and so `last_ind` should actually be one more than the index of the last token satisfying (cumulative_probs < self.mass). We still need a max clamp in case that last token is the very last one in the tensor.
This can be triggered fairly quickly with low precision, e.g. bfloat16 and typical_p = 0.99. @gante
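(Editor's note: a sketch of the alternative fix this follow-up describes, assuming the names from the snippet above; it keeps the original cutoff semantics and handles the out-of-bounds case with a max clamp instead of subtracting 1.)

```python
import torch

mass = 0.9
cumulative_probs = torch.tensor([[0.5, 0.7, 0.8, 0.85]])  # every entry < mass

# Original semantics: one more than the index of the last token satisfying
# (cumulative_probs < mass), i.e. the size of the smallest set of tokens that
# adds up to `mass` or higher.
last_ind = (cumulative_probs < mass).sum(dim=1)  # tensor([4]) here

# When all entries are below `mass` that count equals the number of columns,
# so clamp it to the last valid index before using it to gather.
last_ind.clamp_(max=cumulative_probs.shape[-1] - 1)  # tensor([3])
```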