Conversation

@njhill
Contributor

@njhill njhill commented Oct 4, 2023

This can be triggered fairly quickly with low precision (e.g. bfloat16) and typical_p=0.99.

@gante

@Rocketknight1
Member

I'm going to be honest - I don't fully understand the code here!

The array last_ind is created as:
last_ind = (cumulative_probs < self.mass).sum(dim=1)

This is the sum of a boolean array, which should be strictly nonnegative, because boolean arrays only contain 0 and 1 values. Therefore, I don't understand why the original line last_ind[last_ind < 0] = 0 or the replacement using torch.clamp_ is necessary. I don't see how you'd get negative values without an integer overflow, and if we're getting an integer overflow we should be using bigger integer dtypes, not fixing it with value clamping.

Do you know why this is necessary in the first place?
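A minimal sketch of the point above, with made-up values and `mass` standing in for `self.mass`: summing a boolean mask counts its True entries, so the result can never be negative.

```python
import torch

# Hypothetical rows of sorted cumulative probabilities (each row sums to 1).
cumulative_probs = torch.tensor([[0.5, 0.8, 0.95, 1.0],
                                 [0.995, 0.999, 1.0, 1.0]])
mass = 0.9  # stand-in for self.mass (typical_p)

# Sum of a boolean mask = count of True entries, so every value is >= 0.
last_ind = (cumulative_probs < mass).sum(dim=1)
print(last_ind)  # tensor([2, 0])
```

Row 1 has two cumulative values below the mass, row 2 has none, so the counts are 2 and 0; neither can dip below zero without the `- 1` shift discussed later.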

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@njhill
Contributor Author

njhill commented Oct 6, 2023

@Rocketknight1 good point, I hadn't considered whether the existing check was valid or redundant; I guess it's always been there.

I guess it would make more sense if the prior line was last_ind = (cumulative_probs < self.mass).sum(dim=1) - 1, perhaps that was originally intended but left out. I'll update this PR accordingly.

@njhill
Contributor Author

njhill commented Oct 6, 2023

I guess this will actually mean a subtle change in behaviour, but I'm fairly sure it's what was originally intended. Not sure whether this is ok though w.r.t. transformers policies around this kind of thing...

@Rocketknight1
Member

cc @gante here - I think you might know better than me what the code is doing!

Member

@gante gante left a comment


Thank you for the PR @njhill!

This makes sense -- the last valid index of a sorted vector with N valid values is N-1

Since this is a bug fix, we don't need a deprecation cycle :)

Member


Suggested change:
- last_ind.clamp_(0)
+ last_ind.clamp_(min=0)

nit: let's make it more clear that it is a minimum clamp

Member


I still don't understand how negative values get in there, though!

Member


@Rocketknight1 After the change, (cumulative_probs < self.mass).sum(dim=1) can be 0, which turns into a negative value once the -1 is applied. The cumsum vector at index N already contains the value of the tensor at index N, so if the first member of cumulative_probs is larger than self.mass, a negative value will come out of here :D

Before the change, it seemed to be a redundant op, yes 😂

Member


That makes sense, lol. Just making sure I wasn't missing something incredibly obvious!

This can be triggered fairly quickly with low precision e.g. bfloat16 and typical_p = 0.99.
@njhill njhill force-pushed the typical_p_oob_fix branch from 6ad5ee6 to 0094d0e Compare October 18, 2023 15:12
@njhill
Contributor Author

njhill commented Oct 18, 2023

Thanks @gante @Rocketknight1, I've now rebased and added a commit with the explicit min arg suggestion.

@gante
Member

gante commented Oct 25, 2023

Thank you for the fix @njhill 💪

@gante gante merged commit 0baa924 into huggingface:main Oct 25, 2023
@njhill njhill deleted the typical_p_oob_fix branch October 25, 2023 22:18
i4never pushed a commit to i4never/transformers that referenced this pull request Oct 26, 2023
* Fix TypicalLogitsWarper tensor OOB indexing edge case

This can be triggered fairly quickly with low precision e.g. bfloat16 and typical_p = 0.99.

* Shift threshold index by one

* Use explicit named arg for clamp min
njhill added a commit to njhill/transformers that referenced this pull request Oct 31, 2023
A recent PR huggingface#26579 fixed an edge case out-of-bounds tensor indexing error in TypicalLogitsWarper, and a related behaviour change was made that we thought fixed a long-standing bug w.r.t. the token inclusion cutoff.

However after looking more closely, I am pretty certain that the original logic was correct and that the OOB fix should have been made differently.

Specifically the docs state that it should include the "smallest set of tokens that add up to P or higher" and so `last_ind` should actually be one more than the index of the last token satisfying (cumulative_probs < self.mass).

We still need a max clamp in case that last token is the very last one in the tensor.
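A sketch of the corrected cutoff described in the commit message above, with hypothetical probabilities: keep the smallest set of sorted tokens summing to at least the mass, so `last_ind` is one past the last index satisfying the comparison, clamped to the last valid index of the tensor.

```python
import torch

# Hypothetical sorted token probabilities for one sequence.
sorted_probs = torch.tensor([[0.5, 0.3, 0.15, 0.05]])
cumulative_probs = sorted_probs.cumsum(dim=-1)  # [0.5, 0.8, 0.95, 1.0]
mass = 0.9

# One past the last index where cumulative_probs < mass ...
last_ind = (cumulative_probs < mass).sum(dim=1)
# ... clamped so it cannot run off the end of the tensor.
last_ind.clamp_(max=sorted_probs.shape[-1] - 1)

# Tokens at indices 0..last_ind (inclusive) are kept; the rest are removed.
sorted_indices_to_remove = (
    torch.arange(sorted_probs.shape[-1]) > last_ind.unsqueeze(1)
)
print(sorted_indices_to_remove)  # tensor([[False, False, False,  True]])
```

Here the first three tokens (0.5 + 0.3 + 0.15 = 0.95) form the smallest set reaching at least 0.9, matching the "add up to P or higher" wording from the docs.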
gante pushed a commit that referenced this pull request Oct 31, 2023
EduardoPach pushed a commit to EduardoPach/transformers that referenced this pull request Nov 19, 2023
EduardoPach pushed a commit to EduardoPach/transformers that referenced this pull request Nov 19, 2023