Skip to content

Conversation

@njhill
Copy link
Contributor

@njhill njhill commented Oct 31, 2023

A recent PR #26579 fixed an edge case out-of-bounds tensor indexing error in TypicalLogitsWarper, and a related behaviour change was made that we thought fixed a long-standing bug w.r.t. the token inclusion cutoff.

However after looking more closely, I am pretty certain that the original logic was correct and that the OOB fix should have been made differently.

Specifically the docs state that it should include the "smallest set of tokens that add up to P or higher" and so last_ind should actually be one more than the index of the last token satisfying (cumulative_probs < self.mass).

We still need a max clamp in case that last token is the very last one in the tensor.

A recent PR huggingface#26579 fixed an edge case out-of-bounds tensor indexing error in TypicalLogitsWarper, and a related behaviour change was made that we thought fixed a long-standing bug w.r.t. the token inclusion cutoff.

However after looking more closely, I am pretty certain that the original logic was correct and that the OOB fix should have been made differently.

Specifically the docs state that it should include the "smallest set of tokens that add up to P or higher" and so `last_ind` should actually be one more than the index of the last token satisfying (cumulative_probs < self.mass).

We still need a max clamp in case that last token is the very last one in the tensor.
@njhill
Copy link
Contributor Author

njhill commented Oct 31, 2023

@gante sorry about this! I observed that it can actually make a significant difference to the output when typical_p is used.

@amyeroberts amyeroberts requested a review from gante October 31, 2023 09:28
Copy link
Member

@gante gante left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, after re-reading the docstring (which I should also have read when reviewing!). Thank you for the fix!

@gante
Copy link
Member

gante commented Oct 31, 2023

(the CI failed for unrelated reasons, rerunning failed jobs)

@gante gante merged commit 3cd3eaf into huggingface:main Oct 31, 2023
@njhill njhill deleted the typical_p_regression branch October 31, 2023 15:16
EduardoPach pushed a commit to EduardoPach/transformers that referenced this pull request Nov 19, 2023
A recent PR huggingface#26579 fixed an edge case out-of-bounds tensor indexing error in TypicalLogitsWarper, and a related behaviour change was made that we thought fixed a long-standing bug w.r.t. the token inclusion cutoff.

However after looking more closely, I am pretty certain that the original logic was correct and that the OOB fix should have been made differently.

Specifically the docs state that it should include the "smallest set of tokens that add up to P or higher" and so `last_ind` should actually be one more than the index of the last token satisfying (cumulative_probs < self.mass).

We still need a max clamp in case that last token is the very last one in the tensor.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants