fix: Fix typical_p behaviour broken in recent change #27165

njhill · 2023-10-31T00:55:17Z

A recent PR, #26579, fixed an edge-case out-of-bounds tensor indexing error in `TypicalLogitsWarper`, and a related behaviour change was made that we thought fixed a long-standing bug w.r.t. the token inclusion cutoff.

However, after looking more closely, I am pretty certain that the original logic was correct and that the OOB fix should have been made differently.

Specifically, the docs state that it should include the "smallest set of tokens that add up to P or higher", so `last_ind` should actually be one more than the index of the last token satisfying `cumulative_probs < self.mass`. In other words, the token that takes the cumulative probability to P or higher is itself included.

We still need a max clamp in case that last token is the very last one in the tensor.
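
To make the cutoff rule concrete, here is a minimal pure-Python sketch of the index computation described above. It is not the code from this PR: the function name `typical_cutoff_index` and its arguments are hypothetical, and it assumes the probabilities are already in the order the warper would accumulate them.

```python
from itertools import accumulate


def typical_cutoff_index(sorted_probs, mass):
    """Return the index of the last token to keep (hypothetical helper).

    `sorted_probs` is assumed to be in the order in which the warper
    accumulates probabilities; `mass` plays the role of `typical_p`.
    """
    cumulative = list(accumulate(sorted_probs))
    # Counting the entries still below `mass` gives one more than the index
    # of the last such entry, i.e. the index of the token that brings the
    # running total to `mass` or higher.
    last_ind = sum(1 for c in cumulative if c < mass)
    # Max clamp: if every entry is below `mass`, the count equals the
    # length, which would index one past the end.
    return min(last_ind, len(sorted_probs) - 1)


probs = [0.5, 0.3, 0.15, 0.05]
keep_upto = typical_cutoff_index(probs, mass=0.9)
kept = probs[: keep_upto + 1]  # first three tokens, total >= 0.9
```

With `mass=0.9`, only the first two running totals are below the threshold, so the cutoff index is 2 and the third token, the one that crosses the threshold, is kept. Without the clamp, a `mass` larger than the total probability would yield an index equal to the list length.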

njhill · 2023-10-31T00:56:44Z

@gante sorry about this! I observed that it can actually make a significant difference to the output when typical_p is used.

gante

Makes sense, after re-reading the docstring (which I should also have read when reviewing!). Thank you for the fix!

gante · 2023-10-31T12:55:01Z

(the CI failed for unrelated reasons, rerunning failed jobs)

amyeroberts requested a review from gante October 31, 2023 09:28

gante approved these changes Oct 31, 2023

gante merged commit 3cd3eaf into huggingface:main Oct 31, 2023

njhill deleted the typical_p_regression branch October 31, 2023 15:16
