@giganttheo giganttheo commented Oct 7, 2023

What does this PR do?

Fixes a typo in the Flax code for the T5 model.

There is a typo in the Attention module of the Flax version of T5: the attention_mask returned by the _concatenate_to_cache method should override the previous attention_mask, but it is assigned to a misnamed variable, so the stale mask is used instead.
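To illustrate the kind of bug described above, here is a minimal, self-contained sketch in plain Python. It is not the Transformers code: `make_cache`, `concatenate_to_cache`, `attend_buggy`, `attend_fixed`, and the variable name `attn_mask_typo` are all hypothetical stand-ins, and the cache is modelled with simple lists rather than Flax variables.

```python
# Hypothetical sketch of the bug pattern; names and data layout are assumptions,
# not the actual Flax T5 implementation.

def make_cache(max_length):
    """Create an empty key/value cache with a write index."""
    return {"keys": [None] * max_length, "values": [None] * max_length, "index": 0}


def concatenate_to_cache(cache, key, value, attention_mask):
    """Store key/value at the current index and return a mask over the whole cache."""
    idx = cache["index"]
    cache["keys"][idx] = key
    cache["values"][idx] = value
    cache["index"] = idx + 1
    # Slots filled so far may be attended to; the remaining slots are masked out.
    padded_mask = [1 if i < cache["index"] else 0 for i in range(len(cache["keys"]))]
    return cache["keys"], cache["values"], padded_mask


def attend_buggy(cache, key, value, attention_mask):
    # The updated mask is bound to a misnamed variable and silently dropped.
    keys, values, attn_mask_typo = concatenate_to_cache(cache, key, value, attention_mask)
    return attention_mask  # still the stale, single-step mask


def attend_fixed(cache, key, value, attention_mask):
    # The updated mask overrides the previous one, as intended.
    keys, values, attention_mask = concatenate_to_cache(cache, key, value, attention_mask)
    return attention_mask


buggy_mask = attend_buggy(make_cache(4), "k0", "v0", [1])
fixed_mask = attend_fixed(make_cache(4), "k0", "v0", [1])
print(buggy_mask, fixed_mask)
```

In this sketch the buggy variant keeps the one-element mask passed in for the current step, while the fixed variant returns a mask covering every cache slot, which is what the later attention computation would need.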

Fixes #26564

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).

Who can review?

@sanchit-gandhi

@sanchit-gandhi sanchit-gandhi left a comment

Very nice @giganttheo! Thanks for identifying the bug and proposing the fix 🤗 Confirming that the slow tests pass following the fix? As per #26564 (comment) If so, then this all LGTM!

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@giganttheo

> Very nice @giganttheo! Thanks for identifying the bug and proposing the fix 🤗 Confirming that the slow tests pass following the fix? As per #26564 (comment) If so, then this all LGTM!

The slow tests are passing for T5 and LongT5:

RUN_SLOW=1 pytest -sv tests/models/t5/test_modeling_flax_t5.py::FlaxT5ModelIntegrationTests

outputs: ================== 6 passed, 4 warnings in 331.38s (0:05:31) ===================

and for the LongT5 version:

RUN_SLOW=1 pytest -sv tests/models/longt5/test_modeling_flax_longt5.py::FlaxLongT5ModelIntegrationTests

outputs: =================== 1 passed, 1 warning in 401.61s (0:06:41) ===================

@sanchit-gandhi

Awesome - thanks for confirming! Requesting a final review from @ArthurZucker

@ArthurZucker ArthurZucker left a comment

LGTM! Thanks for catching 🤗

@ArthurZucker ArthurZucker merged commit 975003e into huggingface:main Oct 10, 2023
@giganttheo giganttheo deleted the fix_typo_flax_t5_attn branch May 17, 2024 11:25

Development

Successfully merging this pull request may close these issues.

Flax T5 model - code typo during AutoRegressive decoding?
