
Conversation

TKONIY
Contributor

@TKONIY TKONIY commented Oct 4, 2025

What does this PR do?

The attention-type argument of `_flash_attention_forward()` is named `implementation`, but the call site currently passes it as `attn_implementation`. The misnamed keyword never reaches the parameter, so the intended attention implementation is never actually specified.
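
A minimal sketch of the failure mode, using simplified stand-in signatures rather than the actual transformers code: a keyword passed under the wrong name falls into `**kwargs` and the named parameter silently keeps its default.

```python
# Illustrative only: simplified stand-in for the real _flash_attention_forward signature.
def _flash_attention_forward(query, key, value, implementation=None, **kwargs):
    # A misnamed keyword ends up in **kwargs instead of binding to `implementation`.
    print(f"implementation={implementation!r}, stray kwargs={list(kwargs)}")

# Before the fix: the intended value never reaches the parameter.
_flash_attention_forward("q", "k", "v", attn_implementation="flash_attention_2")
# -> implementation=None, stray kwargs=['attn_implementation']

# After the fix: the value binds to the named parameter as intended.
_flash_attention_forward("q", "k", "v", implementation="flash_attention_2")
# -> implementation='flash_attention_2', stray kwargs=[]
```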

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • Did you write any new necessary tests?

CC

@vasqu @ArthurZucker @Cyrilvallez

@Cyrilvallez
Member

Oh, good catch, there is indeed the wrong kwarg being passed on! I would prefer if you could change the name of the kwarg in `_flash_attention_forward`, though; it would be more explicit!

@TKONIY
Contributor Author

TKONIY commented Oct 6, 2025

> Oh, good catch, there is indeed the wrong kwarg being passed on! I would prefer if you could change the name of the kwarg in `_flash_attention_forward`, though; it would be more explicit!

Thank you. I have changed that. Please check.

Contributor

@vasqu vasqu left a comment


Can you also fix the docs naming then at

implementation (`str`, *optional*):
The attention implementation to use. If None, will default to the one based on the environment.

LGTM otherwise 🤗

TKONIY added 3 commits October 6, 2025 21:28
The name of the attn type argument for `_flash_attention_forward()` should be `implementation`, instead of `attn_implementation`, which is currently used in the function call. This results in the wrong type specification.
@TKONIY
Contributor Author

TKONIY commented Oct 6, 2025

> Can you also fix the docs naming then at
>
> implementation (`str`, *optional*):
> The attention implementation to use. If None, will default to the one based on the environment.
>
> LGTM otherwise 🤗

Thank you! Done!

Member

@Cyrilvallez Cyrilvallez left a comment


Thanks!! 🤗

@Cyrilvallez Cyrilvallez merged commit ae60c77 into huggingface:main Oct 6, 2025
5 of 8 checks passed
@ArthurZucker
Collaborator

https://github.com/huggingface/transformers/blob/update-from-pretrained/src/transformers/integrations/hub_kernels.py#L214-L214 is where we use `implementation`. If you do this it won't fall back to kernels; we need to make sure we use the one passed in load and register

@TKONIY
Contributor Author

TKONIY commented Oct 6, 2025

> https://github.com/huggingface/transformers/blob/update-from-pretrained/src/transformers/integrations/hub_kernels.py#L214-L214 is where we use `implementation`. If you do this it won't fall back to kernels; we need to make sure we use the one passed in load and register

So would it be better if I roll back to the fix that simply changes `attn_implementation=` to `implementation=`?

@vasqu
Contributor

vasqu commented Oct 6, 2025

The link is broken; I think Arthur meant:

kernel_function = partial(attention_wrapper, implementation=kernel)
lazy_import_flash_attention(kernel, force_import=True)

Yes, we should also change the kwarg there, totally forgot about that, e.g. `kernel_function = partial(attention_wrapper, attn_implementation=kernel)`. But it's a bit messier tbh, and I don't think we actually use the kwarg much at all except on the first call (which would happen if someone does something custom with the FA interface we have) --> the forced lazy import should load the correct kernel (checking in a second), and as we already loaded it, we never change it again there.

Edit: Still loads the correct kernel implementation, checked with kernels-community/flash-attn3
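
A minimal sketch of what the discussion above is getting at, using stand-in names rather than the actual hub_kernels.py code: `functools.partial` binds the kernel under a fixed keyword name, so if the wrapper's parameter is renamed without updating the partial, the value falls into `**kwargs` instead of binding.

```python
# Illustrative stand-ins only; not the real transformers attention_wrapper.
from functools import partial

def attention_wrapper(*args, attn_implementation=None, **kwargs):
    # After the rename, the wrapper expects `attn_implementation`.
    return attn_implementation

# A partial still bound to the old keyword name misses the renamed parameter:
stale = partial(attention_wrapper, implementation="kernels-community/flash-attn3")
print(stale())  # None

# Rebinding under the new name restores the intended behaviour:
fixed = partial(attention_wrapper, attn_implementation="kernels-community/flash-attn3")
print(fixed())  # kernels-community/flash-attn3
```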
