[Attention] Refactor FA block_size limitations to hybrid models only
#29084
Conversation
```python
    )

    @classmethod
    @override
```
pre-commit is rightfully complaining, as this is not an override.
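For context, a minimal sketch of why the checker flags this, assuming the project uses `typing_extensions.override` (or `typing.override` on Python 3.12+); the class and method names below are hypothetical, not the actual vLLM code. The decorator is only valid on methods that redefine something from a base class:

```python
from typing_extensions import override


class Base:
    @staticmethod
    def supported_sizes() -> list[int]:
        return [16, 32]


class Child(Base):
    @staticmethod
    @override
    def supported_sizes() -> list[int]:
        # Fine: Base defines supported_sizes, so this is a real override.
        return [64]

    @classmethod
    @override
    def brand_new_helper(cls) -> list[int]:
        # Flagged by the type checker / pre-commit: no base class defines
        # brand_new_helper, so @override has nothing to verify.
        return [128]
```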
Code Review
This pull request refactors how supported block sizes for attention backends are determined, changing from a class variable to a static method. This allows for dynamic block size support, which is used here to apply specific block_size limitations only to hybrid models using the FlashAttention backend. The changes are consistent across all affected backend implementations and tests.
I've found one critical issue in vllm/v1/attention/backends/flashinfer.py where a method signature mismatch with its base class will cause a TypeError at runtime. Please see the detailed comment.
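To illustrate the kind of failure being described, here is a minimal, hypothetical sketch (class, method, and parameter names are illustrative, not vLLM's actual API): when a subclass's method signature drifts from the base class, callers written against the base interface fail at call time.

```python
class AttentionBackendBase:
    @staticmethod
    def supported_kernel_block_sizes() -> list[int]:
        return [16]


class FlashInferLikeBackend(AttentionBackendBase):
    # Signature drift: an extra required positional argument.
    @staticmethod
    def supported_kernel_block_sizes(use_trtllm_attention: bool) -> list[int]:
        return [32, 64] if use_trtllm_attention else [16]


def pick_block_size(backend_cls: type[AttentionBackendBase]) -> int:
    # Callers code against the base-class signature...
    return max(backend_cls.supported_kernel_block_sizes())


try:
    pick_block_size(FlashInferLikeBackend)
except TypeError as exc:
    # supported_kernel_block_sizes() missing 1 required positional argument
    print(exc)
```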
This pull request has merge conflicts that must be resolved before it can be merged.
Force-pushed from 35a7704 to 351e050
tdoublep left a comment:
Thanks for doing this!
MatthewBonanni left a comment:
Thanks! I have no objections to making this a static method.
Addressed your suggestions, thanks for the quick review @tdoublep @MatthewBonanni
tdoublep left a comment:
LGTM
Signed-off-by: NickLucche <[email protected]>
Force-pushed from 14d82fc to 1d3a4b3
This PR limits the block_size changes introduced in #27753 to hybrid models only.
Since non-hybrid models are not affected by the issues reported in that PR, disallowing an otherwise supported physical block_size can hinder the performance of KV cache transfers. For example, with PD disaggregation over NIXL we are not bandwidth-limited, so reducing the block size prevents optimal saturation of the transfer medium.
To do so I have soft-reverted some of the changes introduced in the #24794 refactor, namely supported_kernel_block_size being a class attribute rather than a staticmethod. Happy to discuss other options, but I could not find a cleaner solution with classvars.
cc @tdoublep for hybrid models
cc @MatthewBonanni for attn metadata refactor
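A minimal sketch of the pattern this describes (names, signatures, and block-size values are illustrative assumptions, not the exact vLLM API): a static method can choose the supported sizes per call, e.g. depending on whether the model is hybrid, which a fixed ClassVar cannot express.

```python
class AttentionBackendBase:
    # Before (#24794 style): a fixed class attribute, identical for every model:
    #   supported_kernel_block_size: ClassVar[list[int]] = [16, 32, 64]

    # After (the direction of this PR): a static method that can decide per call.
    @staticmethod
    def supported_kernel_block_sizes(is_hybrid_model: bool) -> list[int]:
        return [16, 32, 64, 128]


class FlashAttentionLikeBackend(AttentionBackendBase):
    @staticmethod
    def supported_kernel_block_sizes(is_hybrid_model: bool) -> list[int]:
        if is_hybrid_model:
            # Keep the #27753 restriction only where it is actually needed.
            return [16]
        # Non-hybrid models keep larger blocks, which helps KV-cache transfers
        # (e.g. PD disaggregation over NIXL) saturate the transfer medium.
        return [16, 32, 64, 128]


# Usage sketch:
print(FlashAttentionLikeBackend.supported_kernel_block_sizes(is_hybrid_model=True))
print(FlashAttentionLikeBackend.supported_kernel_block_sizes(is_hybrid_model=False))
```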
Test
Reproducing the same setup as for hybrid models.