Commit ae581e1 (1 parent: a226864)

Fix attention fusion test numerics

Signed-off-by: Luka Govedič <[email protected]>

File tree: 1 file changed (+3, -2 lines)

tests/compile/test_fusion_attn.py

Lines changed: 3 additions & 2 deletions

@@ -368,8 +368,9 @@ def test_attention_quant_pattern(
     forward_ctx = get_forward_context()
     forward_ctx.attn_metadata = model_unfused.build_attn_metadata(batch_size)

-    # Run model directly without compilation and fusion
-    result_unfused = model_unfused(q, k, v)
+    # Run model directly without fusion
+    # Still compile so query QuantFP8 has closer numerics
+    result_unfused = torch.compile(model_unfused, fullgraph=True)(q, k, v)

    # Run model with attn fusion enabled
    vllm_config.compilation_config.pass_config = PassConfig(
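The rationale behind compiling the unfused baseline too can be illustrated without torch: FP8 quantization rounds aggressively, so two slightly different implementations of the same quant step (e.g. an eager path vs. a compiled/fused kernel) can land values near a rounding boundary in different buckets, producing spurious test mismatches. The sketch below uses a toy stand-in quantizer (`quant_fp8_sim` is a hypothetical helper, not part of vLLM) to show that running both sides through the *same* implementation keeps the rounding aligned, while even a tiny numerical difference between implementations can flip a bucket.

```python
def quant_fp8_sim(x: float, scale: float, bias: float = 0.0) -> float:
    """Toy stand-in for an FP8 quant step: scale, round, clamp, dequantize.

    `bias` models a tiny numerical difference between two implementations
    of the same quantizer (e.g. eager vs. compiled kernel ordering).
    """
    q = round(x / scale + bias)
    q = max(-448, min(448, q))  # fp8 e4m3 representable range is roughly +-448
    return q * scale

# A value whose scaled form sits exactly on a rounding boundary:
x, scale = 1.25, 0.5  # x / scale == 2.5, exactly halfway between buckets

same_impl_a = quant_fp8_sim(x, scale)
same_impl_b = quant_fp8_sim(x, scale)        # identical implementation
other_impl = quant_fp8_sim(x, scale, 1e-9)   # epsilon-different implementation

print(same_impl_a == same_impl_b)  # True: same quant step agrees bit-for-bit
print(same_impl_a == other_impl)   # False: the epsilon flipped the bucket
```

This is the same effect the commit guards against: by wrapping the baseline in `torch.compile(..., fullgraph=True)`, the query `QuantFP8` op in the unfused path goes through the same compiled numerics as the fused path, so the comparison measures the fusion pass rather than rounding noise between execution modes.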
