@ProExpertProg ProExpertProg commented Sep 10, 2025

Purpose

This PR enables fusion passes to match the torch implementations of the custom ops QuantFP8 and RMSNorm. On main, fusion currently requires enabling these custom ops, but they are slower than their torch counterparts, which reduces the benefit of the custom fusion passes.

We add a set of "matcher util" objects that can be called inside patterns. Each one is automatically traced to the same fx nodes as the custom op it corresponds to, whether that op is enabled or disabled.
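As a rough illustration of the idea (not the code in this PR), the sketch below shows a matcher-style callable that emits either the "custom op" form or the native form of RMSNorm depending on a flag. Everything here is hypothetical: `MatcherRMSNorm`, `rms_norm_custom_op`, and the `trace` list are made-up names, and plain Python lists stand in for tensors and fx nodes.

```python
import math
from typing import List

Vector = List[float]


def rms_norm_native(x: Vector, weight: Vector, eps: float = 1e-6) -> Vector:
    """Reference RMSNorm: x / sqrt(mean(x^2) + eps) * weight."""
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale * w for v, w in zip(x, weight)]


def rms_norm_custom_op(x: Vector, weight: Vector, eps: float = 1e-6) -> Vector:
    """Stand-in for an opaque fused kernel (hypothetical); same math here."""
    return rms_norm_native(x, weight, eps)


class MatcherRMSNorm:
    """Hypothetical matcher util: a pattern calls it like a function, and it
    produces whichever form (custom op or native) the model would contain."""

    def __init__(self, eps: float, enabled: bool) -> None:
        self.eps = eps
        self.enabled = enabled
        self.trace: List[str] = []  # records which form was emitted

    def __call__(self, x: Vector, weight: Vector) -> Vector:
        if self.enabled:
            self.trace.append("custom_op")
            return rms_norm_custom_op(x, weight, self.eps)
        self.trace.append("native")
        return rms_norm_native(x, weight, self.eps)


# The same pattern body works for both forms; only the flag differs.
def pattern(matcher: MatcherRMSNorm, x: Vector, weight: Vector) -> Vector:
    return matcher(x, weight)


enabled_out = pattern(MatcherRMSNorm(eps=0.0, enabled=True), [3.0, 4.0], [1.0, 1.0])
disabled_out = pattern(MatcherRMSNorm(eps=0.0, enabled=False), [3.0, 4.0], [1.0, 1.0])
```

The point of the design is that the pattern author writes one pattern body, and the choice between the two forms is made in a single place.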

This PR also adds debugging utilities and E2E fusion tests, which verify that fusions happen in models end-to-end rather than only in unit tests.

Test Plan

Unit tests, added more fusion E2E tests.

Test Result

Tests all pass

Performance numbers

Below are B200 numbers (with flashinfer) from vllm bench serve on the following serve command:

vllm serve redhatai/meta-llama-3.1-70B-Instruct-FP8 --no-enable-prefix-caching --load-format dummy --kv-cache-dtype=fp8 -O.splitting_ops=[] -O.cudagraph_mode=FULL_DECODE_ONLY

We test the following regimes with corresponding additional arguments:

  1. none: -O.custom_ops='["none"]' -O.pass_config={"enable_fi_allreduce_fusion":false,"enable_attn_fusion":false,"enable_noop":true}
  2. none_fusion_attention: -O.custom_ops='["none"]' -O.pass_config={"enable_fi_allreduce_fusion":false,"enable_attn_fusion":true,"enable_noop":true}
  3. none_fusion_attention_allreduce: -O.custom_ops='["none"]' -O.pass_config={"enable_fi_allreduce_fusion":true,"enable_attn_fusion":true,"enable_noop":true}
  4. rms_quant: -O.custom_ops='["none", "+quant_fp8", "+rms_norm"]' -O.pass_config={"enable_fi_allreduce_fusion":false,"enable_attn_fusion":false,"enable_noop":true}
  5. rms_quant_fusion_attention: -O.custom_ops='["none", "+quant_fp8", "+rms_norm"]' -O.pass_config={"enable_fi_allreduce_fusion":false,"enable_attn_fusion":true,"enable_noop":true}
  6. rms_quant_fusion_attention_allreduce: -O.custom_ops='["none", "+quant_fp8", "+rms_norm"]' -O.pass_config={"enable_fi_allreduce_fusion":true,"enable_attn_fusion":true,"enable_noop":true}
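
The six regimes above are the cross product of two custom-op settings and three fusion settings. The sketch below regenerates the argument strings exactly as they are written in the list. The helper name `regime_args` and the dictionary names are illustrative only, and shell quoting may need adjusting for your environment.

```python
import json
from typing import Dict, List, Tuple


def regime_args(custom_ops: List[str], attn_fusion: bool, allreduce_fusion: bool) -> str:
    """Build the extra serve arguments for one benchmark regime."""
    pass_config = {
        "enable_fi_allreduce_fusion": allreduce_fusion,
        "enable_attn_fusion": attn_fusion,
        "enable_noop": True,
    }
    ops = json.dumps(custom_ops)  # e.g. ["none", "+quant_fp8", "+rms_norm"]
    cfg = json.dumps(pass_config, separators=(",", ":"))  # compact JSON, as in the list
    return f"-O.custom_ops='{ops}' -O.pass_config={cfg}"


# Custom-op settings: everything disabled, or only quant_fp8 and rms_norm enabled.
CUSTOM_OPS: Dict[str, List[str]] = {
    "none": ["none"],
    "rms_quant": ["none", "+quant_fp8", "+rms_norm"],
}

# Fusion settings as (attention fusion, allreduce fusion).
FUSIONS: Dict[str, Tuple[bool, bool]] = {
    "": (False, False),
    "_fusion_attention": (True, False),
    "_fusion_attention_allreduce": (True, True),
}

REGIMES: Dict[str, str] = {
    prefix + suffix: regime_args(ops, attn, allreduce)
    for prefix, ops in CUSTOM_OPS.items()
    for suffix, (attn, allreduce) in FUSIONS.items()
}

for name, args in REGIMES.items():
    print(f"{name}: {args}")
```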

2 (none_fusion_attention) and 3 (none_fusion_attention_allreduce) are newly possible with this PR. On main, results are similar except those two are worse as fusion cannot happen without custom ops enabled.

redhatai/meta-llama-3.1-70B-Instruct-FP8 (TP=1):

Past QPS=10 the server is overloaded so the latency spikes and becomes much more variable. Also note that allreduce fusion is a noop for tp=1.

📊 TTFT Median (ms)

Source (by QPS)                            1.0       5.0      10.0      15.0      20.0       inf
none                                     80.24     85.20    148.05   3740.51  27131.90  29992.91
none_fusion_attention_allreduce          78.42     83.89    146.75   1739.83  24117.31  28886.46
rms_quant                                80.38     87.31    162.54   5298.40  26009.12  35230.81
rms_quant_fusion_attention_allreduce     79.50     85.38    149.76   3904.91  24737.15  34760.88

📊 TPOT Median (ms)

Source (by QPS)                            1.0       5.0      10.0      15.0      20.0       inf
none                                     16.86     22.06     41.13    287.30    293.71    285.44
none_fusion_attention_allreduce          16.61     21.86     39.95    285.85    284.93    275.04
rms_quant                                17.17     23.12     44.32    234.08    232.37    227.34
rms_quant_fusion_attention_allreduce     16.97     22.45     42.21    228.79    228.76    225.60

📊 ITL Median (ms)

Source (by QPS)                            1.0       5.0      10.0      15.0      20.0       inf
none                                     16.11     16.49     21.31    207.46    180.96     64.85
none_fusion_attention_allreduce          15.87     16.28     21.07    223.46    191.32     64.14
rms_quant                                16.35     17.02     22.16    103.97     50.63     47.99
rms_quant_fusion_attention_allreduce     16.20     16.71     21.46    143.86     48.40     47.59
[Figure: serving_metrics_llama70B_tp1]

redhatai/meta-llama-3.1-70B-Instruct-FP8 (TP=4):

Note that allreduce fusion reduces TPOT at low QPS but increases it at high QPS, and it increases TTFT across the board. This will be addressed in #24248 and #24252.

📊 TTFT Median (ms)

Source (by QPS)                            1.0       5.0      10.0      15.0      20.0       inf
none                                     72.01     76.32     89.95    112.80    136.13  12852.81
none_fusion_attention                    71.10     75.38     93.56    116.45    133.33  12662.96
none_fusion_attention_allreduce          74.16     77.85     94.50    123.77    139.41  21980.02
rms_quant                                74.95     78.65     90.93    115.75    162.43  12620.21
rms_quant_fusion_attention               73.77     76.84     94.10    124.59    146.62  20634.74
rms_quant_fusion_attention_allreduce     75.12     77.69     86.11    112.16    199.10  13078.33

📊 TPOT Median (ms)

Source (by QPS)                            1.0       5.0      10.0      15.0      20.0       inf
none                                      9.66     12.72     20.68     31.90     46.11    257.66
none_fusion_attention                     9.39     12.44     20.69     29.16     43.08    142.36
none_fusion_attention_allreduce           8.28     11.16     20.84     34.21     47.54    220.19
rms_quant                                 9.92     13.13     21.95     32.97     62.92    146.71
rms_quant_fusion_attention                9.60     12.78     20.58     34.18     50.37    146.19
rms_quant_fusion_attention_allreduce      8.43     11.39     20.38     31.10     78.54    145.60

📊 ITL Median (ms)

Source (by QPS)                            1.0       5.0      10.0      15.0      20.0       inf
none                                      9.23      9.43     11.45     15.04     63.60    194.04
none_fusion_attention                     8.97      9.16     11.29     20.99     62.27    188.26
none_fusion_attention_allreduce           7.85      8.05     11.51     23.63     65.02    191.91
rms_quant                                 9.47      9.71     11.75     16.26     66.33    188.91
rms_quant_fusion_attention                9.18      9.45     11.51     24.76     68.46    187.65
rms_quant_fusion_attention_allreduce      8.01      8.28     11.60     14.71     73.63    194.55
[Figure: serving_metrics_llama70B_tp4_all]
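
To make the TP=4 allreduce trade-off concrete, the small sketch below computes the relative change from none_fusion_attention to none_fusion_attention_allreduce, using the TPOT and TTFT medians copied from the tables above. The helper name `percent_change` is illustrative. Negative values mean allreduce fusion was faster.

```python
from typing import List

QPS = ["1.0", "5.0", "10.0", "15.0", "20.0", "inf"]

# Median TPOT (ms), TP=4, copied from the table above.
TPOT_ATTN = [9.39, 12.44, 20.69, 29.16, 43.08, 142.36]            # none_fusion_attention
TPOT_ATTN_ALLREDUCE = [8.28, 11.16, 20.84, 34.21, 47.54, 220.19]  # none_fusion_attention_allreduce

# Median TTFT (ms), TP=4, copied from the table above.
TTFT_ATTN = [71.10, 75.38, 93.56, 116.45, 133.33, 12662.96]
TTFT_ATTN_ALLREDUCE = [74.16, 77.85, 94.50, 123.77, 139.41, 21980.02]


def percent_change(baseline: List[float], candidate: List[float]) -> List[float]:
    """Relative change of candidate vs. baseline, in percent, per column."""
    return [100.0 * (c - b) / b for b, c in zip(baseline, candidate)]


tpot_delta = percent_change(TPOT_ATTN, TPOT_ATTN_ALLREDUCE)
ttft_delta = percent_change(TTFT_ATTN, TTFT_ATTN_ALLREDUCE)

for qps, d_tpot, d_ttft in zip(QPS, tpot_delta, ttft_delta):
    print(f"QPS={qps:>4}: TPOT {d_tpot:+6.1f}%  TTFT {d_ttft:+6.1f}%")
```

With these numbers, TPOT improves by roughly 12% at QPS=1 and is worse from QPS=10 upward, while TTFT is higher in every column.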

@ProExpertProg ProExpertProg mentioned this pull request Sep 10, 2025
4 tasks

mergify bot commented Sep 10, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @ProExpertProg.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Sep 10, 2025
@ProExpertProg ProExpertProg force-pushed the luka/custom-op-matching-2 branch from b374514 to 4a44829 Compare September 11, 2025 04:42
@mergify mergify bot removed the needs-rebase label Sep 11, 2025

@mergify mergify bot added the needs-rebase label Sep 12, 2025
@ProExpertProg ProExpertProg force-pushed the luka/custom-op-matching-2 branch 2 times, most recently from 42f2231 to a8c9181 Compare September 12, 2025 19:20
@mergify mergify bot removed the needs-rebase label Sep 12, 2025
@mgoin mgoin self-assigned this Sep 15, 2025

@mergify mergify bot added the needs-rebase label Sep 16, 2025
@ProExpertProg ProExpertProg force-pushed the luka/custom-op-matching-2 branch 6 times, most recently from 1e9326c to e3d0c83 Compare September 20, 2025 02:04
@mergify mergify bot removed the needs-rebase label Sep 20, 2025
@ProExpertProg ProExpertProg force-pushed the luka/custom-op-matching-2 branch from e3d0c83 to 9151d01 Compare September 20, 2025 13:49
simon-mo pushed a commit that referenced this pull request Sep 22, 2025
…g utils, fix DCE bug (#23091), fix test (#24376), and prep for custom op matching (#24604) (#24542)

Signed-off-by: Luka Govedič <[email protected]>
Signed-off-by: luka <[email protected]>
Signed-off-by: Luka Govedič <[email protected]>

@mergify mergify bot added the needs-rebase label Sep 22, 2025
@ProExpertProg ProExpertProg force-pushed the luka/custom-op-matching-2 branch from 9151d01 to da3cb54 Compare September 23, 2025 14:24
@mergify mergify bot removed the needs-rebase label Sep 23, 2025
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
…g utils, fix DCE bug (vllm-project#23091), fix test (vllm-project#24376), and prep for custom op matching (vllm-project#24604) (vllm-project#24542)

Signed-off-by: Luka Govedič <[email protected]>
Signed-off-by: luka <[email protected]>
Signed-off-by: Luka Govedič <[email protected]>
ProExpertProg added a commit to neuralmagic/vllm that referenced this pull request Oct 16, 2025
Relax tolerance for L40 fusion test

Signed-off-by: Luka Govedič <[email protected]>

Fix NamedTuple

Signed-off-by: Luka Govedič <[email protected]>

Update test durations

Signed-off-by: Luka Govedič <[email protected]>

commit c03b29b
Author: Luka Govedič <[email protected]>
Date:   Wed Oct 15 20:31:11 2025 -0400

    Remove inductor graph partition from unit test (included in e2e tests)

    Signed-off-by: Luka Govedič <[email protected]>

commit ae581e1
Author: Luka Govedič <[email protected]>
Date:   Wed Oct 15 20:30:02 2025 -0400

    Fix attention fusion test numerics

    Signed-off-by: Luka Govedič <[email protected]>

commit a226864
Merge: e99a759 0a9ef0c
Author: Luka Govedič <[email protected]>
Date:   Wed Oct 15 20:03:52 2025 -0400

    Merge branch 'main' into luka/custom-op-matching-2

    Signed-off-by: Luka Govedič <[email protected]>

commit e99a759
Author: Luka Govedič <[email protected]>
Date:   Wed Oct 15 19:20:47 2025 -0400

    Break up B200 tests, move allreduce to H200

    Signed-off-by: Luka Govedič <[email protected]>

commit 876ef22
Author: Luka Govedič <[email protected]>
Date:   Wed Oct 15 18:43:48 2025 -0400

    Fix tests, PR feedback

    Signed-off-by: Luka Govedič <[email protected]>

commit 6253d5b
Author: Luka Govedič <[email protected]>
Date:   Wed Oct 15 13:18:03 2025 -0400

    Add e2e to L40 distributed, move tests to start of B200 distributed

    Signed-off-by: Luka Govedič <[email protected]>

commit de7405b
Author: Luka Govedič <[email protected]>
Date:   Wed Oct 15 13:08:57 2025 -0400

    PR comments: add _custom_op suffix

    Signed-off-by: Luka Govedič <[email protected]>

commit 24f1298
Author: Luka Govedič <[email protected]>
Date:   Wed Oct 15 13:08:13 2025 -0400

    PR comments: cleanup fusion passes, & matching

    Signed-off-by: Luka Govedič <[email protected]>

commit 7e6f5b3
Author: Luka Govedič <[email protected]>
Date:   Wed Oct 15 13:06:19 2025 -0400

    add flat_product example

    Signed-off-by: Luka Govedič <[email protected]>

commit 532cbcf
Author: Luka Govedič <[email protected]>
Date:   Wed Oct 15 12:56:07 2025 -0400

    Add comment to test_logger

    Signed-off-by: Luka Govedič <[email protected]>

commit 3943257
Author: Luka Govedič <[email protected]>
Date:   Wed Oct 15 12:11:29 2025 -0400

    Restore original torch.Parameter behavior in RMSNorm

    Signed-off-by: Luka Govedič <[email protected]>

commit a3ebf0a
Author: Luka Govedič <[email protected]>
Date:   Wed Oct 15 12:09:48 2025 -0400

    fix fp8 quant tests

    Signed-off-by: Luka Govedič <[email protected]>

commit db2b1c7
Author: Luka Govedič <[email protected]>
Date:   Wed Oct 15 11:59:35 2025 -0400

    Smaller model for e2e fusion test

    Signed-off-by: Luka Govedič <[email protected]>

commit bcd95b5
Author: Luka Govedič <[email protected]>
Date:   Wed Oct 15 11:54:47 2025 -0400

    Fix func test

    Signed-off-by: Luka Govedič <[email protected]>

commit bb0254a
Merge: 465ce58 136a17f
Author: Luka Govedič <[email protected]>
Date:   Wed Oct 15 10:01:15 2025 -0400

    Merge branch 'main' into luka/custom-op-matching-2

    # Conflicts:
    #	tests/utils_/test_utils.py

    Signed-off-by: Luka Govedič <[email protected]>

commit 465ce58
Author: Luka Govedič <[email protected]>
Date:   Wed Oct 15 09:59:54 2025 -0400

    Update tests/compile/test_fusion.py

    Signed-off-by: Luka Govedič <[email protected]>

commit 2a6299c
Author: Luka Govedič <[email protected]>
Date:   Wed Oct 15 04:12:01 2025 -0400

    Fix e2e test patterns

    Signed-off-by: Luka Govedič <[email protected]>

commit 8ffb474
Author: Luka Govedič <[email protected]>
Date:   Wed Oct 15 03:25:26 2025 -0400

    Remove/fix TODOs

    Signed-off-by: Luka Govedič <[email protected]>

commit db16ee1
Merge: 12a7c6d f0862ea
Author: Luka Govedič <[email protected]>
Date:   Wed Oct 15 03:10:21 2025 -0400

    Merge branch 'main' into luka/custom-op-matching-2

commit 12a7c6d
Author: Luka Govedič <[email protected]>
Date:   Wed Oct 15 03:00:52 2025 -0400

    Tests & docs for flat_product

    Signed-off-by: Luka Govedič <[email protected]>

commit 8a363d3
Author: Luka Govedič <[email protected]>
Date:   Wed Oct 15 02:43:03 2025 -0400

    Slight improvement for E2E fusion

    Signed-off-by: Luka Govedič <[email protected]>

commit f6429e4
Author: Luka Govedič <[email protected]>
Date:   Wed Oct 15 02:40:43 2025 -0400

    Cleanup test_fusion_attn.py

    Signed-off-by: Luka Govedič <[email protected]>

commit b5f89e5
Author: Luka Govedič <[email protected]>
Date:   Wed Oct 15 02:29:06 2025 -0400

    Cleanup test_full_graph.py

    Signed-off-by: Luka Govedič <[email protected]>

commit 97b3ff2
Merge: af1ffa7 8c851f6
Author: Luka Govedič <[email protected]>
Date:   Wed Oct 15 02:08:00 2025 -0400

    Merge remote-tracking branch 'upstream/main' into luka/custom-op-matching-2

    Signed-off-by: Luka Govedič <[email protected]>

commit af1ffa7
Author: Luka Govedič <[email protected]>
Date:   Wed Oct 15 01:54:18 2025 -0400

    PR review

    Signed-off-by: Luka Govedič <[email protected]>

commit 3547b87
Author: Luka Govedič <[email protected]>
Date:   Sun Oct 12 11:11:14 2025 -0400

    fix sequence parallelism test

    Signed-off-by: Luka Govedič <[email protected]>

commit 26892df
Author: Luka Govedič <[email protected]>
Date:   Sun Oct 12 11:03:35 2025 -0400

    fix pass manager test

    Signed-off-by: Luka Govedič <[email protected]>

commit 0d6e550
Author: Luka Govedič <[email protected]>
Date:   Sun Oct 12 10:57:07 2025 -0400

    fix func test

    Signed-off-by: Luka Govedič <[email protected]>

commit 1b1a63e
Author: Luka Govedič <[email protected]>
Date:   Sat Oct 11 14:33:46 2025 -0400

    Fix e2e allreduce fusion test

    Signed-off-by: Luka Govedič <[email protected]>

commit 52f78ce
Author: Luka Govedič <[email protected]>
Date:   Sat Oct 11 08:38:42 2025 -0400

    Add allreduce test to 2-gpu test

    Signed-off-by: Luka Govedič <[email protected]>

commit 095277c
Author: Luka Govedič <[email protected]>
Date:   Fri Oct 10 19:03:18 2025 -0400

    Simplify matcher utils by using RMSNorm.forward_static

    Signed-off-by: Luka Govedič <[email protected]>

commit c3264d8
Author: Luka Govedič <[email protected]>
Date:   Fri Oct 10 18:36:15 2025 -0400

    Fix partial match rmsnorm+quant, fix allreduce+rmsnorm match

    Signed-off-by: Luka Govedič <[email protected]>

commit a1c7fdb
Author: Luka Govedič <[email protected]>
Date:   Fri Oct 10 16:13:42 2025 -0400

    add more comprehensive testing for allreduce-rmsnorm, fix fp4 (-rmsnorm still failing)

    Signed-off-by: Luka Govedič <[email protected]>

commit 46ee626
Author: Luka Govedič <[email protected]>
Date:   Fri Oct 10 13:51:13 2025 -0400

    add more comprehensive testing for quantfp8 (-rmsnorm+-quant still failing)

    Signed-off-by: Luka Govedič <[email protected]>

commit 32989d8
Author: Luka Govedič <[email protected]>
Date:   Fri Oct 10 13:49:09 2025 -0400

    add pattern for final allreduce in model

    Signed-off-by: Luka Govedič <[email protected]>

commit 5619bc3
Author: Luka Govedič <[email protected]>
Date:   Thu Oct 9 21:42:34 2025 -0400

    clean up e2e tests

    Signed-off-by: Luka Govedič <[email protected]>

commit 1756f67
Author: Luka Govedič <[email protected]>
Date:   Sat Oct 4 00:06:13 2025 -0400

    add back fp4

    Signed-off-by: Luka Govedič <[email protected]>

commit c653d24
Author: Luka Govedič <[email protected]>
Date:   Fri Oct 3 23:39:23 2025 -0400

    Fix spelling, precommit

    Signed-off-by: Luka Govedič <[email protected]>

commit 31d0127
Author: Luka Govedič <[email protected]>
Date:   Fri Oct 3 13:01:13 2025 -0400

    Add e2e fusions to fullgraph test (should work with Triton backend), disable without flashinfer

    Signed-off-by: Luka Govedič <[email protected]>

commit 4dbfcf7
Author: Luka Govedič <[email protected]>
Date:   Fri Oct 3 11:49:24 2025 -0400

    Move e2e tests to new file, add to test pipeline

    Signed-off-by: Luka Govedič <[email protected]>

commit d3f95fe
Author: Luka Govedič <[email protected]>
Date:   Fri Oct 3 11:38:39 2025 -0400

    fullgraph allreduce test update requirements

    Signed-off-by: Luka Govedič <[email protected]>

commit c8675ff
Author: Luka Govedič <[email protected]>
Date:   Thu Oct 2 22:18:24 2025 -0400

    log depyf folder, fix context for TestBackend, fix pattern dump

    Signed-off-by: Luka Govedič <[email protected]>

commit d09a278
Author: Luka Govedič <[email protected]>
Date:   Thu Oct 2 22:16:24 2025 -0400

    allreduce fusion working with/without custom ops (with fp4)

    Signed-off-by: Luka Govedič <[email protected]>

commit b7f52bf
Author: Luka Govedič <[email protected]>
Date:   Thu Oct 2 22:12:04 2025 -0400

    allreduce fusion working with/without custom ops (except fp4)

    Signed-off-by: Luka Govedič <[email protected]>

commit 54189a9
Author: Luka Govedič <[email protected]>
Date:   Thu Oct 2 21:24:51 2025 -0400

    allreduce fusion working (custom ops on)

    Signed-off-by: Luka Govedič <[email protected]>

commit db479ae
Author: Luka Govedič <[email protected]>
Date:   Thu Oct 2 16:51:30 2025 -0700

    TEMP allreduce fusion

    Signed-off-by: Luka Govedič <[email protected]>

commit 5fef180
Author: Luka Govedič <[email protected]>
Date:   Thu Oct 2 16:35:31 2025 -0700

    clean up fullgraph tests

    Signed-off-by: Luka Govedič <[email protected]>

commit 7eb1364
Author: Luka Govedič <[email protected]>
Date:   Thu Oct 2 19:26:48 2025 -0400

    Update csrc/layernorm_kernels.cu

    Signed-off-by: Luka Govedič <[email protected]>

commit 66a35a9
Author: Luka Govedič <[email protected]>
Date:   Thu Oct 2 19:26:42 2025 -0400

    Update tests/compile/backend.py

    Signed-off-by: Luka Govedič <[email protected]>

commit 21a9f9f
Author: Luka Govedič <[email protected]>
Date:   Wed Oct 1 19:02:24 2025 -0700

    Fixed tests, passing with 2.8, 2.9 tbd

    Signed-off-by: Luka Govedič <[email protected]>

commit a2aa978
Author: Luka Govedič <[email protected]>
Date:   Wed Oct 1 11:21:02 2025 -0700

    Test for caplog utils

    Signed-off-by: Luka Govedič <[email protected]>

commit eb899a4
Author: Luka Govedič <[email protected]>
Date:   Tue Sep 30 12:55:33 2025 -0700

    Temp MP workaround P3

    Signed-off-by: Luka Govedič <[email protected]>

commit ae7f56f
Author: Luka Govedič <[email protected]>
Date:   Tue Sep 30 12:50:28 2025 -0700

    Temp MP workaround P2

    Signed-off-by: Luka Govedič <[email protected]>

commit 47b4688
Author: Luka Govedič <[email protected]>
Date:   Sat Sep 27 07:38:52 2025 -0700

    TEMP working on caplog

    Signed-off-by: Luka Govedič <[email protected]>

commit d0b1b56
Author: Luka Govedič <[email protected]>
Date:   Fri Sep 26 15:39:08 2025 -0700

    improve tests by adding more cases

    Signed-off-by: Luka Govedič <[email protected]>

commit 490ac86
Author: Luka Govedič <[email protected]>
Date:   Fri Sep 26 13:24:01 2025 -0700

    Add TP=2 test (untested)

    Signed-off-by: Luka Govedič <[email protected]>

commit c6d6c3b
Author: Luka Govedič <[email protected]>
Date:   Fri Sep 26 13:20:52 2025 -0700

    Refactor E2E attn fusion test

    Signed-off-by: Luka Govedič <[email protected]>

commit 141a37e
Author: Luka Govedič <[email protected]>
Date:   Fri Sep 26 07:41:41 2025 -0700

    Fix rmsnorm

    Signed-off-by: Luka Govedič <[email protected]>

commit cdd1529
Author: Luka Govedič <[email protected]>
Date:   Thu Sep 25 17:18:43 2025 -0700

    Flat product for better test names/visibility

    Signed-off-by: Luka Govedič <[email protected]>

commit d843a67
Author: Luka Govedič <[email protected]>
Date:   Thu Sep 25 17:02:14 2025 -0700

    Add triton attn test to attn+quant fusion

    Signed-off-by: Luka Govedič <[email protected]>

commit 1277999
Author: Luka Govedič <[email protected]>
Date:   Thu Sep 25 16:12:23 2025 -0700

    Remove V0 attn fusion test

    Signed-off-by: Luka Govedič <[email protected]>

commit 77835fd
Author: Luka Govedič <[email protected]>
Date:   Thu Sep 25 16:12:11 2025 -0700

    Attention fusion works with custom ops

    Signed-off-by: Luka Govedič <[email protected]>

commit 1ae80c6
Author: Luka Govedič <[email protected]>
Date:   Thu Sep 25 16:02:21 2025 -0700

    Move global vllm_config to pass manager

    Signed-off-by: Luka Govedič <[email protected]>

commit b172747
Author: Luka Govedič <[email protected]>
Date:   Thu Sep 25 15:02:33 2025 -0700

    Functionalize attn+quant patterns

    Signed-off-by: Luka Govedič <[email protected]>

commit d96913a
Author: Luka Govedič <[email protected]>
Date:   Thu Sep 25 16:06:25 2025 -0400

    Cleanup test_fusion.py, added extra layer of rms/quant

    Signed-off-by: Luka Govedič <[email protected]>

commit e6b394e
Author: Luka Govedič <[email protected]>
Date:   Fri Sep 19 19:00:27 2025 -0700

    Add TODO

    Signed-off-by: Luka Govedič <[email protected]>

commit 05a65f3
Author: Luka Govedič <[email protected]>
Date:   Thu Sep 18 13:21:46 2025 -0700

    ALL WORKS

    Signed-off-by: Luka Govedič <[email protected]>

commit 14fdc8b
Author: Luka Govedič <[email protected]>
Date:   Thu Sep 18 12:32:27 2025 -0700

    quant with fix for pure torch, broke others

    Signed-off-by: Luka Govedič <[email protected]>

commit e151e6d
Author: Luka Govedič <[email protected]>
Date:   Tue Sep 16 11:08:39 2025 -0700

    quant works except (torch,torch)

    Signed-off-by: Luka Govedič <[email protected]>

commit 8e4a56f
Author: Luka Govedič <[email protected]>
Date:   Tue Sep 16 10:47:13 2025 -0700

    rms works fully now, had to remove more conversions (and add them in replacements). TODO pass to remove unnecessary conversions?

    Signed-off-by: Luka Govedič <[email protected]>

commit cdad3c0
Author: Luka Govedič <[email protected]>
Date:   Fri Sep 12 12:11:48 2025 -0700

    TEMP: fixed rmsnorm issue (TODO assert dtypes in fused norm_quant kernels)

    Signed-off-by: Luka Govedič <[email protected]>

commit f3b4cf1
Author: Luka Govedič <[email protected]>
Date:   Tue Sep 9 09:48:53 2025 -0700

    TEMP Mostly working

    Signed-off-by: Luka Govedič <[email protected]>

commit 21d7d67
Author: Luka Govedič <[email protected]>
Date:   Sat Sep 6 14:35:13 2025 -0700

    Functionalized patterns in prep for utility

    Signed-off-by: Luka Govedič <[email protected]>

Signed-off-by: ProExpertProg <[email protected]>
Author: Luka Govedič <[email protected]>
Date:   Fri Sep 26 07:41:41 2025 -0700

    Fix rmsnorm

    Signed-off-by: Luka Govedič <[email protected]>

commit cdd1529
Author: Luka Govedič <[email protected]>
Date:   Thu Sep 25 17:18:43 2025 -0700

    Flat product for better test names/visibility

    Signed-off-by: Luka Govedič <[email protected]>

commit d843a67
Author: Luka Govedič <[email protected]>
Date:   Thu Sep 25 17:02:14 2025 -0700

    Add triton attn test to attn+quant fusion

    Signed-off-by: Luka Govedič <[email protected]>

commit 1277999
Author: Luka Govedič <[email protected]>
Date:   Thu Sep 25 16:12:23 2025 -0700

    Remove V0 attn fusion test

    Signed-off-by: Luka Govedič <[email protected]>

commit 77835fd
Author: Luka Govedič <[email protected]>
Date:   Thu Sep 25 16:12:11 2025 -0700

    Attention fusion works with custom ops

    Signed-off-by: Luka Govedič <[email protected]>

commit 1ae80c6
Author: Luka Govedič <[email protected]>
Date:   Thu Sep 25 16:02:21 2025 -0700

    Move global vllm_config to pass manager

    Signed-off-by: Luka Govedič <[email protected]>

commit b172747
Author: Luka Govedič <[email protected]>
Date:   Thu Sep 25 15:02:33 2025 -0700

    Functionalize attn+quant patterns

    Signed-off-by: Luka Govedič <[email protected]>

commit d96913a
Author: Luka Govedič <[email protected]>
Date:   Thu Sep 25 16:06:25 2025 -0400

    Cleanup test_fusion.py, added extra layer of rms/quant

    Signed-off-by: Luka Govedič <[email protected]>

commit e6b394e
Author: Luka Govedič <[email protected]>
Date:   Fri Sep 19 19:00:27 2025 -0700

    Add TODO

    Signed-off-by: Luka Govedič <[email protected]>

commit 05a65f3
Author: Luka Govedič <[email protected]>
Date:   Thu Sep 18 13:21:46 2025 -0700

    ALL WORKS

    Signed-off-by: Luka Govedič <[email protected]>

commit 14fdc8b
Author: Luka Govedič <[email protected]>
Date:   Thu Sep 18 12:32:27 2025 -0700

    quant with fix for pure torch, broke others

    Signed-off-by: Luka Govedič <[email protected]>

commit e151e6d
Author: Luka Govedič <[email protected]>
Date:   Tue Sep 16 11:08:39 2025 -0700

    quant works except (torch,torch)

    Signed-off-by: Luka Govedič <[email protected]>

commit 8e4a56f
Author: Luka Govedič <[email protected]>
Date:   Tue Sep 16 10:47:13 2025 -0700

    rms works fully now, had to remove more conversions (and add them in replacements). TODO pass to remove unnecessary conversions?

    Signed-off-by: Luka Govedič <[email protected]>

commit cdad3c0
Author: Luka Govedič <[email protected]>
Date:   Fri Sep 12 12:11:48 2025 -0700

    TEMP: fixed rmsnorm issue (TODO assert dtypes in fused norm_quant kernels)

    Signed-off-by: Luka Govedič <[email protected]>

commit f3b4cf1
Author: Luka Govedič <[email protected]>
Date:   Tue Sep 9 09:48:53 2025 -0700

    TEMP Mostly working

    Signed-off-by: Luka Govedič <[email protected]>

commit 21d7d67
Author: Luka Govedič <[email protected]>
Date:   Sat Sep 6 14:35:13 2025 -0700

    Functionalized patterns in prep for utility

    Signed-off-by: Luka Govedič <[email protected]>

Signed-off-by: ProExpertProg <[email protected]>
Signed-off-by: Luka Govedič <[email protected]>
ProExpertProg added a commit to neuralmagic/vllm that referenced this pull request Oct 16, 2025
commit e34d36d
Author: Luka Govedič <[email protected]>
Date:   Thu Oct 16 09:33:16 2025 -0400

    More tweaking of precision

    Signed-off-by: Luka Govedič <[email protected]>

commit 6319e39
Author: Luka Govedič <[email protected]>
Date:   Thu Oct 16 00:58:59 2025 -0400

    Update test durations

    Signed-off-by: Luka Govedič <[email protected]>

commit d4fe977
Author: Luka Govedič <[email protected]>
Date:   Thu Oct 16 00:54:25 2025 -0400

    Fix NamedTuple

    Signed-off-by: Luka Govedič <[email protected]>

commit 65ef5fd
Merge: d2e0489 785d8b6
Author: Luka Govedič <[email protected]>
Date:   Thu Oct 16 00:38:26 2025 -0400

    Merge branch 'main' into luka/custom-op-matching-2

commit d2e0489
Author: Luka Govedič <[email protected]>
Date:   Thu Oct 16 00:31:15 2025 -0400

    Relax tolerance for L40 fusion test

    Signed-off-by: Luka Govedič <[email protected]>

commit c03b29b
Author: Luka Govedič <[email protected]>
Date:   Wed Oct 15 20:31:11 2025 -0400

    Remove inductor graph partition from unit test (included in e2e tests)

    Signed-off-by: Luka Govedič <[email protected]>

commit ae581e1
Author: Luka Govedič <[email protected]>
Date:   Wed Oct 15 20:30:02 2025 -0400

    Fix attention fusion test numerics

    Signed-off-by: Luka Govedič <[email protected]>

commit a226864
Merge: e99a759 0a9ef0c
Author: Luka Govedič <[email protected]>
Date:   Wed Oct 15 20:03:52 2025 -0400

    Merge branch 'main' into luka/custom-op-matching-2

    Signed-off-by: Luka Govedič <[email protected]>

commit e99a759
Author: Luka Govedič <[email protected]>
Date:   Wed Oct 15 19:20:47 2025 -0400

    Break up B200 tests, move allreduce to H200

    Signed-off-by: Luka Govedič <[email protected]>

commit 876ef22
Author: Luka Govedič <[email protected]>
Date:   Wed Oct 15 18:43:48 2025 -0400

    Fix tests, PR feedback

    Signed-off-by: Luka Govedič <[email protected]>

commit 6253d5b
Author: Luka Govedič <[email protected]>
Date:   Wed Oct 15 13:18:03 2025 -0400

    Add e2e to L40 distributed, move tests to start of B200 distributed

    Signed-off-by: Luka Govedič <[email protected]>

commit de7405b
Author: Luka Govedič <[email protected]>
Date:   Wed Oct 15 13:08:57 2025 -0400

    PR comments: add _custom_op suffix

    Signed-off-by: Luka Govedič <[email protected]>

commit 24f1298
Author: Luka Govedič <[email protected]>
Date:   Wed Oct 15 13:08:13 2025 -0400

    PR comments: cleanup fusion passes, & matching

    Signed-off-by: Luka Govedič <[email protected]>

commit 7e6f5b3
Author: Luka Govedič <[email protected]>
Date:   Wed Oct 15 13:06:19 2025 -0400

    add flat_product example

    Signed-off-by: Luka Govedič <[email protected]>

commit 532cbcf
Author: Luka Govedič <[email protected]>
Date:   Wed Oct 15 12:56:07 2025 -0400

    Add comment to test_logger

    Signed-off-by: Luka Govedič <[email protected]>

commit 3943257
Author: Luka Govedič <[email protected]>
Date:   Wed Oct 15 12:11:29 2025 -0400

    Restore original torch.Parameter behavior in RMSNorm

    Signed-off-by: Luka Govedič <[email protected]>

commit a3ebf0a
Author: Luka Govedič <[email protected]>
Date:   Wed Oct 15 12:09:48 2025 -0400

    fix fp8 quant tests

    Signed-off-by: Luka Govedič <[email protected]>

commit db2b1c7
Author: Luka Govedič <[email protected]>
Date:   Wed Oct 15 11:59:35 2025 -0400

    Smaller model for e2e fusion test

    Signed-off-by: Luka Govedič <[email protected]>

commit bcd95b5
Author: Luka Govedič <[email protected]>
Date:   Wed Oct 15 11:54:47 2025 -0400

    Fix func test

    Signed-off-by: Luka Govedič <[email protected]>

commit bb0254a
Merge: 465ce58 136a17f
Author: Luka Govedič <[email protected]>
Date:   Wed Oct 15 10:01:15 2025 -0400

    Merge branch 'main' into luka/custom-op-matching-2

    # Conflicts:
    #	tests/utils_/test_utils.py

    Signed-off-by: Luka Govedič <[email protected]>

commit 465ce58
Author: Luka Govedič <[email protected]>
Date:   Wed Oct 15 09:59:54 2025 -0400

    Update tests/compile/test_fusion.py

    Signed-off-by: Luka Govedič <[email protected]>

commit 2a6299c
Author: Luka Govedič <[email protected]>
Date:   Wed Oct 15 04:12:01 2025 -0400

    Fix e2e test patterns

    Signed-off-by: Luka Govedič <[email protected]>

commit 8ffb474
Author: Luka Govedič <[email protected]>
Date:   Wed Oct 15 03:25:26 2025 -0400

    Remove/fix TODOs

    Signed-off-by: Luka Govedič <[email protected]>

commit db16ee1
Merge: 12a7c6d f0862ea
Author: Luka Govedič <[email protected]>
Date:   Wed Oct 15 03:10:21 2025 -0400

    Merge branch 'main' into luka/custom-op-matching-2

commit 12a7c6d
Author: Luka Govedič <[email protected]>
Date:   Wed Oct 15 03:00:52 2025 -0400

    Tests & docs for flat_product

    Signed-off-by: Luka Govedič <[email protected]>

commit 8a363d3
Author: Luka Govedič <[email protected]>
Date:   Wed Oct 15 02:43:03 2025 -0400

    Slight improvement for E2E fusion

    Signed-off-by: Luka Govedič <[email protected]>

commit f6429e4
Author: Luka Govedič <[email protected]>
Date:   Wed Oct 15 02:40:43 2025 -0400

    Cleanup test_fusion_attn.py

    Signed-off-by: Luka Govedič <[email protected]>

commit b5f89e5
Author: Luka Govedič <[email protected]>
Date:   Wed Oct 15 02:29:06 2025 -0400

    Cleanup test_full_graph.py

    Signed-off-by: Luka Govedič <[email protected]>

commit 97b3ff2
Merge: af1ffa7 8c851f6
Author: Luka Govedič <[email protected]>
Date:   Wed Oct 15 02:08:00 2025 -0400

    Merge remote-tracking branch 'upstream/main' into luka/custom-op-matching-2

    Signed-off-by: Luka Govedič <[email protected]>

commit af1ffa7
Author: Luka Govedič <[email protected]>
Date:   Wed Oct 15 01:54:18 2025 -0400

    PR review

    Signed-off-by: Luka Govedič <[email protected]>

commit 3547b87
Author: Luka Govedič <[email protected]>
Date:   Sun Oct 12 11:11:14 2025 -0400

    fix sequence parallelism test

    Signed-off-by: Luka Govedič <[email protected]>

commit 26892df
Author: Luka Govedič <[email protected]>
Date:   Sun Oct 12 11:03:35 2025 -0400

    fix pass manager test

    Signed-off-by: Luka Govedič <[email protected]>

commit 0d6e550
Author: Luka Govedič <[email protected]>
Date:   Sun Oct 12 10:57:07 2025 -0400

    fix func test

    Signed-off-by: Luka Govedič <[email protected]>

commit 1b1a63e
Author: Luka Govedič <[email protected]>
Date:   Sat Oct 11 14:33:46 2025 -0400

    Fix e2e allreduce fusion test

    Signed-off-by: Luka Govedič <[email protected]>

commit 52f78ce
Author: Luka Govedič <[email protected]>
Date:   Sat Oct 11 08:38:42 2025 -0400

    Add allreduce test to 2-gpu test

    Signed-off-by: Luka Govedič <[email protected]>

commit 095277c
Author: Luka Govedič <[email protected]>
Date:   Fri Oct 10 19:03:18 2025 -0400

    Simplify matcher utils by using RMSNorm.forward_static

    Signed-off-by: Luka Govedič <[email protected]>

commit c3264d8
Author: Luka Govedič <[email protected]>
Date:   Fri Oct 10 18:36:15 2025 -0400

    Fix partial match rmsnorm+quant, fix allreduce+rmsnorm match

    Signed-off-by: Luka Govedič <[email protected]>

commit a1c7fdb
Author: Luka Govedič <[email protected]>
Date:   Fri Oct 10 16:13:42 2025 -0400

    add more comprehensive testing for allreduce-rmsnorm, fix fp4 (-rmsnorm still failing)

    Signed-off-by: Luka Govedič <[email protected]>

commit 46ee626
Author: Luka Govedič <[email protected]>
Date:   Fri Oct 10 13:51:13 2025 -0400

    add more comprehensive testing for quantfp8 (-rmsnorm+-quant still failing)

    Signed-off-by: Luka Govedič <[email protected]>

commit 32989d8
Author: Luka Govedič <[email protected]>
Date:   Fri Oct 10 13:49:09 2025 -0400

    add pattern for final allreduce in model

    Signed-off-by: Luka Govedič <[email protected]>

commit 5619bc3
Author: Luka Govedič <[email protected]>
Date:   Thu Oct 9 21:42:34 2025 -0400

    clean up e2e tests

    Signed-off-by: Luka Govedič <[email protected]>

Signed-off-by: ProExpertProg <[email protected]>
Comment on lines -48 to +49
- STATIC_FP4_QUANT_OP = torch.ops._C.scaled_fp4_quant.default
+ if hasattr(torch.ops._C, "scaled_fp4_quant"):
+     STATIC_FP4_QUANT_OP = torch.ops._C.scaled_fp4_quant.default
Member

Why did this work before and not now? Should we change how this is registered?

Collaborator Author

Oh yeah I think the registration might have been fixed.

Collaborator Author

I think I'll punt this to a follow-up PR; in general, these ops should be cleaned up.
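
The guard in the diff above can be sketched without PyTorch. The snippet below is only an illustration of the `hasattr` pattern: the two `SimpleNamespace` objects are stand-ins for `torch.ops._C` in builds with and without the FP4 kernel, and `lookup_optional_op` is a hypothetical helper, not a vLLM function.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for torch.ops._C; real op namespaces behave differently
# in many ways, but attribute lookup is all this sketch relies on.
ops_without_fp4 = SimpleNamespace(
    rms_norm=SimpleNamespace(default="rms_norm.default"),
)
ops_with_fp4 = SimpleNamespace(
    rms_norm=SimpleNamespace(default="rms_norm.default"),
    scaled_fp4_quant=SimpleNamespace(default="scaled_fp4_quant.default"),
)


def lookup_optional_op(namespace, name):
    """Return namespace.<name>.default if the op is registered, else None."""
    if hasattr(namespace, name):
        return getattr(namespace, name).default
    return None


fp4_missing = lookup_optional_op(ops_without_fp4, "scaled_fp4_quant")
fp4_present = lookup_optional_op(ops_with_fp4, "scaled_fp4_quant")
```

With a guard like this, importing the module no longer fails when the op is absent; code that uses the op still has to handle the `None` case.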

      )
  else:
-     scale = torch.empty(1, device=input.device, dtype=torch.float32)
+     scale = torch.empty((1, 1), device=input.device, dtype=torch.float32)
Member

Why is this needed? AFAIK this tensor is just a scalar to the kernel

Collaborator Author

Needed for custom matching to work; (1,1) is still just one element.

Collaborator

Can you elaborate? It's suspicious that it needs this change.

Collaborator Author

The native implementation returns (1,1), so this just makes them consistent. I don't remember exactly what I was running into.
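
The two shapes in this thread hold the same number of elements but are not equal as metadata. The sketch below shows that distinction with plain tuples. `same_metadata` is a hypothetical strict comparison; whether the matcher actually tripped on shape equality is an assumption, since the thread does not record the original failure.

```python
import math


def numel(shape):
    """Number of elements in a tensor of the given shape."""
    return math.prod(shape)


def same_metadata(shape_a, shape_b):
    """Strict shape comparison, as a shape-sensitive matcher might do (assumption)."""
    return tuple(shape_a) == tuple(shape_b)


custom_op_scale = (1,)   # shape allocated before the change
native_scale = (1, 1)    # shape the native implementation returns, per the thread

same_count = numel(custom_op_scale) == numel(native_scale)
same_shape = same_metadata(custom_op_scale, native_scale)
```

Here `same_count` is true while `same_shape` is false, which is why making both paths allocate `(1, 1)` removes one possible source of mismatch without changing what the kernel reads.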

Collaborator

@zou3519 zou3519 left a comment

My current understanding is that when we pattern match against the torch native implementation of a custom operator, we register a pattern in Inductor using that native implementation. I'm worried that this approach might be fragile. When the torch native implementation is passed through torch.compile, various graph passes can transform it, so by the time it reaches the post-grad phase (where vLLM’s pattern matching currently happens), the structure may look different.

For example, with rms_norm, it seems we’d need to modify the implementation in a non-trivial way to make it pattern match. I don't know if this is an issue in practice, but it suggests that this scheme could unintentionally constrain how custom operators need to be authored — in ways we might not fully understand yet.

It might be more robust to preserve the custom operator as-is (i.e., avoid decomposing it into torch native ops) and then perform pattern matching directly on the custom operator itself. That would make the process less sensitive to internal graph transformations.

I did see that you wanted this in for the release. Was there a specific reason? If we are turning on the allreduce+rmsnorm fusion by default, for example, then could the fusion instead imply "+rmsnorm"?
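
The fragility concern can be illustrated with a toy model. In the sketch below a graph is a flat list of op names, the pattern is the op sequence obtained by tracing a native implementation, and `fold_casts` is a hypothetical earlier pass. The op names and the pass are made up for illustration; real FX graphs and Inductor passes are far richer.

```python
def find_pattern(graph, pattern):
    """Return the index where `pattern` occurs as a contiguous run in `graph`, or -1."""
    n = len(pattern)
    for i in range(len(graph) - n + 1):
        if graph[i:i + n] == pattern:
            return i
    return -1


def fold_casts(graph):
    """Hypothetical earlier pass: drop a cast that directly follows another cast."""
    out = []
    for op in graph:
        if op == "to_dtype" and out and out[-1] == "to_dtype":
            continue
        out.append(op)
    return out


# Op sequence from tracing an RMSNorm-like native implementation (made up).
pattern = ["to_dtype", "pow", "mean", "rsqrt", "mul", "to_dtype", "to_dtype", "mul"]

# The model graph initially contains exactly that sequence...
model_graph = ["linear"] + pattern + ["quant"]
before = find_pattern(model_graph, pattern)

# ...but an earlier pass rewrites it before the matcher runs.
after = find_pattern(fold_casts(model_graph), pattern)
```

The match succeeds on the original graph and fails after the rewrite, even though the computation is unchanged. Keeping the custom op opaque, as suggested above, avoids this because there is no internal structure for earlier passes to alter.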

@mergify

mergify bot commented Oct 17, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @ProExpertProg.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@ProExpertProg
Collaborator Author

This is needed because it lets us do fusion without having to enable custom ops (-O.custom_ops=["+quant_fp8"]). Enabling custom ops costs performance, as demonstrated in the PR description. There are 4 quant ops per layer, one per matmul, and +quant_fp8 slows down the 3 remaining ones that did not get fused onto attention, for example. In SequenceParallelismPass, we're not even fusing, just moving the ops around. In AllReduceRMSNormFusion, we'll only be fusing for some shapes, so we're again left with an inefficient implementation for the others.
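
The per-layer arithmetic in this comment can be written out as a short sketch. The op counts come from the comment; the two unit costs are placeholders chosen only to encode "the custom op is slower than its torch counterpart" and are not measurements.

```python
QUANT_OPS_PER_LAYER = 4    # one per matmul, per the comment above
FUSED_ONTO_ATTENTION = 1   # the quant op absorbed by the attention fusion


def unfused_quant_cost(cost_per_op):
    """Cost of the quant ops left unfused in one layer."""
    return (QUANT_OPS_PER_LAYER - FUSED_ONTO_ATTENTION) * cost_per_op


# Placeholder unit costs, NOT benchmark numbers.
TORCH_COST = 1.0
CUSTOM_COST = 1.5

with_custom_ops = unfused_quant_cost(CUSTOM_COST)  # fusion requires +quant_fp8
with_matching = unfused_quant_cost(TORCH_COST)     # fusion matches the torch form
extra_cost = with_custom_ops - with_matching
```

Whatever the real costs are, the structure is the same: enabling the custom op for the sake of one fusion makes the three unfused quant ops pay the custom-op price as well.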

I agree this is a somewhat fragile approach. I would be happy to work on a "lowering" approach where we preserve the high-level structure of ops until later. The downside is that it would require more work (I think), and we might lose access to optimizations that currently happen before our passes. But I think it wouldn't hurt Inductor in general to have a more explicit notion of converting between higher-level and lower-level representations (or we could just move where our custom passes happen). We can tie this work into the "autotuning custom op implementations" effort, as done in pytorch/pytorch#164212.

@ProExpertProg
Collaborator Author

As discussed offline, we are going to proceed by merging this PR. After PTC, we will move our custom op matching passes to post_grad_custom_pre_pass to reduce the risk of Inductor transformations breaking custom op matching. Some of our matching depends on removing noop views and slices, so we'll have to figure out a solution for that.

@BoyuanFeng
Contributor

View/slice noop eliminations were upstreamed to PyTorch (pytorch/pytorch#151095, pytorch/pytorch#151175), so I'm wondering if this is sufficient.

@ProExpertProg
Collaborator Author

@BoyuanFeng wouldn't that run after post_grad_custom_pre_pass though?
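
The ordering question can be made concrete with a toy pipeline. The sketch assumes the order implied by the hook names and by the question above: a custom pre pass, then the built-in passes (including noop elimination), then a custom post pass. `run_post_grad`, `remove_noop_views`, and `make_probe` are hypothetical names for illustration, and the actual ordering should be checked against the Inductor source.

```python
def remove_noop_views(graph):
    """Stand-in for the built-in noop elimination: drop ops tagged as noop views."""
    graph[:] = [op for op in graph if op != "noop_view"]


def run_post_grad(graph, custom_pre_pass=None, custom_post_pass=None):
    """Toy post-grad phase: custom pre pass, built-in passes, custom post pass."""
    if custom_pre_pass is not None:
        custom_pre_pass(graph)
    remove_noop_views(graph)
    if custom_post_pass is not None:
        custom_post_pass(graph)
    return graph


seen = {}


def make_probe(name):
    """Build a pass that records whether it still sees a noop view in the graph."""
    def probe(graph):
        seen[name] = "noop_view" in graph
    return probe


final_graph = run_post_grad(
    ["rms_norm", "noop_view", "quant"],
    custom_pre_pass=make_probe("pre"),
    custom_post_pass=make_probe("post"),
)
```

Under that assumed ordering, a matcher placed in the pre pass still sees the noop view between `rms_norm` and `quant`, so it would need its own noop elimination or patterns that tolerate the extra node.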

Signed-off-by: Luka Govedič <[email protected]>
@mergify mergify bot removed the needs-rebase label Oct 17, 2025
@mgoin mgoin merged commit bd7157a into vllm-project:main Oct 17, 2025
87 checks passed
@ProExpertProg ProExpertProg deleted the luka/custom-op-matching-2 branch October 17, 2025 14:41
lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
…g utils, fix DCE bug (vllm-project#23091), fix test (vllm-project#24376), and prep for custom op matching (vllm-project#24604) (vllm-project#24542)

Signed-off-by: Luka Govedič <[email protected]>
Signed-off-by: luka <[email protected]>
Signed-off-by: Luka Govedič <[email protected]>
lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
Labels

ci/build, ready (ONLY add when PR is ready to merge/full CI is needed)

8 participants