[torch.compile] Enable attention and allreduce fusion without custom ops enabled #24604
b374514 to 4a44829
42f2231 to a8c9181
1e9326c to e3d0c83
e3d0c83 to 9151d01
…g utils, fix DCE bug (#23091), fix test (#24376), and prep for custom op matching (#24604) (#24542) Signed-off-by: Luka Govedič <[email protected]> Signed-off-by: luka <[email protected]> Signed-off-by: Luka Govedič <[email protected]>
9151d01 to da3cb54
Relax tolerance for L40 fusion test Signed-off-by: Luka Govedič <[email protected]>
Fix NamedTuple Signed-off-by: Luka Govedič <[email protected]>
Update test durations Signed-off-by: Luka Govedič <[email protected]>
commit c03b29b Author: Luka Govedič <[email protected]> Date: Wed Oct 15 20:31:11 2025 -0400 Remove inductor graph partition from unit test (included in e2e tests) Signed-off-by: Luka Govedič <[email protected]>
commit ae581e1 Author: Luka Govedič <[email protected]> Date: Wed Oct 15 20:30:02 2025 -0400 Fix attention fusion test numerics Signed-off-by: Luka Govedič <[email protected]>
commit a226864 Merge: e99a759 0a9ef0c Author: Luka Govedič <[email protected]> Date: Wed Oct 15 20:03:52 2025 -0400 Merge branch 'main' into luka/custom-op-matching-2 Signed-off-by: Luka Govedič <[email protected]>
commit e99a759 Author: Luka Govedič <[email protected]> Date: Wed Oct 15 19:20:47 2025 -0400 Break up B200 tests, move allreduce to H200 Signed-off-by: Luka Govedič <[email protected]>
commit 876ef22 Author: Luka Govedič <[email protected]> Date: Wed Oct 15 18:43:48 2025 -0400 Fix tests, PR feedback Signed-off-by: Luka Govedič <[email protected]>
commit 6253d5b Author: Luka Govedič <[email protected]> Date: Wed Oct 15 13:18:03 2025 -0400 Add e2e to L40 distributed, move tests to start of B200 distributed Signed-off-by: Luka Govedič <[email protected]>
commit de7405b Author: Luka Govedič <[email protected]> Date: Wed Oct 15 13:08:57 2025 -0400 PR comments: add _custom_op suffix Signed-off-by: Luka Govedič <[email protected]>
commit 24f1298 Author: Luka Govedič <[email protected]> Date: Wed Oct 15 13:08:13 2025 -0400 PR comments: cleanup fusion passes, & matching Signed-off-by: Luka Govedič <[email protected]>
commit 7e6f5b3 Author: Luka Govedič <[email protected]> Date: Wed Oct 15 13:06:19 2025 -0400 add flat_product example Signed-off-by: Luka Govedič <[email protected]>
commit 532cbcf Author: Luka Govedič <[email protected]> Date: Wed Oct 15 12:56:07 2025 -0400 Add comment to test_logger Signed-off-by: Luka Govedič <[email protected]>
commit 3943257 Author: Luka Govedič <[email protected]> Date: Wed Oct 15 12:11:29 2025 -0400 Restore original torch.Parameter behavior in RMSNorm Signed-off-by: Luka Govedič <[email protected]>
commit a3ebf0a Author: Luka Govedič <[email protected]> Date: Wed Oct 15 12:09:48 2025 -0400 fix fp8 quant tests Signed-off-by: Luka Govedič <[email protected]>
commit db2b1c7 Author: Luka Govedič <[email protected]> Date: Wed Oct 15 11:59:35 2025 -0400 Smaller model for e2e fusion test Signed-off-by: Luka Govedič <[email protected]>
commit bcd95b5 Author: Luka Govedič <[email protected]> Date: Wed Oct 15 11:54:47 2025 -0400 Fix func test Signed-off-by: Luka Govedič <[email protected]>
commit bb0254a Merge: 465ce58 136a17f Author: Luka Govedič <[email protected]> Date: Wed Oct 15 10:01:15 2025 -0400 Merge branch 'main' into luka/custom-op-matching-2 # Conflicts: # tests/utils_/test_utils.py Signed-off-by: Luka Govedič <[email protected]>
commit 465ce58 Author: Luka Govedič <[email protected]> Date: Wed Oct 15 09:59:54 2025 -0400 Update tests/compile/test_fusion.py Signed-off-by: Luka Govedič <[email protected]>
commit 2a6299c Author: Luka Govedič <[email protected]> Date: Wed Oct 15 04:12:01 2025 -0400 Fix e2e test patterns Signed-off-by: Luka Govedič <[email protected]>
commit 8ffb474 Author: Luka Govedič <[email protected]> Date: Wed Oct 15 03:25:26 2025 -0400 Remove/fix TODOs Signed-off-by: Luka Govedič <[email protected]>
commit db16ee1 Merge: 12a7c6d f0862ea Author: Luka Govedič <[email protected]> Date: Wed Oct 15 03:10:21 2025 -0400 Merge branch 'main' into luka/custom-op-matching-2
commit 12a7c6d Author: Luka Govedič <[email protected]> Date: Wed Oct 15 03:00:52 2025 -0400 Tests & docs for flat_product Signed-off-by: Luka Govedič <[email protected]>
commit 8a363d3 Author: Luka Govedič <[email protected]> Date: Wed Oct 15 02:43:03 2025 -0400 Slight improvement for E2E fusion Signed-off-by: Luka Govedič <[email protected]>
commit f6429e4 Author: Luka Govedič <[email protected]> Date: Wed Oct 15 02:40:43 2025 -0400 Cleanup test_fusion_attn.py Signed-off-by: Luka Govedič <[email protected]>
commit b5f89e5 Author: Luka Govedič <[email protected]> Date: Wed Oct 15 02:29:06 2025 -0400 Cleanup test_full_graph.py Signed-off-by: Luka Govedič <[email protected]>
commit 97b3ff2 Merge: af1ffa7 8c851f6 Author: Luka Govedič <[email protected]> Date: Wed Oct 15 02:08:00 2025 -0400 Merge remote-tracking branch 'upstream/main' into luka/custom-op-matching-2 Signed-off-by: Luka Govedič <[email protected]>
commit af1ffa7 Author: Luka Govedič <[email protected]> Date: Wed Oct 15 01:54:18 2025 -0400 PR review Signed-off-by: Luka Govedič <[email protected]>
commit 3547b87 Author: Luka Govedič <[email protected]> Date: Sun Oct 12 11:11:14 2025 -0400 fix sequence parallelism test Signed-off-by: Luka Govedič <[email protected]>
commit 26892df Author: Luka Govedič <[email protected]> Date: Sun Oct 12 11:03:35 2025 -0400 fix pass manager test Signed-off-by: Luka Govedič <[email protected]>
commit 0d6e550 Author: Luka Govedič <[email protected]> Date: Sun Oct 12 10:57:07 2025 -0400 fix func test Signed-off-by: Luka Govedič <[email protected]>
commit 1b1a63e Author: Luka Govedič <[email protected]> Date: Sat Oct 11 14:33:46 2025 -0400 Fix e2e allreduce fusion test Signed-off-by: Luka Govedič <[email protected]>
commit 52f78ce Author: Luka Govedič <[email protected]> Date: Sat Oct 11 08:38:42 2025 -0400 Add allreduce test to 2-gpu test Signed-off-by: Luka Govedič <[email protected]>
commit 095277c Author: Luka Govedič <[email protected]> Date: Fri Oct 10 19:03:18 2025 -0400 Simplify matcher utils by using RMSNorm.forward_static Signed-off-by: Luka Govedič <[email protected]>
commit c3264d8 Author: Luka Govedič <[email protected]> Date: Fri Oct 10 18:36:15 2025 -0400 Fix partial match rmsnorm+quant, fix allreduce+rmsnorm match Signed-off-by: Luka Govedič <[email protected]>
commit a1c7fdb Author: Luka Govedič <[email protected]> Date: Fri Oct 10 16:13:42 2025 -0400 add more comprehensive testing for allreduce-rmsnorm, fix fp4 (-rmsnorm still failing) Signed-off-by: Luka Govedič <[email protected]>
commit 46ee626 Author: Luka Govedič <[email protected]> Date: Fri Oct 10 13:51:13 2025 -0400 add more comprehensive testing for quantfp8 (-rmsnorm+-quant still failing) Signed-off-by: Luka Govedič <[email protected]>
commit 32989d8 Author: Luka Govedič <[email protected]> Date: Fri Oct 10 13:49:09 2025 -0400 add pattern for final allreduce in model Signed-off-by: Luka Govedič <[email protected]>
commit 5619bc3 Author: Luka Govedič <[email protected]> Date: Thu Oct 9 21:42:34 2025 -0400 clean up e2e tests Signed-off-by: Luka Govedič <[email protected]>
commit 1756f67 Author: Luka Govedič <[email protected]> Date: Sat Oct 4 00:06:13 2025 -0400 add back fp4 Signed-off-by: Luka Govedič <[email protected]>
commit c653d24 Author: Luka Govedič <[email protected]> Date: Fri Oct 3 23:39:23 2025 -0400 Fix spelling, precommit Signed-off-by: Luka Govedič <[email protected]>
commit 31d0127 Author: Luka Govedič <[email protected]> Date: Fri Oct 3 13:01:13 2025 -0400 Add e2e fusions to fullgraph test (should work with Triton backend), disable without flashinfer Signed-off-by: Luka Govedič <[email protected]>
commit 4dbfcf7 Author: Luka Govedič <[email protected]> Date: Fri Oct 3 11:49:24 2025 -0400 Move e2e tests to new file, add to test pipeline Signed-off-by: Luka Govedič <[email protected]>
commit d3f95fe Author: Luka Govedič <[email protected]> Date: Fri Oct 3 11:38:39 2025 -0400 fullgraph allreduce test update requirements Signed-off-by: Luka Govedič <[email protected]>
commit c8675ff Author: Luka Govedič <[email protected]> Date: Thu Oct 2 22:18:24 2025 -0400 log depyf folder, fix context for TestBackend, fix pattern dump Signed-off-by: Luka Govedič <[email protected]>
commit d09a278 Author: Luka Govedič <[email protected]> Date: Thu Oct 2 22:16:24 2025 -0400 allreduce fusion working with/without custom ops (with fp4) Signed-off-by: Luka Govedič <[email protected]>
commit b7f52bf Author: Luka Govedič <[email protected]> Date: Thu Oct 2 22:12:04 2025 -0400 allreduce fusion working with/without custom ops (except fp4) Signed-off-by: Luka Govedič <[email protected]>
commit 54189a9 Author: Luka Govedič <[email protected]> Date: Thu Oct 2 21:24:51 2025 -0400 allreduce fusion working (custom ops on) Signed-off-by: Luka Govedič <[email protected]>
commit db479ae Author: Luka Govedič <[email protected]> Date: Thu Oct 2 16:51:30 2025 -0700 TEMP allreduce fusion Signed-off-by: Luka Govedič <[email protected]>
commit 5fef180 Author: Luka Govedič <[email protected]> Date: Thu Oct 2 16:35:31 2025 -0700 clean up fullgraph tests Signed-off-by: Luka Govedič <[email protected]>
commit 7eb1364 Author: Luka Govedič <[email protected]> Date: Thu Oct 2 19:26:48 2025 -0400 Update csrc/layernorm_kernels.cu Signed-off-by: Luka Govedič <[email protected]>
commit 66a35a9 Author: Luka Govedič <[email protected]> Date: Thu Oct 2 19:26:42 2025 -0400 Update tests/compile/backend.py Signed-off-by: Luka Govedič <[email protected]>
commit 21a9f9f Author: Luka Govedič <[email protected]> Date: Wed Oct 1 19:02:24 2025 -0700 Fixed tests, passing with 2.8, 2.9 tbd Signed-off-by: Luka Govedič <[email protected]>
commit a2aa978 Author: Luka Govedič <[email protected]> Date: Wed Oct 1 11:21:02 2025 -0700 Test for caplog utils Signed-off-by: Luka Govedič <[email protected]>
commit eb899a4 Author: Luka Govedič <[email protected]> Date: Tue Sep 30 12:55:33 2025 -0700 Temp MP workaround P3 Signed-off-by: Luka Govedič <[email protected]>
commit ae7f56f Author: Luka Govedič <[email protected]> Date: Tue Sep 30 12:50:28 2025 -0700 Temp MP workaround P2 Signed-off-by: Luka Govedič <[email protected]>
commit 47b4688 Author: Luka Govedič <[email protected]> Date: Sat Sep 27 07:38:52 2025 -0700 TEMP working on caplog Signed-off-by: Luka Govedič <[email protected]>
commit d0b1b56 Author: Luka Govedič <[email protected]> Date: Fri Sep 26 15:39:08 2025 -0700 improve tests by adding more cases Signed-off-by: Luka Govedič <[email protected]>
commit 490ac86 Author: Luka Govedič <[email protected]> Date: Fri Sep 26 13:24:01 2025 -0700 Add TP=2 test (untested) Signed-off-by: Luka Govedič <[email protected]>
commit c6d6c3b Author: Luka Govedič <[email protected]> Date: Fri Sep 26 13:20:52 2025 -0700 Refactor E2E attn fusion test Signed-off-by: Luka Govedič <[email protected]>
commit 141a37e Author: Luka Govedič <[email protected]> Date: Fri Sep 26 07:41:41 2025 -0700 Fix rmsnorm Signed-off-by: Luka Govedič <[email protected]>
commit cdd1529 Author: Luka Govedič <[email protected]> Date: Thu Sep 25 17:18:43 2025 -0700 Flat product for better test names/visibility Signed-off-by: Luka Govedič <[email protected]>
commit d843a67 Author: Luka Govedič <[email protected]> Date: Thu Sep 25 17:02:14 2025 -0700 Add triton attn test to attn+quant fusion Signed-off-by: Luka Govedič <[email protected]>
commit 1277999 Author: Luka Govedič <[email protected]> Date: Thu Sep 25 16:12:23 2025 -0700 Remove V0 attn fusion test Signed-off-by: Luka Govedič <[email protected]>
commit 77835fd Author: Luka Govedič <[email protected]> Date: Thu Sep 25 16:12:11 2025 -0700 Attention fusion works with custom ops Signed-off-by: Luka Govedič <[email protected]>
commit 1ae80c6 Author: Luka Govedič <[email protected]> Date: Thu Sep 25 16:02:21 2025 -0700 Move global vllm_config to pass manager Signed-off-by: Luka Govedič <[email protected]>
commit b172747 Author: Luka Govedič <[email protected]> Date: Thu Sep 25 15:02:33 2025 -0700 Functionalize attn+quant patterns Signed-off-by: Luka Govedič <[email protected]>
commit d96913a Author: Luka Govedič <[email protected]> Date: Thu Sep 25 16:06:25 2025 -0400 Cleanup test_fusion.py, added extra layer of rms/quant Signed-off-by: Luka Govedič <[email protected]>
commit e6b394e Author: Luka Govedič <[email protected]> Date: Fri Sep 19 19:00:27 2025 -0700 Add TODO Signed-off-by: Luka Govedič <[email protected]>
commit 05a65f3 Author: Luka Govedič <[email protected]> Date: Thu Sep 18 13:21:46 2025 -0700 ALL WORKS Signed-off-by: Luka Govedič <[email protected]>
commit 14fdc8b Author: Luka Govedič <[email protected]> Date: Thu Sep 18 12:32:27 2025 -0700 quant with fix for pure torch, broke others Signed-off-by: Luka Govedič <[email protected]>
commit e151e6d Author: Luka Govedič <[email protected]> Date: Tue Sep 16 11:08:39 2025 -0700 quant works except (torch,torch) Signed-off-by: Luka Govedič <[email protected]>
commit 8e4a56f Author: Luka Govedič <[email protected]> Date: Tue Sep 16 10:47:13 2025 -0700 rms works fully now, had to remove more conversions (and add them in replacements). TODO pass to remove unnecessary conversions? Signed-off-by: Luka Govedič <[email protected]>
commit cdad3c0 Author: Luka Govedič <[email protected]> Date: Fri Sep 12 12:11:48 2025 -0700 TEMP: fixed rmsnorm issue (TODO assert dtypes in fused norm_quant kernels) Signed-off-by: Luka Govedič <[email protected]>
commit f3b4cf1 Author: Luka Govedič <[email protected]> Date: Tue Sep 9 09:48:53 2025 -0700 TEMP Mostly working Signed-off-by: Luka Govedič <[email protected]>
commit 21d7d67 Author: Luka Govedič <[email protected]> Date: Sat Sep 6 14:35:13 2025 -0700 Functionalized patterns in prep for utility Signed-off-by: Luka Govedič <[email protected]>
Signed-off-by: ProExpertProg <[email protected]>
Signed-off-by: Luka Govedič <[email protected]>
commit e34d36d Author: Luka Govedič <[email protected]> Date: Thu Oct 16 09:33:16 2025 -0400 More tweaking of precision Signed-off-by: Luka Govedič <[email protected]>
commit 6319e39 Author: Luka Govedič <[email protected]> Date: Thu Oct 16 00:58:59 2025 -0400 Update test durations Signed-off-by: Luka Govedič <[email protected]>
commit d4fe977 Author: Luka Govedič <[email protected]> Date: Thu Oct 16 00:54:25 2025 -0400 Fix NamedTuple Signed-off-by: Luka Govedič <[email protected]>
commit 65ef5fd Merge: d2e0489 785d8b6 Author: Luka Govedič <[email protected]> Date: Thu Oct 16 00:38:26 2025 -0400 Merge branch 'main' into luka/custom-op-matching-2
commit d2e0489 Author: Luka Govedič <[email protected]> Date: Thu Oct 16 00:31:15 2025 -0400 Relax tolerance for L40 fusion test Signed-off-by: Luka Govedič <[email protected]>
commit c03b29b Author: Luka Govedič <[email protected]> Date: Wed Oct 15 20:31:11 2025 -0400 Remove inductor graph partition from unit test (included in e2e tests) Signed-off-by: Luka Govedič <[email protected]>
commit ae581e1 Author: Luka Govedič <[email protected]> Date: Wed Oct 15 20:30:02 2025 -0400 Fix attention fusion test numerics Signed-off-by: Luka Govedič <[email protected]>
commit a226864 Merge: e99a759 0a9ef0c Author: Luka Govedič <[email protected]> Date: Wed Oct 15 20:03:52 2025 -0400 Merge branch 'main' into luka/custom-op-matching-2 Signed-off-by: Luka Govedič <[email protected]>
commit e99a759 Author: Luka Govedič <[email protected]> Date: Wed Oct 15 19:20:47 2025 -0400 Break up B200 tests, move allreduce to H200 Signed-off-by: Luka Govedič <[email protected]>
commit 876ef22 Author: Luka Govedič <[email protected]> Date: Wed Oct 15 18:43:48 2025 -0400 Fix tests, PR feedback Signed-off-by: Luka Govedič <[email protected]>
commit 6253d5b Author: Luka Govedič <[email protected]> Date: Wed Oct 15 13:18:03 2025 -0400 Add e2e to L40 distributed, move tests to start of B200 distributed Signed-off-by: Luka Govedič <[email protected]>
commit de7405b Author: Luka Govedič <[email protected]> Date: Wed Oct 15 13:08:57 2025 -0400 PR comments: add _custom_op suffix Signed-off-by: Luka Govedič <[email protected]>
commit 24f1298 Author: Luka Govedič <[email protected]> Date: Wed Oct 15 13:08:13 2025 -0400 PR comments: cleanup fusion passes, & matching Signed-off-by: Luka Govedič <[email protected]>
commit 7e6f5b3 Author: Luka Govedič <[email protected]> Date: Wed Oct 15 13:06:19 2025 -0400 add flat_product example Signed-off-by: Luka Govedič <[email protected]>
commit 532cbcf Author: Luka Govedič <[email protected]> Date: Wed Oct 15 12:56:07 2025 -0400 Add comment to test_logger Signed-off-by: Luka Govedič <[email protected]>
commit 3943257 Author: Luka Govedič <[email protected]> Date: Wed Oct 15 12:11:29 2025 -0400 Restore original torch.Parameter behavior in RMSNorm Signed-off-by: Luka Govedič <[email protected]>
commit a3ebf0a Author: Luka Govedič <[email protected]> Date: Wed Oct 15 12:09:48 2025 -0400 fix fp8 quant tests Signed-off-by: Luka Govedič <[email protected]>
commit db2b1c7 Author: Luka Govedič <[email protected]> Date: Wed Oct 15 11:59:35 2025 -0400 Smaller model for e2e fusion test Signed-off-by: Luka Govedič <[email protected]>
commit bcd95b5 Author: Luka Govedič <[email protected]> Date: Wed Oct 15 11:54:47 2025 -0400 Fix func test Signed-off-by: Luka Govedič <[email protected]>
commit bb0254a Merge: 465ce58 136a17f Author: Luka Govedič <[email protected]> Date: Wed Oct 15 10:01:15 2025 -0400 Merge branch 'main' into luka/custom-op-matching-2 # Conflicts: # tests/utils_/test_utils.py Signed-off-by: Luka Govedič <[email protected]>
commit 465ce58 Author: Luka Govedič <[email protected]> Date: Wed Oct 15 09:59:54 2025 -0400 Update tests/compile/test_fusion.py Signed-off-by: Luka Govedič <[email protected]>
commit 2a6299c Author: Luka Govedič <[email protected]> Date: Wed Oct 15 04:12:01 2025 -0400 Fix e2e test patterns Signed-off-by: Luka Govedič <[email protected]>
commit 8ffb474 Author: Luka Govedič <[email protected]> Date: Wed Oct 15 03:25:26 2025 -0400 Remove/fix TODOs Signed-off-by: Luka Govedič <[email protected]>
commit db16ee1 Merge: 12a7c6d f0862ea Author: Luka Govedič <[email protected]> Date: Wed Oct 15 03:10:21 2025 -0400 Merge branch 'main' into luka/custom-op-matching-2
commit 12a7c6d Author: Luka Govedič <[email protected]> Date: Wed Oct 15 03:00:52 2025 -0400 Tests & docs for flat_product Signed-off-by: Luka Govedič <[email protected]>
commit 8a363d3 Author: Luka Govedič <[email protected]> Date: Wed Oct 15 02:43:03 2025 -0400 Slight improvement for E2E fusion Signed-off-by: Luka Govedič <[email protected]>
commit f6429e4 Author: Luka Govedič <[email protected]> Date: Wed Oct 15 02:40:43 2025 -0400 Cleanup test_fusion_attn.py Signed-off-by: Luka Govedič <[email protected]>
commit b5f89e5 Author: Luka Govedič <[email protected]> Date: Wed Oct 15 02:29:06 2025 -0400 Cleanup test_full_graph.py Signed-off-by: Luka Govedič <[email protected]>
commit 97b3ff2 Merge: af1ffa7 8c851f6 Author: Luka Govedič <[email protected]> Date: Wed Oct 15 02:08:00 2025 -0400 Merge remote-tracking branch 'upstream/main' into luka/custom-op-matching-2 Signed-off-by: Luka Govedič <[email protected]>
commit af1ffa7 Author: Luka Govedič <[email protected]> Date: Wed Oct 15 01:54:18 2025 -0400 PR review Signed-off-by: Luka Govedič <[email protected]>
commit 3547b87 Author: Luka Govedič <[email protected]> Date: Sun Oct 12 11:11:14 2025 -0400 fix sequence parallelism test Signed-off-by: Luka Govedič <[email protected]>
commit 26892df Author: Luka Govedič <[email protected]> Date: Sun Oct 12 11:03:35 2025 -0400 fix pass manager test Signed-off-by: Luka Govedič <[email protected]>
commit 0d6e550 Author: Luka Govedič <[email protected]> Date: Sun Oct 12 10:57:07 2025 -0400 fix func test Signed-off-by: Luka Govedič <[email protected]>
commit 1b1a63e Author: Luka Govedič <[email protected]> Date: Sat Oct 11 14:33:46 2025 -0400 Fix e2e allreduce fusion test Signed-off-by: Luka Govedič <[email protected]>
commit 52f78ce Author: Luka Govedič <[email protected]> Date: Sat Oct 11 08:38:42 2025 -0400 Add allreduce test to 2-gpu test Signed-off-by: Luka Govedič <[email protected]>
commit 095277c Author: Luka Govedič <[email protected]> Date: Fri Oct 10 19:03:18 2025 -0400 Simplify matcher utils by using RMSNorm.forward_static Signed-off-by: Luka Govedič <[email protected]>
commit c3264d8 Author: Luka Govedič <[email protected]> Date: Fri Oct 10 18:36:15 2025 -0400 Fix partial match rmsnorm+quant, fix allreduce+rmsnorm match Signed-off-by: Luka Govedič <[email protected]>
commit a1c7fdb Author: Luka Govedič <[email protected]> Date: Fri Oct 10 16:13:42 2025 -0400 add more comprehensive testing for allreduce-rmsnorm, fix fp4 (-rmsnorm still failing) Signed-off-by: Luka Govedič <[email protected]>
commit 46ee626 Author: Luka Govedič <[email protected]> Date: Fri Oct 10 13:51:13 2025 -0400 add more comprehensive testing for quantfp8 (-rmsnorm+-quant still failing) Signed-off-by: Luka Govedič <[email protected]>
commit 32989d8 Author: Luka Govedič <[email protected]> Date: Fri Oct 10 13:49:09 2025 -0400 add pattern for final allreduce in model Signed-off-by: Luka Govedič <[email protected]>
commit 5619bc3 Author: Luka Govedič <[email protected]> Date: Thu Oct 9 21:42:34 2025 -0400 clean up e2e tests Signed-off-by: Luka Govedič <[email protected]>
commit 1756f67 Author: Luka Govedič <[email protected]> Date: Sat Oct 4 00:06:13 2025 -0400 add back fp4 Signed-off-by: Luka Govedič <[email protected]>
commit c653d24 Author: Luka Govedič <[email protected]> Date: Fri Oct 3 23:39:23 2025 -0400 Fix spelling, precommit Signed-off-by: Luka Govedič <[email protected]>
commit 31d0127 Author: Luka Govedič <[email protected]> Date: Fri Oct 3 13:01:13 2025 -0400 Add e2e fusions to fullgraph test (should work with Triton backend), disable without flashinfer Signed-off-by: Luka Govedič <[email protected]>
commit 4dbfcf7 Author: Luka Govedič
<[email protected]> Date: Fri Oct 3 11:49:24 2025 -0400 Move e2e tests to new file, add to test pipeline Signed-off-by: Luka Govedič <[email protected]> commit d3f95fe Author: Luka Govedič <[email protected]> Date: Fri Oct 3 11:38:39 2025 -0400 fullgraph allreduce test update requirements Signed-off-by: Luka Govedič <[email protected]> commit c8675ff Author: Luka Govedič <[email protected]> Date: Thu Oct 2 22:18:24 2025 -0400 log depyf folder, fix context for TestBackend, fix pattern dump Signed-off-by: Luka Govedič <[email protected]> commit d09a278 Author: Luka Govedič <[email protected]> Date: Thu Oct 2 22:16:24 2025 -0400 allreduce fusion working with/without custom ops (with fp4) Signed-off-by: Luka Govedič <[email protected]> commit b7f52bf Author: Luka Govedič <[email protected]> Date: Thu Oct 2 22:12:04 2025 -0400 allreduce fusion working with/without custom ops (except fp4) Signed-off-by: Luka Govedič <[email protected]> commit 54189a9 Author: Luka Govedič <[email protected]> Date: Thu Oct 2 21:24:51 2025 -0400 allreduce fusion working (custom ops on) Signed-off-by: Luka Govedič <[email protected]> commit db479ae Author: Luka Govedič <[email protected]> Date: Thu Oct 2 16:51:30 2025 -0700 TEMP allreduce fusion Signed-off-by: Luka Govedič <[email protected]> commit 5fef180 Author: Luka Govedič <[email protected]> Date: Thu Oct 2 16:35:31 2025 -0700 clean up fullgraph tests Signed-off-by: Luka Govedič <[email protected]> commit 7eb1364 Author: Luka Govedič <[email protected]> Date: Thu Oct 2 19:26:48 2025 -0400 Update csrc/layernorm_kernels.cu Signed-off-by: Luka Govedič <[email protected]> commit 66a35a9 Author: Luka Govedič <[email protected]> Date: Thu Oct 2 19:26:42 2025 -0400 Update tests/compile/backend.py Signed-off-by: Luka Govedič <[email protected]> commit 21a9f9f Author: Luka Govedič <[email protected]> Date: Wed Oct 1 19:02:24 2025 -0700 Fixed tests, passing with 2.8, 2.9 tbd Signed-off-by: Luka Govedič <[email protected]> commit a2aa978 Author: 
Luka Govedič <[email protected]> Date: Wed Oct 1 11:21:02 2025 -0700 Test for caplog utils Signed-off-by: Luka Govedič <[email protected]> commit eb899a4 Author: Luka Govedič <[email protected]> Date: Tue Sep 30 12:55:33 2025 -0700 Temp MP workaround P3 Signed-off-by: Luka Govedič <[email protected]> commit ae7f56f Author: Luka Govedič <[email protected]> Date: Tue Sep 30 12:50:28 2025 -0700 Temp MP workaround P2 Signed-off-by: Luka Govedič <[email protected]> commit 47b4688 Author: Luka Govedič <[email protected]> Date: Sat Sep 27 07:38:52 2025 -0700 TEMP working on caplog Signed-off-by: Luka Govedič <[email protected]> commit d0b1b56 Author: Luka Govedič <[email protected]> Date: Fri Sep 26 15:39:08 2025 -0700 improve tests by adding more cases Signed-off-by: Luka Govedič <[email protected]> commit 490ac86 Author: Luka Govedič <[email protected]> Date: Fri Sep 26 13:24:01 2025 -0700 Add TP=2 test (untested) Signed-off-by: Luka Govedič <[email protected]> commit c6d6c3b Author: Luka Govedič <[email protected]> Date: Fri Sep 26 13:20:52 2025 -0700 Refactor E2E attn fusion test Signed-off-by: Luka Govedič <[email protected]> commit 141a37e Author: Luka Govedič <[email protected]> Date: Fri Sep 26 07:41:41 2025 -0700 Fix rmsnorm Signed-off-by: Luka Govedič <[email protected]> commit cdd1529 Author: Luka Govedič <[email protected]> Date: Thu Sep 25 17:18:43 2025 -0700 Flat product for better test names/visibility Signed-off-by: Luka Govedič <[email protected]> commit d843a67 Author: Luka Govedič <[email protected]> Date: Thu Sep 25 17:02:14 2025 -0700 Add triton attn test to attn+quant fusion Signed-off-by: Luka Govedič <[email protected]> commit 1277999 Author: Luka Govedič <[email protected]> Date: Thu Sep 25 16:12:23 2025 -0700 Remove V0 attn fusion test Signed-off-by: Luka Govedič <[email protected]> commit 77835fd Author: Luka Govedič <[email protected]> Date: Thu Sep 25 16:12:11 2025 -0700 Attention fusion works with custom ops Signed-off-by: Luka Govedič 
<[email protected]> commit 1ae80c6 Author: Luka Govedič <[email protected]> Date: Thu Sep 25 16:02:21 2025 -0700 Move global vllm_config to pass manager Signed-off-by: Luka Govedič <[email protected]> commit b172747 Author: Luka Govedič <[email protected]> Date: Thu Sep 25 15:02:33 2025 -0700 Functionalize attn+quant patterns Signed-off-by: Luka Govedič <[email protected]> commit d96913a Author: Luka Govedič <[email protected]> Date: Thu Sep 25 16:06:25 2025 -0400 Cleanup test_fusion.py, added extra layer of rms/quant Signed-off-by: Luka Govedič <[email protected]> commit e6b394e Author: Luka Govedič <[email protected]> Date: Fri Sep 19 19:00:27 2025 -0700 Add TODO Signed-off-by: Luka Govedič <[email protected]> commit 05a65f3 Author: Luka Govedič <[email protected]> Date: Thu Sep 18 13:21:46 2025 -0700 ALL WORKS Signed-off-by: Luka Govedič <[email protected]> commit 14fdc8b Author: Luka Govedič <[email protected]> Date: Thu Sep 18 12:32:27 2025 -0700 quant with fix for pure torch, broke others Signed-off-by: Luka Govedič <[email protected]> commit e151e6d Author: Luka Govedič <[email protected]> Date: Tue Sep 16 11:08:39 2025 -0700 quant works except (torch,torch) Signed-off-by: Luka Govedič <[email protected]> commit 8e4a56f Author: Luka Govedič <[email protected]> Date: Tue Sep 16 10:47:13 2025 -0700 rms works fully now, had to remove more conversions (and add them in replacements). TODO pass to remove unnecessary conversions? 
Signed-off-by: Luka Govedič <[email protected]> commit cdad3c0 Author: Luka Govedič <[email protected]> Date: Fri Sep 12 12:11:48 2025 -0700 TEMP: fixed rmsnorm issue (TODO assert dtypes in fused norm_quant kernels) Signed-off-by: Luka Govedič <[email protected]> commit f3b4cf1 Author: Luka Govedič <[email protected]> Date: Tue Sep 9 09:48:53 2025 -0700 TEMP Mostly working Signed-off-by: Luka Govedič <[email protected]> commit 21d7d67 Author: Luka Govedič <[email protected]> Date: Sat Sep 6 14:35:13 2025 -0700 Functionalized patterns in prep for utility Signed-off-by: Luka Govedič <[email protected]> Signed-off-by: ProExpertProg <[email protected]>
- STATIC_FP4_QUANT_OP = torch.ops._C.scaled_fp4_quant.default
+ if hasattr(torch.ops._C, "scaled_fp4_quant"):
+     STATIC_FP4_QUANT_OP = torch.ops._C.scaled_fp4_quant.default
Why did this work before and not now? Should we change how this is registered?
Oh yeah I think the registration might have been fixed.
I think I'll punt this to a follow-up PR; in general, these ops should be cleaned up.
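For readers following the thread, here is a minimal sketch of the guarded-lookup idea in the diff above. It does not import torch: `SimpleNamespace` objects stand in for `torch.ops._C`, and `resolve_optional_op` is a hypothetical helper name, not something from the PR.

```python
from types import SimpleNamespace


def resolve_optional_op(namespace, name):
    """Return namespace.<name>.default, or None if the op is not registered.

    Hypothetical helper; mirrors the `if hasattr(...)` guard in the diff.
    """
    if hasattr(namespace, name):
        return getattr(namespace, name).default
    return None


# Stand-in for a build where the FP4 quant kernel is registered.
with_fp4 = SimpleNamespace(
    scaled_fp4_quant=SimpleNamespace(default="scaled_fp4_quant.default")
)
# Stand-in for a build where it is not.
without_fp4 = SimpleNamespace()

STATIC_FP4_QUANT_OP = resolve_optional_op(with_fp4, "scaled_fp4_quant")
MISSING_OP = resolve_optional_op(without_fp4, "scaled_fp4_quant")
```

With the guard, importing the module no longer fails on builds that lack the op; callers then have to handle the missing case explicitly.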
  )
  else:
-     scale = torch.empty(1, device=input.device, dtype=torch.float32)
+     scale = torch.empty((1, 1), device=input.device, dtype=torch.float32)
Why is this needed? AFAIK this tensor is just a scalar to the kernel
Needed for custom matching to work, (1,1) is still just one element
Can you elaborate? It's suspicious that it needs this change.
The native implementation returns (1,1) so this just makes them consistent. I don't remember exactly what I was running into
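To make the two statements in this thread concrete ((1,1) is still one element, and the native implementation returns (1,1)), here is a small sketch in plain Python. Shapes are modeled as tuples rather than real tensors, and the idea that a matcher compares shape metadata is an assumption, not something the thread confirms.

```python
import math


def numel(shape):
    """Number of elements implied by a shape tuple."""
    return math.prod(shape)


custom_scale_shape_before = (1,)    # torch.empty(1, ...) in the old code
custom_scale_shape_after = (1, 1)   # torch.empty((1, 1), ...) in the new code
native_scale_shape = (1, 1)         # what the thread says the native path returns

# Both variants hold exactly one element, so the kernel sees the same data.
same_count = numel(custom_scale_shape_before) == numel(custom_scale_shape_after)

# Only the new shape is identical to the native one.
shapes_match_before = custom_scale_shape_before == native_scale_shape
shapes_match_after = custom_scale_shape_after == native_scale_shape
```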
My current understanding is that when we pattern match against the torch native implementation of a custom operator, we register a pattern in Inductor using that native implementation. I'm worried that this approach might be fragile. When the torch native implementation is passed through torch.compile, various graph passes can transform it, so by the time it reaches the post-grad phase (where vLLM’s pattern matching currently happens), the structure may look different.
For example, with rms_norm, it seems we’d need to modify the implementation in a non-trivial way to make it pattern match. I don't know if this is an issue in practice, but it suggests that this scheme could unintentionally constrain how custom operators need to be authored — in ways we might not fully understand yet.
It might be more robust to preserve the custom operator as-is (i.e., avoid decomposing it into torch native ops) and then perform pattern matching directly on the custom operator itself. That would make the process less sensitive to internal graph transformations.
I did see that you wanted this in for the release. Was there a specific reason? If we are turning on the allreduce+rmsnorm fusion by default, for example, then could the fusion instead imply "+rmsnorm"?
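The fragility concern can be illustrated with a toy model, sketched here under heavy assumptions: graphs are flat lists of op names, the matcher is a contiguous subsequence search, and `eliminate_noop_casts` is a made-up earlier pass. None of this is Inductor's or vLLM's actual machinery.

```python
def find_pattern(graph, pattern):
    """Return the index where `pattern` occurs contiguously in `graph`, or -1."""
    n = len(pattern)
    for i in range(len(graph) - n + 1):
        if graph[i:i + n] == pattern:
            return i
    return -1


def eliminate_noop_casts(graph):
    """Toy earlier pass: drop a to_float32 cast whose producer already yields fp32."""
    out = []
    for op in graph:
        if op == "to_float32" and out and out[-1].endswith("_fp32"):
            continue  # the cast is a no-op here, so the pass removes it
        out.append(op)
    return out


# Decomposed ("torch native") form of a toy RMSNorm, as traced for the pattern.
NATIVE_RMS_NORM = ["to_float32", "pow", "mean", "add", "rsqrt",
                   "mul", "to_bfloat16", "mul"]

model_native = ["matmul_fp32"] + NATIVE_RMS_NORM + ["quant"]
model_custom = ["matmul_fp32", "rms_norm_custom_op", "quant"]

found_before = find_pattern(model_native, NATIVE_RMS_NORM)
found_after = find_pattern(eliminate_noop_casts(model_native), NATIVE_RMS_NORM)
found_custom = find_pattern(eliminate_noop_casts(model_custom),
                            ["rms_norm_custom_op"])
```

In this toy, the decomposed pattern matches before the earlier pass runs and stops matching afterwards, while the opaque custom op is found either way.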
This is needed because it lets us do fusion without having to enable custom ops (-O.custom_ops=["+quant_fp8"]). Enabling custom ops leads to lost performance, as demonstrated in the PR description. That's because there are 4 quant ops per layer, one per matmul. I agree this is a somewhat fragile approach. I would be happy to work on a "lowering" approach where we preserve the high-level structure of ops until later. The downside would be that it would require more work (I think), and we might lose access to optimizations that currently happen before our passes. But I think it wouldn't hurt Inductor in general to have a more explicit sense of converting between higher-level and lower-level representations (or we just move where our custom passes happen). We can tie this work into the "autotuning custom op implementations" effort, like what was done in pytorch/pytorch#164212.
As discussed offline, we are going to proceed by merging this PR. After PTC, we will move our custom op matching passes to
View/slice noop eliminations were upstreamed to PyTorch, so I'm wondering if this is sufficient: pytorch/pytorch#151095, pytorch/pytorch#151175
@BoyuanFeng wouldn't that run after
Purpose
This PR enables matching the torch implementations of custom ops QuantFP8 and RMSNorm. On main, fusion currently requires enabling custom ops, but they are slower than their torch counterparts, so the benefit of custom fusion passes is reduced.
We add a set of "matcher util" objects that can be called in patterns and are automatically traced to the same fx nodes as the custom op they correspond to, in both its enabled and disabled form.
This PR also adds additional debugging utilities and adds E2E fusion tests to verify fusions happen in models end-to-end instead of just in unit tests.
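As a rough illustration of the "matcher util" idea (not the PR's implementation), the toy below records op names instead of building fx nodes. `ToyTracer`, `ToyMatcherRMSNorm`, and the op names are all hypothetical; the point is only that the pattern is traced through the same dispatch as the model, so it lines up whether the custom op is enabled or not.

```python
class ToyTracer:
    """Records op names in call order; a stand-in for fx tracing."""

    def __init__(self):
        self.nodes = []

    def call(self, op):
        self.nodes.append(op)


def rms_norm_custom(tracer):
    tracer.call("rms_norm_custom_op")


def rms_norm_native(tracer):
    for op in ("to_float32", "pow", "mean", "add", "rsqrt",
               "mul", "to_dtype", "mul"):
        tracer.call(op)


class ToyMatcherRMSNorm:
    """Hypothetical matcher util: emits the same ops the model would emit."""

    def __init__(self, custom_op_enabled):
        self.custom_op_enabled = custom_op_enabled

    def __call__(self, tracer):
        impl = rms_norm_custom if self.custom_op_enabled else rms_norm_native
        impl(tracer)


def trace_model(custom_op_enabled):
    t = ToyTracer()
    t.call("matmul")
    (rms_norm_custom if custom_op_enabled else rms_norm_native)(t)
    t.call("quant")
    return t.nodes


def trace_pattern(custom_op_enabled):
    t = ToyTracer()
    ToyMatcherRMSNorm(custom_op_enabled)(t)
    return t.nodes


def contains(graph, pattern):
    """True if `pattern` occurs contiguously in `graph`."""
    n = len(pattern)
    return any(graph[i:i + n] == pattern for i in range(len(graph) - n + 1))
```

A pattern written once against `ToyMatcherRMSNorm` matches the model in both modes, while a pattern traced in the wrong mode does not, which is the situation on main when custom ops are disabled.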
Test Plan
Unit tests, added more fusion E2E tests.
Test Result
Tests all pass
Performance numbers
Below are B200 numbers (with flashinfer) from vllm bench serve on the following serve command:

We test the following regimes with corresponding additional arguments:
1. none: -O.custom_ops='["none"]' -O.pass_config={"enable_fi_allreduce_fusion":false,"enable_attn_fusion":false,"enable_noop":true}
2. none_fusion_attention: -O.custom_ops='["none"]' -O.pass_config={"enable_fi_allreduce_fusion":false,"enable_attn_fusion":true,"enable_noop":true}
3. none_fusion_attention_allreduce: -O.custom_ops='["none"]' -O.pass_config={"enable_fi_allreduce_fusion":true,"enable_attn_fusion":true,"enable_noop":true}
4. rms_quant: -O.custom_ops='["none", "+quant_fp8", "+rms_norm"]' -O.pass_config={"enable_fi_allreduce_fusion":false,"enable_attn_fusion":false,"enable_noop":true}
5. rms_quant_fusion_attention: -O.custom_ops='["none", "+quant_fp8", "+rms_norm"]' -O.pass_config={"enable_fi_allreduce_fusion":false,"enable_attn_fusion":true,"enable_noop":true}
6. rms_quant_fusion_attention_allreduce: -O.custom_ops='["none", "+quant_fp8", "+rms_norm"]' -O.pass_config={"enable_fi_allreduce_fusion":true,"enable_attn_fusion":true,"enable_noop":true}
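The six regimes are the cross product of two custom-op settings and three fusion settings. The sketch below regenerates the argument strings listed above from that structure; `regime_args` and `REGIMES` are hypothetical names used only for illustration, and the flag and key names are copied from the list rather than checked against vLLM's CLI.

```python
import json

BASE_CUSTOM_OPS = ["none"]
RMS_QUANT_CUSTOM_OPS = ["none", "+quant_fp8", "+rms_norm"]


def regime_args(custom_ops, attn_fusion, allreduce_fusion):
    """Build the extra arguments for one regime, in the format listed above."""
    pass_config = {
        "enable_fi_allreduce_fusion": allreduce_fusion,
        "enable_attn_fusion": attn_fusion,
        "enable_noop": True,
    }
    return "-O.custom_ops='{}' -O.pass_config={}".format(
        json.dumps(custom_ops),
        json.dumps(pass_config, separators=(",", ":")),  # compact JSON
    )


REGIMES = {}
for prefix, ops in (("none", BASE_CUSTOM_OPS),
                    ("rms_quant", RMS_QUANT_CUSTOM_OPS)):
    REGIMES[prefix] = regime_args(ops, False, False)
    REGIMES[prefix + "_fusion_attention"] = regime_args(ops, True, False)
    REGIMES[prefix + "_fusion_attention_allreduce"] = regime_args(ops, True, True)

for name, args in REGIMES.items():
    print(name + ": " + args)
```

Depending on the shell, the pass_config JSON may need its own quoting; the strings here simply reproduce the form used in the description.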
Regimes 2 (none_fusion_attention) and 3 (none_fusion_attention_allreduce) are newly possible with this PR. On main, results are similar except that those two are worse, as fusion cannot happen without custom ops enabled.

redhatai/meta-llama-3.1-70B-Instruct-FP8 (TP=1):
Past QPS=10 the server is overloaded, so the latency spikes and becomes much more variable. Also note that allreduce fusion is a noop for tp=1.
📊 TTFT Median (ms)
📊 TPOT Median (ms)
📊 ITL Median (ms)
redhatai/meta-llama-3.1-70B-Instruct-FP8 (TP=4):
Note that allreduce fusion reduces TPOT at low QPS but increases it at high QPS, and increases TTFT across the board. This will be addressed in #24248 and #24252.
📊 TTFT Median (ms)
📊 TPOT Median (ms)
📊 ITL Median (ms)