[#5861][autodeploy] Refactor: Quantization Transforms with Inheritance #7227
Merged
Fridah-nv merged 25 commits into NVIDIA:main from nv-auto-deploy:user/fridah/inherit-quant2 on Sep 10, 2025 (+2,303 −875)
25 commits
03d1966 add torch ref impl for FP8, add unit test (Fridah-nv)
dcd68df add torch ref impl for FP4, add op map unit test (Fridah-nv)
f38101e split linear and bmm quantization (Fridah-nv)
3932877 update quantize_linear_from_config to point to the custom op (Fridah-nv)
a19eeeb separate custom op into two torch ops (Fridah-nv)
7c2c0d1 quantized fusion transforms, WIP for FP4 (Fridah-nv)
d5cfe13 add QuantizationFusionMixin class (Fridah-nv)
437b942 quantized sharding class for FP8 and FP4 (Fridah-nv)
5c3d7b4 remove QuantizationImpl in sharding, remove unused methods in Quantiz… (Fridah-nv)
aaa28d9 remove custom_quant_linear op (Fridah-nv)
3733366 rename custom quant ops (Fridah-nv)
e477434 WIP to map custom quant op to real implementation using pattern matcher (Fridah-nv)
7f8a8f8 fix unit tests (Fridah-nv)
10a4994 remove unused ENUM (Fridah-nv)
24885ea minor updates: rabbit feedback, docstrings, code cleaning (Fridah-nv)
c8b21f9 clear unit tests on blackwell; address a few comments; rename FP ops … (Fridah-nv)
961c33f remove include_quantization from is_linear_node (Fridah-nv)
19500d7 address few comments: remove pattern matcher fake mode patch; remove … (Fridah-nv)
d01af1e update quantization transforms:Linear, BMM, MoE and MoE matching into… (Fridah-nv)
7d011b6 remove QuantizationImpl class; remove more reference of is_quantized_op (Fridah-nv)
f57fa57 fix test_quantization_utils.py (Fridah-nv)
75063f1 minor: address comments, uncomment skipped unit tests (Fridah-nv)
d9afed8 skip quant gemm fusion for perf (Fridah-nv)
30cef34 Merge branch 'main' into user/fridah/inherit-quant2 (Fridah-nv)
5178d0f Merge branch 'main' into user/fridah/inherit-quant2 (Fridah-nv)