[TRTLLM-5589] feat: Minor optimizations for tunable FP8 batched GEMM op. #5139
Signed-off-by: Yukun He <[email protected]>
Revises some implementations in the tunable FP8 batched GEMM op to reduce the Python overhead of the original implementation from #4872.
The `TuningConfig` class is hashable, so configs with identical contents produce the same hash value and never cause a cache-key miss. However, this hashing can add extra overhead during the inference phase.