feat: Support speculative decoding with Hopper XQA #3269
Signed-off-by: Yao Yao <[email protected]>
Signed-off-by: Yao Yao <[email protected]>
Signed-off-by: sarattha <[email protected]>
Before this change, speculative decoding on Hopper GPUs was supported only through the Ampere XQA kernels. This change makes the Hopper XQA kernels support speculative decoding as well, for better performance on Hopper GPUs.
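The before/after kernel choice can be sketched as a simple selection rule. This is a minimal, hypothetical Python illustration, not the project's actual dispatch code: the names `XqaKernel`, `select_xqa_kernel`, and the `hopper_supports_spec_dec` flag are invented for this example, and real kernel dispatch depends on many more conditions (data types, head sizes, and so on). It assumes only that Hopper GPUs report compute capability SM 90.

```python
from enum import Enum


class XqaKernel(Enum):
    # Hypothetical labels for the two XQA kernel families.
    AMPERE = "ampere_xqa"
    HOPPER = "hopper_xqa"


def select_xqa_kernel(sm_version: int, speculative_decoding: bool,
                      hopper_supports_spec_dec: bool = True) -> XqaKernel:
    """Hypothetical, simplified selection rule for illustration only.

    hopper_supports_spec_dec=False models the behavior before this change;
    the default (True) models the behavior after it.
    """
    is_hopper = sm_version == 90  # Hopper GPUs report SM 90
    if not is_hopper:
        # Non-Hopper GPUs use the Ampere XQA kernels in this sketch.
        return XqaKernel.AMPERE
    if speculative_decoding and not hopper_supports_spec_dec:
        # Pre-change: speculative decoding on Hopper falls back to Ampere XQA.
        return XqaKernel.AMPERE
    # Post-change (or no speculative decoding): use the Hopper XQA kernels.
    return XqaKernel.HOPPER


# Before the change: Hopper + speculative decoding -> Ampere XQA.
print(select_xqa_kernel(90, True, hopper_supports_spec_dec=False))
# After the change: Hopper + speculative decoding -> Hopper XQA.
print(select_xqa_kernel(90, True))
```

Modeling the change as a single flag keeps the sketch focused on the one decision this change affects: whether speculative decoding forces Hopper GPUs onto the Ampere code path.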