@lowsfer
Member

@lowsfer lowsfer commented Apr 3, 2025

Before this change, speculative decoding on Hopper GPUs was supported through Ampere XQA. This change makes Hopper XQA support speculative decoding as well, for better performance on Hopper GPUs.

@lowsfer lowsfer requested a review from ming-wei April 3, 2025 10:31
@lowsfer lowsfer force-pushed the user/yaoy/hopper-xqa-spec-dec branch from a577b6f to cb709e3 April 3, 2025 10:36
@juney-nvidia juney-nvidia changed the title from "Support speculative decoding with Hopper XQA" to "feat: Support speculative decoding with Hopper XQA" Apr 3, 2025
@lowsfer lowsfer force-pushed the user/yaoy/hopper-xqa-spec-dec branch from cb709e3 to 247079b April 7, 2025 01:32
@lowsfer lowsfer force-pushed the user/yaoy/hopper-xqa-spec-dec branch from 247079b to 27b48bb April 7, 2025 06:33
@lowsfer
Member Author

lowsfer commented Apr 7, 2025

/bot run --disable-fail-fast

@NVIDIA NVIDIA deleted a comment from tensorrt-cicd Apr 7, 2025
@NVIDIA NVIDIA deleted a comment from tensorrt-cicd Apr 7, 2025
@tensorrt-cicd
Collaborator

PR_Github #1273 [ run ] triggered by Bot

@NVIDIA NVIDIA deleted a comment from tensorrt-cicd Apr 7, 2025
@NVIDIA NVIDIA deleted a comment from tensorrt-cicd Apr 7, 2025
@tensorrt-cicd
Collaborator

PR_Github #1273 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #959 completed with status: 'SUCCESS'

@lowsfer lowsfer enabled auto-merge (squash) April 7, 2025 08:37
@lowsfer lowsfer disabled auto-merge April 7, 2025 08:37
@lowsfer lowsfer enabled auto-merge (squash) April 7, 2025 08:38
@lowsfer lowsfer force-pushed the user/yaoy/hopper-xqa-spec-dec branch from 27b48bb to 755f1be April 7, 2025 08:38
@lowsfer
Member Author

lowsfer commented Apr 7, 2025

/bot reuse-pipeline --comment "rebase"

@NVIDIA NVIDIA deleted a comment from github-actions bot Apr 7, 2025
@tensorrt-cicd
Collaborator

PR_Github #1301 [ reuse-pipeline ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #1301 [ reuse-pipeline ] completed with state SUCCESS
Reusing PR_Github #1273 for commit 755f1be

@lowsfer lowsfer force-pushed the user/yaoy/hopper-xqa-spec-dec branch from 755f1be to db38872 April 7, 2025 08:59
@lowsfer
Member Author

lowsfer commented Apr 7, 2025

/bot reuse-pipeline --comment "rebase"

@tensorrt-cicd
Collaborator

PR_Github #1308 [ reuse-pipeline ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #1308 [ reuse-pipeline ] completed with state SUCCESS
Reusing PR_Github #1273 for commit db38872

@lowsfer lowsfer merged commit 3545d59 into NVIDIA:main Apr 7, 2025
2 checks passed
sarattha pushed a commit to sarattha/TensorRT-LLM that referenced this pull request Apr 9, 2025
wu1du2 pushed a commit to wu1du2/TensorRT-LLM that referenced this pull request May 11, 2025
4 participants