Cherry pick feat/llama4 to main #4739

nv-yilinf · 2025-05-28T18:09:24Z

PR title

Cherry pick feat/llama4 to main

Description

This PR is based on @chenfeiz0326's cherry-pick PR, with new merge conflicts resolved. I opened a new PR because I don't have push permission to his branch.

Perf

===========================================================
= PERFORMANCE OVERVIEW
===========================================================
Request Throughput (req/sec):                     0.7471
Total Output Throughput (tokens/sec):             897.0175
Total Token Throughput (tokens/sec):              1679.6140
Total Latency (ms):                               42833.0554
Average request latency (ms):                     1338.4794
Per User Output Throughput [w/ ctx] (tps/user):   900.4043
Per GPU Output Throughput (tps/gpu):              112.1272
Average time-to-first-token [TTFT] (ms):          52.7782
Average time-per-output-token [TPOT] (ms):        1.0700
Per User Output Speed (tps/user):                 936.9597

-- Acceptance Rate Details --------------------------------

[AR] MINIMUM: 2.63
[AR] MAXIMUM: 3.18
[AR] AVERAGE: 2.99
[AR] P50    : 3.04
[AR] P90    : 3.15
[AR] P95    : 3.17
[AR] P99    : 3.18
===========================================================
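
As a quick sanity check on the overview above, the figures can be cross-checked against each other. This is only a sketch: the PR does not state the GPU count, the number of requests, or how each metric is defined, so the relationships used below (per-GPU throughput = output throughput / GPU count, request throughput = requests / total latency) are assumptions.

```python
# Figures copied from the PERFORMANCE OVERVIEW block above.
request_throughput = 0.7471            # req/sec
output_throughput = 897.0175           # tokens/sec
per_gpu_output_throughput = 112.1272   # tps/gpu
total_latency_ms = 42833.0554
avg_tpot_ms = 1.0700

# Assumption: per-GPU throughput is total output throughput divided by GPU count.
implied_gpus = output_throughput / per_gpu_output_throughput

# Assumption: request throughput is completed requests divided by total latency.
implied_requests = request_throughput * total_latency_ms / 1000.0

# Per-user decode speed implied by inverting the average TPOT.
implied_tps_per_user = 1000.0 / avg_tpot_ms

print(f"implied GPUs:      {implied_gpus:.2f}")
print(f"implied requests:  {implied_requests:.2f}")
print(f"1000 / TPOT:       {implied_tps_per_user:.1f} tps/user")
```

Under those assumptions the numbers are consistent with roughly 8 GPUs and 32 requests. Inverting the average TPOT gives about 934.6 tps/user, close to but not identical to the reported 936.9597; one possible reason is that the reported value averages per-request speeds rather than inverting the averaged TPOT.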

Accuracy verification

MMLU weighted average accuracy: 85.91 (14042)

GPQA:

{'score:std': np.float64(0.4969039949999533), 'score:stderr': 0.03540294377095367, 'score': np.float64(0.5555555555555556), 'task_name': 'gpqa_diamond'}
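
The reported GPQA standard deviation and standard error can be reproduced from the score alone. This is a sketch under two assumptions that are not stated in this PR: that gpqa_diamond contains 198 questions, and that the evaluation harness computes the standard error as std / sqrt(n - 1) over per-question 0/1 scores.

```python
import math

# Values copied from the GPQA result above.
score = 0.5555555555555556
reported_std = 0.4969039949999533
reported_stderr = 0.03540294377095367

# Assumption: gpqa_diamond has 198 questions (not stated in this PR).
n = 198

# Population standard deviation of a 0/1 correctness score with mean `score`.
std = math.sqrt(score * (1.0 - score))

# Assumption: the harness reports stderr as std / sqrt(n - 1).
stderr = std / math.sqrt(n - 1)

print(f"implied correct answers: {score * n:.1f}")
print(f"std:    {std:.6f} (reported {reported_std:.6f})")
print(f"stderr: {stderr:.6f} (reported {reported_stderr:.6f})")
```

With these assumptions both values agree with the reported ones, which corresponds to 110 correct answers out of 198.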

Output tokens:

[0] Prompt: 'Hello, my name is', Generated text: ' {{ name }} and I am {{ age }} years old.\n    </body>\n</html>\n"""\n\n# Define the data to be used in the template\ndata = {\n    "name": "John Doe",\n    "age": 30\n}\n\n# Render the template with the data\nrendered_template'
[1] Prompt: 'The president of the United States is', Generated text: ' the head of state and head of government of the United States, and is also the commander-in-chief of the armed forces. The president is responsible for executing the laws and policies of the government, and is also responsible for representing the United States on the international stage. The president is elected by the people'
[2] Prompt: 'The capital of France is', Generated text: ' Paris, and it is known for its rich history, art, and culture. The city is home to many famous landmarks, including the Eiffel Tower, the Louvre Museum, and Notre-Dame Cathedral. Paris is also known for its fashion, cuisine, and romantic atmosphere, making it a popular destination'
[3] Prompt: 'The future of AI is', Generated text: ' a topic of much debate and speculation. As we continue to develop and refine AI technologies, what can we expect the future of AI to hold, and how will it impact various aspects of our lives and society as a whole?\n\n1. **Advancements in AI Capabilities**: The future of AI is expected to'

Test Coverage

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--disable-fail-fast --skip-test --stage-list "A10-1, xxx" --gpu-type "A30, H100_PCIe" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-[Post-Merge]-1, xxx"]

Launch build/test pipelines. All previously running jobs will be killed.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests. Will also run L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-[Post-Merge]-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-[Post-Merge]-1, xxx".

kill

kill

Kill all running builds associated with pull request.

skip

skip --comment COMMENT

Skip testing for latest commit on pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.
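
To illustrate the command grammar described in the help text above, here is a minimal sketch that parses a /bot comment with Python's standard argparse. It is not the bot's actual implementation; the function names build_parser, parse_bot_comment and split_list are hypothetical, and only the subcommands and flags listed in the help text are modeled.

```python
import argparse
import shlex


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the subcommands in the help text."""
    parser = argparse.ArgumentParser(prog="/bot")
    sub = parser.add_subparsers(dest="command", required=True)

    run = sub.add_parser("run")
    run.add_argument("--disable-fail-fast", action="store_true")
    run.add_argument("--skip-test", action="store_true")
    run.add_argument("--stage-list", default=None)
    run.add_argument("--gpu-type", default=None)
    run.add_argument("--add-multi-gpu-test", action="store_true")
    run.add_argument("--only-multi-gpu-test", action="store_true")
    run.add_argument("--disable-multi-gpu-test", action="store_true")
    run.add_argument("--post-merge", action="store_true")
    run.add_argument("--extra-stage", default=None)

    sub.add_parser("kill")

    skip = sub.add_parser("skip")
    # The help text says --comment is required for skip.
    skip.add_argument("--comment", required=True)

    sub.add_parser("reuse-pipeline")
    return parser


def parse_bot_comment(comment: str) -> argparse.Namespace:
    """Split a comment like '/bot run ...' and parse everything after '/bot'."""
    tokens = shlex.split(comment)
    if not tokens or tokens[0] != "/bot":
        raise ValueError("not a /bot command")
    return build_parser().parse_args(tokens[1:])


def split_list(value):
    """Turn a quoted list such as "A10-1, xxx" into ['A10-1', 'xxx']."""
    return [item.strip() for item in value.split(",")] if value else []


args = parse_bot_comment('/bot run --stage-list "A10-1, xxx" --disable-fail-fast')
print(args.command, split_list(args.stage_list), args.disable_fail_fast)
```

In this sketch, flags must be separated by whitespace, and a skip command without --comment is rejected by argparse.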

nv-yilinf · 2025-05-28T18:10:01Z

/bot run --stage-list A100X-PyTorch-1

nv-yilinf · 2025-05-28T18:30:22Z

/bot run --stage-list A100X-PyTorch-1

tensorrt-cicd · 2025-05-28T18:35:53Z

PR_Github #6800 [ run ] triggered by Bot

nv-yilinf · 2025-05-28T18:37:48Z

/bot kill

hlu1 · 2025-05-28T18:38:22Z

/bot run --disable-fail-fast --add-multi-gpu-test

tensorrt-cicd · 2025-05-28T18:44:14Z

PR_Github #6801 [ run ] triggered by Bot

tensorrt-cicd · 2025-05-28T18:44:16Z

PR_Github #6800 [ run ] completed with state ABORTED

tensorrt-cicd · 2025-05-28T18:44:33Z

PR_Github #6802 [ kill ] triggered by Bot

tensorrt-cicd · 2025-05-28T18:44:34Z

PR_Github #6801 [ run ] completed with state ABORTED

tensorrt-cicd · 2025-05-28T18:45:04Z

PR_Github #6802 [ kill ] completed with state SUCCESS
Successfully killed previous jobs for commit f02b8a2

nv-yilinf · 2025-05-28T19:54:58Z

/bot run --disable-fail-fast --add-multi-gpu-test

nv-yilinf · 2025-05-28T20:19:58Z

/bot run --disable-fail-fast --add-multi-gpu-test

nv-yilinf · 2025-05-28T21:07:35Z

/bot run --disable-fail-fast --add-multi-gpu-test

tensorrt-cicd · 2025-05-28T21:13:45Z

PR_Github #6806 [ run ] triggered by Bot

tensorrt-cicd · 2025-05-29T00:21:27Z

PR_Github #6806 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #4960 completed with status: 'FAILURE'

nv-yilinf · 2025-05-29T00:25:06Z

/bot run --add-multi-gpu-test

nv-yilinf · 2025-05-29T00:29:30Z

/bot run --stage-list A10-TensorRT-1--add-multi-gpu-test

tensorrt-cicd · 2025-05-29T00:31:21Z

PR_Github #6821 [ run ] triggered by Bot

tensorrt-cicd · 2025-05-29T00:35:10Z

PR_Github #6822 [ run ] triggered by Bot

tensorrt-cicd · 2025-05-29T08:00:45Z

PR_Github #6916 [ run ] triggered by Bot

nvpohanh · 2025-05-29T10:38:52Z

https://prod.blsm.nvidia.com/sw-tensorrt-top-1/blue/organizations/jenkins/LLM%2Fmain%2FL0_Test-x86_64/detail/L0_Test-x86_64/12086/pipeline/242

4 out of 5 multi-GPU stages passed, and the failing one is caused by an infra issue.

nvpohanh · 2025-05-29T10:39:15Z

/bot run --only-multi-gpu-test --disable-fail-fast

tensorrt-cicd · 2025-05-29T10:43:52Z

PR_Github #6916 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5021 (Partly Tested) completed with status: 'FAILURE'

nv-yilinf · 2025-05-29T16:12:32Z

/bot run --only-multi-gpu-test --disable-fail-fast

tensorrt-cicd · 2025-05-29T16:18:00Z

PR_Github #6936 [ run ] triggered by Bot

tensorrt-cicd · 2025-05-29T16:44:09Z

PR_Github #6936 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5036 (Partly Tested) completed with status: 'FAILURE'

nv-yilinf · 2025-05-29T18:53:33Z

/bot run --only-multi-gpu-test --disable-fail-fast

nv-yilinf · 2025-05-29T18:56:39Z

/bot run --only-multi-gpu-test --disable-fail-fast

tensorrt-cicd · 2025-05-29T19:02:43Z

PR_Github #6950 [ run ] triggered by Bot

tensorrt-cicd · 2025-05-29T21:08:23Z

PR_Github #6950 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5046 (Partly Tested) completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

hlu1 · 2025-05-29T21:15:24Z

/bot skip --comment "Combined CI passes"

Signed-off-by: Chenfei Zhang <[email protected]>

Signed-off-by: Yilin Fan <[email protected]>

tensorrt-cicd · 2025-05-29T21:20:54Z

PR_Github #6957 [ skip ] triggered by Bot

tensorrt-cicd · 2025-05-29T21:28:39Z

PR_Github #6957 [ skip ] completed with state SUCCESS
Skipping testing for commit e344f8d

hlu1 · 2025-05-29T21:31:45Z

/bot run --post-merge

hlu1 · 2025-05-29T21:32:07Z

/bot run --post-merge --disable-fail-fast

tensorrt-cicd · 2025-05-29T21:37:06Z

PR_Github #6959 [ ] completed with state FAILURE
Not allowed on merged PR

tensorrt-cicd · 2025-05-29T21:37:15Z

PR_Github #6960 [ ] completed with state FAILURE
Not allowed on merged PR

Signed-off-by: Simeng Liu <[email protected]>

SimengLiu-nv · 2025-05-29T23:49:09Z

Created #4780 as a dup to trigger post_merge pipelines.

Signed-off-by: Chenfei Zhang <[email protected]>
Signed-off-by: Yilin Fan <[email protected]>
Co-authored-by: Chenfei Zhang <[email protected]>
Signed-off-by: darraghdog <[email protected]>

nv-yilinf requested review from a team as code owners May 28, 2025 18:09

nv-yilinf requested review from lfr-0531 and suyoggupta May 28, 2025 18:09

nv-yilinf requested a review from hlu1 May 28, 2025 18:15

hlu1 approved these changes May 28, 2025

nv-yilinf force-pushed the cherry-pick-feat-llama4-to-main branch 3 times, most recently from 83ac7de to ba27e1d Compare May 28, 2025 19:54

nv-yilinf force-pushed the cherry-pick-feat-llama4-to-main branch from ba27e1d to 75eda41 Compare May 28, 2025 20:00

nv-yilinf force-pushed the cherry-pick-feat-llama4-to-main branch from 75eda41 to 09dd6a4 Compare May 28, 2025 21:06

chenfeiz0326 and others added 4 commits May 29, 2025 14:15

Cherry-pick feat/llama4's changes

cf1e2a6

Signed-off-by: Chenfei Zhang <[email protected]>

Update formatting

ef995cc

Signed-off-by: Chenfei Zhang <[email protected]>

Fix llama lora bad accuracy

d96cc6d

Signed-off-by: Chenfei Zhang <[email protected]>

Fix merge conflict

e344f8d

Signed-off-by: Yilin Fan <[email protected]>

hlu1 force-pushed the cherry-pick-feat-llama4-to-main branch from 28cca3d to e344f8d Compare May 29, 2025 21:15

hlu1 enabled auto-merge (squash) May 29, 2025 21:15

hlu1 merged commit 31bb650 into NVIDIA:main May 29, 2025
3 checks passed

SimengLiu-nv added a commit to SimengLiu-nv/TensorRT-LLM that referenced this pull request May 29, 2025

Draft: Dup of NVIDIA#4739 to run post-merge pipeline

9fe6fe2

Signed-off-by: Simeng Liu <[email protected]>

mikeiovine mentioned this pull request May 30, 2025

[nvbug/5280806][fix] Fix 2 model spec decode flow #4807

Merged

nv-yilinf deleted the cherry-pick-feat-llama4-to-main branch September 4, 2025 16:38
