@nv-yilinf
Collaborator

@nv-yilinf nv-yilinf commented May 28, 2025

PR title

Cherry pick feat/llama4 to main

Description

This PR is based on @chenfeiz0326's cherry-pick PR, with the new merge conflicts resolved. I opened a new PR because I don't have push permission to his branch.

Perf

===========================================================
= PERFORMANCE OVERVIEW
===========================================================
Request Throughput (req/sec):                     0.7471
Total Output Throughput (tokens/sec):             897.0175
Total Token Throughput (tokens/sec):              1679.6140
Total Latency (ms):                               42833.0554
Average request latency (ms):                     1338.4794
Per User Output Throughput [w/ ctx] (tps/user):   900.4043
Per GPU Output Throughput (tps/gpu):              112.1272
Average time-to-first-token [TTFT] (ms):          52.7782
Average time-per-output-token [TPOT] (ms):        1.0700
Per User Output Speed (tps/user):                 936.9597

-- Acceptance Rate Details --------------------------------

[AR] MINIMUM: 2.63
[AR] MAXIMUM: 3.18
[AR] AVERAGE: 2.99
[AR] P50    : 3.04
[AR] P90    : 3.15
[AR] P95    : 3.17
[AR] P99    : 3.18
===========================================================
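As a rough sanity check on the overview above, the reported figures can be cross-checked against each other. The sketch below assumes that per-GPU throughput is total output throughput divided by the GPU count, and that request throughput is the request count divided by total latency. Neither definition is stated in this PR, so the derived values are inferences, not reported results.

```python
# Figures copied from the performance overview above.
request_throughput = 0.7471            # req/sec
total_output_throughput = 897.0175     # tokens/sec
per_gpu_output_throughput = 112.1272   # tokens/sec/gpu
total_latency_ms = 42833.0554
avg_request_latency_ms = 1338.4794
avg_tpot_ms = 1.0700

# Assumption: per-GPU throughput = total output throughput / number of GPUs.
implied_gpus = total_output_throughput / per_gpu_output_throughput

# Assumption: request throughput = number of requests / total latency.
implied_requests = request_throughput * total_latency_ms / 1000.0

# Little's law style estimate of how many requests were in flight on average.
implied_concurrency = implied_requests * avg_request_latency_ms / total_latency_ms

# Reciprocal of the average TPOT, for comparison with the per-user output speed.
tokens_per_sec_from_tpot = 1000.0 / avg_tpot_ms

print(f"implied GPUs:        {implied_gpus:.2f}")
print(f"implied requests:    {implied_requests:.2f}")
print(f"implied concurrency: {implied_concurrency:.2f}")
print(f"1 / TPOT (tok/s):    {tokens_per_sec_from_tpot:.2f}")
```

Under these assumptions the numbers are consistent with roughly 8 GPUs and 32 requests served one at a time. The reciprocal of the average TPOT is close to, but not identical to, the reported per-user output speed, which is expected if the latter is averaged per request rather than computed from the mean TPOT.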

Accuracy verification

MMLU weighted average accuracy: 85.91 (14042)

GPQA:

{'score:std': np.float64(0.4969039949999533), 'score:stderr': 0.03540294377095367, 'score': np.float64(0.5555555555555556), 'task_name': 'gpqa_diamond'}
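The three GPQA numbers above are related by simple arithmetic. The sketch below reproduces them under two assumptions that are not stated in this PR: that the task has 198 questions and that 110 were answered correctly. It also assumes the standard error is the population standard deviation divided by sqrt(n - 1), which is what matches the reported value.

```python
import math

# Assumptions (not stated in the PR): question count and number correct.
num_questions = 198
num_correct = 110

# Accuracy over 0/1 per-question scores.
score = num_correct / num_questions

# Population standard deviation of a 0/1 variable with mean `score`.
std = math.sqrt(score * (1.0 - score))

# Standard error of the mean, using the n - 1 correction.
stderr = std / math.sqrt(num_questions - 1)

print(f"score  = {score:.6f}")
print(f"std    = {std:.6f}")
print(f"stderr = {stderr:.6f}")
```

Read this way, the reported score of about 0.556 carries an uncertainty of roughly ±0.035 at one standard error.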

Output tokens:

[0] Prompt: 'Hello, my name is', Generated text: ' {{ name }} and I am {{ age }} years old.\n    </body>\n</html>\n"""\n\n# Define the data to be used in the template\ndata = {\n    "name": "John Doe",\n    "age": 30\n}\n\n# Render the template with the data\nrendered_template'
[1] Prompt: 'The president of the United States is', Generated text: ' the head of state and head of government of the United States, and is also the commander-in-chief of the armed forces. The president is responsible for executing the laws and policies of the government, and is also responsible for representing the United States on the international stage. The president is elected by the people'
[2] Prompt: 'The capital of France is', Generated text: ' Paris, and it is known for its rich history, art, and culture. The city is home to many famous landmarks, including the Eiffel Tower, the Louvre Museum, and Notre-Dame Cathedral. Paris is also known for its fashion, cuisine, and romantic atmosphere, making it a popular destination'
[3] Prompt: 'The future of AI is', Generated text: ' a topic of much debate and speculation. As we continue to develop and refine AI technologies, what can we expect the future of AI to hold, and how will it impact various aspects of our lives and society as a whole?\n\n1. **Advancements in AI Capabilities**: The future of AI is expected to'

Test Coverage

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--disable-fail-fast --skip-test --stage-list "A10-1, xxx" --gpu-type "A30, H100_PCIe" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-[Post-Merge]-1, xxx"]

Launch build/test pipelines. All previously running jobs will be killed.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests. Will also run L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-[Post-Merge]-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-[Post-Merge]-1, xxx".

kill

kill

Kill all running builds associated with pull request.

skip

skip --comment COMMENT

Skip testing for latest commit on pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.
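To make the command grammar above concrete, here is a minimal sketch of a parser for `/bot` comments written with Python's `argparse`. It is a hypothetical illustration built only from the help text; the function names `build_parser` and `parse_bot_comment` are made up here, and this is not the bot's actual implementation.

```python
import argparse
import shlex


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the subcommands listed in the help text."""
    parser = argparse.ArgumentParser(prog="/bot")
    sub = parser.add_subparsers(dest="command", required=True)

    run = sub.add_parser("run")
    # Boolean switches documented for `run`.
    for flag in (
        "--disable-fail-fast",
        "--skip-test",
        "--add-multi-gpu-test",
        "--only-multi-gpu-test",
        "--disable-multi-gpu-test",
        "--post-merge",
    ):
        run.add_argument(flag, action="store_true")
    # Options that take a comma-separated string value.
    run.add_argument("--stage-list", default=None)
    run.add_argument("--gpu-type", default=None)
    run.add_argument("--extra-stage", default=None)

    sub.add_parser("kill")

    skip = sub.add_parser("skip")
    skip.add_argument("--comment", required=True)  # the help text marks this as required

    sub.add_parser("reuse-pipeline")
    return parser


def parse_bot_comment(comment: str) -> argparse.Namespace:
    """Split a PR comment shell-style and parse everything after `/bot`."""
    tokens = shlex.split(comment)
    if not tokens or tokens[0] != "/bot":
        raise ValueError("not a /bot command")
    return build_parser().parse_args(tokens[1:])


run_args = parse_bot_comment('/bot run --stage-list "A10-1, xxx" --disable-fail-fast')
stages = [s.strip() for s in run_args.stage_list.split(",")]
skip_args = parse_bot_comment('/bot skip --comment "Combined CI passes"')
print(run_args.command, stages, run_args.disable_fail_fast)
print(skip_args.command, skip_args.comment)
```

One consequence of this kind of parsing is that flags must be separated by whitespace: a value that runs straight into the next flag is read as part of the value, not as a separate option.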

@nv-yilinf nv-yilinf requested review from a team as code owners May 28, 2025 18:09
@nv-yilinf nv-yilinf requested review from lfr-0531 and suyoggupta May 28, 2025 18:09
@nv-yilinf
Collaborator Author

/bot run --stage-list A100X-PyTorch-1

@nv-yilinf nv-yilinf requested a review from hlu1 May 28, 2025 18:15
@nv-yilinf
Collaborator Author

/bot run --stage-list A100X-PyTorch-1

@tensorrt-cicd
Collaborator

PR_Github #6800 [ run ] triggered by Bot

@nv-yilinf
Collaborator Author

/bot kill

@hlu1
Collaborator

hlu1 commented May 28, 2025

/bot run --disable-fail-fast --add-multi-gpu-test

@tensorrt-cicd
Collaborator

PR_Github #6801 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #6800 [ run ] completed with state ABORTED

@tensorrt-cicd
Collaborator

PR_Github #6802 [ kill ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #6801 [ run ] completed with state ABORTED

@tensorrt-cicd
Collaborator

PR_Github #6802 [ kill ] completed with state SUCCESS
Successfully killed previous jobs for commit f02b8a2

@nv-yilinf nv-yilinf force-pushed the cherry-pick-feat-llama4-to-main branch 3 times, most recently from 83ac7de to ba27e1d Compare May 28, 2025 19:54
@nv-yilinf
Collaborator Author

/bot run --disable-fail-fast --add-multi-gpu-test

@nv-yilinf nv-yilinf force-pushed the cherry-pick-feat-llama4-to-main branch from ba27e1d to 75eda41 Compare May 28, 2025 20:00
@nv-yilinf
Collaborator Author

/bot run --disable-fail-fast --add-multi-gpu-test

@nv-yilinf nv-yilinf force-pushed the cherry-pick-feat-llama4-to-main branch from 75eda41 to 09dd6a4 Compare May 28, 2025 21:06
@nv-yilinf
Collaborator Author

/bot run --disable-fail-fast --add-multi-gpu-test

@tensorrt-cicd
Collaborator

PR_Github #6806 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #6806 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #4960 completed with status: 'FAILURE'

@nv-yilinf
Collaborator Author

/bot run --add-multi-gpu-test

@nv-yilinf
Collaborator Author

/bot run --stage-list A10-TensorRT-1 --add-multi-gpu-test

@tensorrt-cicd
Collaborator

PR_Github #6821 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #6822 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #6916 [ run ] triggered by Bot

@nvpohanh
Collaborator

https://prod.blsm.nvidia.com/sw-tensorrt-top-1/blue/organizations/jenkins/LLM%2Fmain%2FL0_Test-x86_64/detail/L0_Test-x86_64/12086/pipeline/242

4 out of 5 multi-GPU stages passed, and the failing one was caused by an infra issue.

@nvpohanh
Collaborator

/bot run --only-multi-gpu-test --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #6916 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5021 (Partly Tested) completed with status: 'FAILURE'

@nv-yilinf
Collaborator Author

/bot run --only-multi-gpu-test --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #6936 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #6936 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5036 (Partly Tested) completed with status: 'FAILURE'

@nv-yilinf
Collaborator Author

/bot run --only-multi-gpu-test --disable-fail-fast

@nv-yilinf
Collaborator Author

/bot run --only-multi-gpu-test --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #6950 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #6950 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5046 (Partly Tested) completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

@hlu1
Collaborator

hlu1 commented May 29, 2025

/bot skip --comment "Combined CI passes"

chenfeiz0326 and others added 4 commits May 29, 2025 14:15
Signed-off-by: Chenfei Zhang <[email protected]>
Signed-off-by: Chenfei Zhang <[email protected]>
Signed-off-by: Yilin Fan <[email protected]>
@hlu1 hlu1 force-pushed the cherry-pick-feat-llama4-to-main branch from 28cca3d to e344f8d Compare May 29, 2025 21:15
@hlu1 hlu1 enabled auto-merge (squash) May 29, 2025 21:15
@tensorrt-cicd
Collaborator

PR_Github #6957 [ skip ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #6957 [ skip ] completed with state SUCCESS
Skipping testing for commit e344f8d

@hlu1 hlu1 merged commit 31bb650 into NVIDIA:main May 29, 2025
3 checks passed
@hlu1
Collaborator

hlu1 commented May 29, 2025

/bot run --post-merge

@hlu1
Collaborator

hlu1 commented May 29, 2025

/bot run --post-merge --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #6959 [ ] completed with state FAILURE
Not allowed on merged PR

@tensorrt-cicd
Collaborator

PR_Github #6960 [ ] completed with state FAILURE
Not allowed on merged PR

SimengLiu-nv added a commit to SimengLiu-nv/TensorRT-LLM that referenced this pull request May 29, 2025
@SimengLiu-nv
Collaborator

Created #4780 as a dup to trigger post_merge pipelines.

darraghdog pushed a commit to darraghdog/TensorRT-LLM that referenced this pull request Jun 3, 2025
Signed-off-by: Chenfei Zhang <[email protected]>
Signed-off-by: Yilin Fan <[email protected]>
Co-authored-by: Chenfei Zhang <[email protected]>
Signed-off-by: darraghdog <[email protected]>
@nv-yilinf nv-yilinf deleted the cherry-pick-feat-llama4-to-main branch September 4, 2025 16:38
6 participants