@moraxu
Collaborator

@moraxu moraxu commented May 15, 2025

Description

Add CLI accuracy tests for Llama-3.3-70B-Instruct and an LLM API BF16 variant.
CLI accuracy tests are still needed because NIMs use TRT-LLM's TRT backend for now.

Test Coverage

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provides a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--disable-fail-fast --skip-test --stage-list "A10-1, xxx" --gpu-type "A30, H100_PCIe" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-[Post-Merge]-1, xxx"]

Launch build/test pipelines. All previously running jobs will be killed.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force-run the multi-GPU tests. This also runs the L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-[Post-Merge]-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-[Post-Merge]-1, xxx".

kill

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous, since a lack of user care and validation can break the top of tree.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate the current commit. This action also kills all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous, since a lack of user care and validation can break the top of tree.
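
The help text above amounts to a small command grammar: one of four subcommands, with optional flags on `run` and a required `--comment` on `skip`. As a rough illustration only, here is a minimal Python sketch of how such a comment could be parsed with `argparse`. The helper names (`build_bot_parser`, `parse_bot_command`, `split_csv`) are hypothetical and this is not the actual bot implementation; only the subcommand and flag names are taken from the help text.

```python
import argparse
import shlex


def build_bot_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented /bot subcommands (illustrative only)."""
    parser = argparse.ArgumentParser(prog="/bot")
    sub = parser.add_subparsers(dest="command", required=True)

    # `run` takes only optional flags, as listed in the help text.
    run = sub.add_parser("run")
    for flag in (
        "--disable-fail-fast",
        "--skip-test",
        "--only-multi-gpu-test",
        "--disable-multi-gpu-test",
        "--add-multi-gpu-test",
        "--post-merge",
    ):
        run.add_argument(flag, action="store_true")
    # These flags carry a quoted, comma-separated list, e.g. "A30, H100_PCIe".
    for flag in ("--stage-list", "--gpu-type", "--extra-stage"):
        run.add_argument(flag, default=None)

    sub.add_parser("kill")

    # `skip` requires a reason, per the help text.
    skip = sub.add_parser("skip")
    skip.add_argument("--comment", required=True)

    sub.add_parser("reuse-pipeline")
    return parser


def split_csv(value):
    """Split a comma-separated flag value into trimmed items; None gives []."""
    if not value:
        return []
    return [item.strip() for item in value.split(",") if item.strip()]


def parse_bot_command(comment: str) -> argparse.Namespace:
    """Parse a PR comment such as '/bot run --post-merge' into a namespace."""
    tokens = shlex.split(comment)  # keeps quoted lists together as one token
    if not tokens or tokens[0] != "/bot":
        raise ValueError("not a /bot command")
    return build_bot_parser().parse_args(tokens[1:])


if __name__ == "__main__":
    args = parse_bot_command('/bot run --gpu-type "A30, H100_PCIe" --post-merge')
    print(args.command, split_csv(args.gpu_type), args.post_merge)
```

In this sketch, `shlex.split` is used so that a quoted value like `"A30, H100_PCIe"` stays a single argument, and `argparse` exits with an error if `skip` is given without `--comment`, which matches the "is required" wording in the help text.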

@moraxu
Collaborator Author

moraxu commented May 15, 2025

/bot run

@tensorrt-cicd
Collaborator

PR_Github #5387 [ run ] triggered by Bot

@moraxu
Collaborator Author

moraxu commented May 15, 2025

/bot kill

@tensorrt-cicd
Collaborator

PR_Github #5390 [ kill ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #5387 [ run ] completed with state ABORTED

@moraxu
Collaborator Author

moraxu commented May 15, 2025

/bot run

@tensorrt-cicd
Collaborator

PR_Github #5390 [ kill ] completed with state SUCCESS
Successfully killed previous jobs for commit 952f6dc

@tensorrt-cicd
Collaborator

PR_Github #5392 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #5392 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #3935 completed with status: 'FAILURE'

@moraxu
Collaborator Author

moraxu commented May 15, 2025

/bot run

@tensorrt-cicd
Collaborator

PR_Github #5409 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #5409 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3946 completed with status: 'FAILURE'

@moraxu
Collaborator Author

moraxu commented May 16, 2025

PR_Github #5409 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3946 completed with status: 'FAILURE'

I don't see any failures in that test report: https://prod.blsm.nvidia.com/sw-tensorrt-top-1/job/LLM/job/main/job/L0_MergeRequest_PR/3946/testReport/

@moraxu
Collaborator Author

moraxu commented May 16, 2025

/bot run

@tensorrt-cicd
Collaborator

PR_Github #5543 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #5543 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #4044 completed with status: 'SUCCESS'

@moraxu moraxu force-pushed the add-cli-acc-tests-Llama3-3-70BInstruct branch from 967c1d0 to 48ce5e2 Compare May 19, 2025 07:12
@moraxu moraxu requested a review from chang-l May 19, 2025 16:07
@moraxu
Collaborator Author

moraxu commented May 19, 2025

/bot run

@tensorrt-cicd
Collaborator

PR_Github #5765 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #5765 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #4219 completed with status: 'SUCCESS'

moraxu added 5 commits May 19, 2025 16:05
Signed-off-by: moraxu <[email protected]>
Signed-off-by: moraxu <[email protected]>
Signed-off-by: moraxu <[email protected]>
Signed-off-by: moraxu <[email protected]>
@moraxu moraxu force-pushed the add-cli-acc-tests-Llama3-3-70BInstruct branch from 48ce5e2 to 0b35b84 Compare May 19, 2025 23:05
@chzblych
Collaborator

/bot reuse-pipeline

@chzblych chzblych enabled auto-merge (squash) May 20, 2025 01:36
@tensorrt-cicd
Collaborator

PR_Github #5794 [ reuse-pipeline ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #5794 [ reuse-pipeline ] completed with state SUCCESS
Reusing PR_Github #5765 for commit 0b35b84

@chzblych chzblych merged commit 0a342a4 into NVIDIA:main May 20, 2025
3 checks passed