mikeiovine
Collaborator

@mikeiovine mikeiovine commented May 30, 2025

Description

The 8k context limitation was accidentally re-added during the feat/llama4 integration.

Since the Blackwell kernels have not landed yet, I've added an additional get_sm_version() == 90 restriction for using the full context length.
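A minimal sketch of the gating described above, for illustration only. It assumes that "8k" means 8192 tokens and that the SM version is passed in as an integer; the function name `usable_context_len` and the constant `LIMITED_CONTEXT_LEN` are hypothetical and are not taken from the actual change.

```python
# Hypothetical constant: assumes the "8k" limit means 8192 tokens.
LIMITED_CONTEXT_LEN = 8192


def usable_context_len(model_max_len: int, sm_version: int) -> int:
    """Return the context length to use for a given SM version.

    Only SM 90 (Hopper) gets the model's full context length; every other
    architecture is capped at the limited length.
    """
    if sm_version == 90:
        return model_max_len
    return min(model_max_len, LIMITED_CONTEXT_LEN)
```

In the real code the SM version would presumably come from `get_sm_version()` rather than from an argument; it is a parameter here so the sketch runs without a GPU.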

Test Coverage

RULER evaluation passes

|  Tasks  |Version|Filter|n-shot|Metric|   | Value |   |Stderr|
|---------|------:|------|-----:|-----:|---|------:|---|------|
|ruler_cwe|      1|none  |     0| 16384|↑  | 0.9808|±  |   N/A|
|         |       |none  |     0|  4096|↑  |-1.0000|±  |   N/A|

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--disable-fail-fast --skip-test --stage-list "A10-1, xxx" --gpu-type "A30, H100_PCIe" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-[Post-Merge]-1, xxx"]

Launch build/test pipelines. All previously running jobs will be killed.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests. Will also run L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-[Post-Merge]-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-[Post-Merge]-1, xxx".

kill

kill

Kill all running builds associated with pull request.

skip

skip --comment COMMENT

Skip testing for latest commit on pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

@mikeiovine mikeiovine requested a review from a team as a code owner May 30, 2025 19:22
@mikeiovine mikeiovine requested review from HuiGao-NV and lucaslie May 30, 2025 19:22
@hlu1
Collaborator

hlu1 commented May 30, 2025

Is there any test coverage for llama4 long context?

@mikeiovine
Collaborator Author

> Is there any test coverage for llama4 long context?

The issue is that the release version of lm_eval does not include RULER yet; you need to install it from source. We need to add a very long prompt to the sanity check (test_modeling_llama4). However, the sanity check is currently disabled due to an unrelated multimodality issue. The multimodal tests need to be separated from the text ones to unblock this.

@hlu1
Collaborator

hlu1 commented May 30, 2025

Can you add a test to check that the return value of infer_max_seq_len is correct on SM90/100? It's a functional test, not a real accuracy test.
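One possible shape for such a functional check, as a sketch only. The real signature of `infer_max_seq_len` is not shown in this thread, so the helper below takes any callable that maps an SM version to a length. The stand-in `fake_infer_max_seq_len`, the 8192-token cap, and the 1,048,576-token model limit are all hypothetical values chosen for the demonstration.

```python
from typing import Callable, Iterable, Tuple


def check_max_seq_len(
    infer_fn: Callable[[int], int],
    cases: Iterable[Tuple[int, int]],
) -> None:
    """Assert that infer_fn(sm_version) returns the expected length per case."""
    for sm_version, expected in cases:
        actual = infer_fn(sm_version)
        assert actual == expected, (
            f"SM{sm_version}: expected max_seq_len={expected}, got {actual}"
        )


# Hypothetical stand-in for the real function, for demonstration only.
def fake_infer_max_seq_len(sm_version: int, model_max_len: int = 1_048_576) -> int:
    return model_max_len if sm_version == 90 else min(model_max_len, 8192)


# Functional check only: verifies the returned lengths, not model accuracy.
check_max_seq_len(fake_infer_max_seq_len, [(90, 1_048_576), (100, 8192)])
```

Passing the function under test as a callable keeps the check independent of how the SM version is detected, so it can run on a machine without the target GPU.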

@mikeiovine
Collaborator Author

Decided to enable a functional llama 4 long context test in L0 instead so we don't have to put up another PR. The broken multimodal tests remain disabled for now.

@mikeiovine
Collaborator Author

/bot run --add-multi-gpu-test

@mikeiovine
Collaborator Author

/bot run

@mikeiovine
Collaborator Author

/bot run --add-multi-gpu-test

@tensorrt-cicd
Collaborator

PR_Github #7225 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #7226 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #7225 [ run ] completed with state ABORTED

@mikeiovine
Collaborator Author

/bot run --add-multi-gpu-test

@tensorrt-cicd
Collaborator

PR_Github #7236 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #7226 [ run ] completed with state ABORTED

@tensorrt-cicd
Collaborator

PR_Github #7236 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5241 completed with status: 'SUCCESS'

@mikeiovine
Collaborator Author

/bot run --post-merge

@tensorrt-cicd
Collaborator

PR_Github #7359 [ run ] triggered by Bot

@schetlur-nv
Collaborator

/bot --help

github-actions bot commented Jun 3, 2025

GitHub Bot Help


@tensorrt-cicd
Collaborator

PR_Github #7359 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5335 completed with status: 'FAILURE'

@mikeiovine
Collaborator Author

/bot run --stage-list DGX_H200-8_GPUs-PyTorch-[Post-Merge]-1

@tensorrt-cicd
Collaborator

PR_Github #7393 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #7393 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5364 (Partly Tested) completed with status: 'FAILURE'

@mikeiovine mikeiovine changed the title from "[fix] Fix llama 4 long context on Hopper" to "[fix] Fix llama 4 long context" Jun 3, 2025
@mikeiovine
Collaborator Author

/bot skip --comment "Passed before rebase."

@mikeiovine
Collaborator Author

We have to keep the newly added post-merge test disabled for now. It's timing out. Landing to unblock and will fix in followup.

@mikeiovine mikeiovine enabled auto-merge (squash) June 3, 2025 23:19
@tensorrt-cicd
Collaborator

PR_Github #7398 [ skip ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #7398 [ skip ] completed with state SUCCESS
Release Check Pipeline #1101 failed

@mikeiovine
Collaborator Author

/bot skip --comment "Passed before rebase."

@tensorrt-cicd
Collaborator

PR_Github #7400 [ skip ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #7400 [ skip ] completed with state SUCCESS
Skipping testing for commit 19bcc60

@mikeiovine mikeiovine merged commit 73389d6 into NVIDIA:main Jun 3, 2025
3 checks passed
@mikeiovine mikeiovine deleted the fix-long-ctx branch July 23, 2025 18:00

4 participants