fix: Skip rope scaling for local layers in Gemma3 VLM #5773

brb-nv · 2025-07-07T00:09:53Z

Description

Gemma3 VLMs (4B, 12B, 27B) apply rope scaling only in their global attention layers, not in their local (sliding-window) layers. This MR skips rope scaling for the local layers accordingly. It also adds support for q_scaling and sliding_window in the FlashInfer backend.
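A minimal sketch of the per-layer rule, assuming the Hugging Face Gemma3 convention that every `sliding_window_pattern`-th layer is global, and assuming config values modeled on the public Gemma3 configs (rope_theta = 1,000,000 with a linear factor of 8 for global layers, rope_local_base_freq = 10,000 for local layers). The helper names are hypothetical and are not TensorRT-LLM's implementation.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Gemma3RopeConfig:
    # Illustrative values modeled on the public Hugging Face Gemma3 configs;
    # treat them as assumptions, not as values read by TensorRT-LLM.
    head_dim: int = 256
    rope_theta: float = 1_000_000.0         # base frequency, global layers
    rope_local_base_freq: float = 10_000.0  # base frequency, local layers
    rope_scaling_factor: float = 8.0        # linear factor, global layers only
    sliding_window_pattern: int = 6         # every 6th layer is global


def is_local_layer(layer_idx: int, pattern: int) -> bool:
    # Assumed HF convention: a layer is local (sliding-window) unless
    # (layer_idx + 1) is a multiple of the pattern.
    return (layer_idx + 1) % pattern != 0


def layer_rope_params(layer_idx: int, cfg: Gemma3RopeConfig) -> Tuple[float, float]:
    """Return (base_frequency, linear_scaling_factor) for one decoder layer."""
    if is_local_layer(layer_idx, cfg.sliding_window_pattern):
        # Local layers: plain RoPE with no scaling.
        return cfg.rope_local_base_freq, 1.0
    # Global layers: larger base frequency plus linear rope scaling.
    return cfg.rope_theta, cfg.rope_scaling_factor


def rope_inv_freq(head_dim: int, base: float, linear_factor: float = 1.0) -> List[float]:
    # Standard RoPE inverse frequencies; linear scaling divides each one by
    # the factor, which is equivalent to dividing positions by it.
    return [1.0 / (base ** (2 * i / head_dim)) / linear_factor
            for i in range(head_dim // 2)]


if __name__ == "__main__":
    cfg = Gemma3RopeConfig()
    for idx in range(12):
        base, factor = layer_rope_params(idx, cfg)
        kind = "local" if is_local_layer(idx, cfg.sliding_window_pattern) else "global"
        print(f"layer {idx:2d}: {kind:6s} base={base:,.0f} factor={factor}")
```

Under these assumptions, layers 5 and 11 of the first twelve are global and use the scaled frequencies; all others keep unscaled RoPE, which is the behavior the fix targets.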

The fix significantly improves the quality of the VLM output, as the before/after generations from gemma-3-4b-it show:

$ python3 examples/pytorch/quickstart_multimodal.py --model_dir ../random/hf_models/gemma-3-4b-it/ --modality image --prompt "Describe this image in detail." --media "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg" --image_format pil

BEFORE: "Here's a detailed description of the image description of the image:\n\nHere's of the image:\n\nThis is a close-up shot of a vibrant and colorful garden scene featuring a cluster of pink cosmos flowers in bloom with a bee. The image of cosmos flowers and a bee. \n\n**Composition:"

AFTER: "Here's a detailed description of the image:\n\n**Overall Impression:**\n\nThe image is a close-up shot of a vibrant garden scene, focusing on a cluster of pink cosmos flowers and a busy bee. It has a soft, slightly blurred background, drawing the viewer's attention to the central subjects.\n\n"

This MR also consolidates the previous Gemma3 1B test with a new Gemma3 27B VLM test.

Test Coverage

$ pytest tests/unittest/_torch/modeling/test_modeling_gemma3.py::TestGemma3::test_gemma3_allclose_to_hf[backend:flashinfer_config:27b] -s -v
$ pytest tests/unittest/_torch/modeling/test_modeling_gemma3.py::TestGemma3::test_gemma3_allclose_to_hf[backend:trtllm_config:27b] -s -v
$ pytest tests/unittest/_torch/modeling/test_modeling_gemma3.py::TestGemma3::test_gemma3_allclose_to_hf[backend:vanilla_config:27b] -s -v
$ pytest tests/unittest/_torch/modeling/test_modeling_gemma3.py::TestGemma3::test_gemma3_allclose_to_hf[backend:flashinfer_config:1b] -s -v
$ pytest tests/unittest/_torch/modeling/test_modeling_gemma3.py::TestGemma3::test_gemma3_allclose_to_hf[backend:trtllm_config:1b] -s -v
$ pytest tests/unittest/_torch/modeling/test_modeling_gemma3.py::TestGemma3::test_gemma3_allclose_to_hf[backend:vanilla_config:1b] -s -v
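The flashinfer-backend cases above exercise the new q_scaling and sliding_window support. As a rough illustration of what those two parameters mean, here is a sketch that converts them into FlashInfer-style `sm_scale` and `window_left` planning arguments. It assumes the TensorRT-LLM convention scale = 1 / (q_scaling * sqrt(head_dim)), Gemma's `query_pre_attn_scalar` query scaling, and that a sliding window of W tokens includes the current token; the exact mapping used in this MR is not shown here, and the function names are hypothetical.

```python
import math
from typing import Optional


def gemma3_q_scaling(query_pre_attn_scalar: float, head_dim: int) -> float:
    # Gemma scales attention logits by query_pre_attn_scalar ** -0.5 rather than
    # head_dim ** -0.5; pick q_scaling so both conventions give the same scale.
    return math.sqrt(query_pre_attn_scalar / head_dim)


def softmax_scale(head_dim: int, q_scaling: float = 1.0) -> float:
    # Assumed TensorRT-LLM convention: scale = 1 / (q_scaling * sqrt(head_dim)).
    return 1.0 / (q_scaling * math.sqrt(head_dim))


def window_left(sliding_window: Optional[int]) -> int:
    # FlashInfer's window_left is the left window size, with -1 meaning no window.
    # Assumption: a sliding window of W tokens includes the current token,
    # so window_left = W - 1.
    return -1 if sliding_window is None else sliding_window - 1


if __name__ == "__main__":
    head_dim, scalar = 128, 168  # illustrative values only
    qs = gemma3_q_scaling(scalar, head_dim)
    print(f"q_scaling={qs:.4f} sm_scale={softmax_scale(head_dim, qs):.6f}")
    print(f"window_left(1024)={window_left(1024)}, window_left(None)={window_left(None)}")
```

When `query_pre_attn_scalar` equals `head_dim`, q_scaling collapses to 1 and the scale reduces to the usual 1 / sqrt(head_dim).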

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--disable-fail-fast --skip-test --stage-list "A10-1, xxx" --gpu-type "A30, H100_PCIe" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-[Post-Merge]-1, xxx"]

Launch build/test pipelines. All previously running jobs will be killed.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests. Will also run L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-[Post-Merge]-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and the specified test stages. Examples: --extra-stage "H100_PCIe-[Post-Merge]-1, xxx".

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md.

kill

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since a lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since a lack of user care and validation can cause top of tree to break.

brb-nv · 2025-07-07T00:24:59Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-07-07T00:30:36Z

PR_Github #11072 [ run ] triggered by Bot

Signed-off-by: Balaram Buddharaju <[email protected]>

brb-nv · 2025-07-07T00:31:23Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-07-07T00:37:04Z

PR_Github #11073 [ run ] triggered by Bot

tensorrt-cicd · 2025-07-07T00:37:06Z

PR_Github #11072 [ run ] completed with state ABORTED

tensorrt-cicd · 2025-07-07T03:03:09Z

PR_Github #11073 [ run ] completed with state SUCCESS
/LLM/release-0.21/L0_MergeRequest_PR pipeline #173 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

byshiue

LGTM

brb-nv requested review from a team as code owners July 7, 2025 00:09

brb-nv requested review from symphonylyh and nv-yilinf July 7, 2025 00:09

brb-nv force-pushed the user/brb/fix-gemma3-vlm-rope branch 2 times, most recently from 2199fa9 to d706367 July 7, 2025 00:23

fix: Skip rope scaling for local layers in Gemma3 VLM

5191130

Signed-off-by: Balaram Buddharaju <[email protected]>

brb-nv force-pushed the user/brb/fix-gemma3-vlm-rope branch from d706367 to 5191130 July 7, 2025 00:31

byshiue approved these changes Jul 7, 2025

byshiue merged commit 9106b5d into NVIDIA:release/0.21 Jul 7, 2025
3 checks passed

This was referenced Jul 8, 2025

The output of Gemma 3 4B for TensorRT and Transformers is not the same, even when using float32 #4815

Open

fix: Skip rope scaling for local layers in Gemma3 VLM #5857

Merged

brb-nv deleted the user/brb/fix-gemma3-vlm-rope branch July 11, 2025 23:27
