fix: skip weights defined in create_weights for pp. #4447

yuxianq · 2025-05-19T14:15:58Z

Because DecoderModel's __pp_init__ is called before DecoderModelForCausalLM's __post_init__, it fails to skip the weights defined in create_weights, which are created inside __post_init__.
To fix this, we now call DecoderModel's __pp_init__ from DecoderModelForCausalLM's __pp_init__, which runs after DecoderModelForCausalLM's __post_init__.
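
The ordering problem can be illustrated with a minimal toy sketch. This is not the TensorRT-LLM implementation: the classes ToyLayer, ToyDecoderModel and ToyCausalLM and the helpers build and allocated_layers are hypothetical stand-ins, and "skipping" is modeled simply as dropping the weights of layers that the current pipeline rank does not own.

```python
class ToyLayer:
    """Hypothetical layer whose weights only exist after create_weights()."""

    def __init__(self):
        self.weights = None  # nothing allocated yet

    def create_weights(self):
        self.weights = [0.0] * 4  # stand-in for a large allocation


class ToyDecoderModel:
    """Hypothetical stand-in for DecoderModel."""

    def __init__(self, num_layers, local_layers):
        self.layers = [ToyLayer() for _ in range(num_layers)]
        self.local_layers = set(local_layers)

    def __pp_init__(self):
        # Drop the weights of layers this pipeline rank does not own.
        for idx, layer in enumerate(self.layers):
            if idx not in self.local_layers:
                layer.weights = None


class ToyCausalLM:
    """Hypothetical stand-in for DecoderModelForCausalLM."""

    def __init__(self, model):
        self.model = model

    def __post_init__(self):
        # Weights defined in create_weights are only created here.
        for layer in self.model.layers:
            layer.create_weights()

    def __pp_init__(self):
        # The fix described above: delegate to the inner model afterwards.
        self.model.__pp_init__()


def build(fixed):
    model = ToyDecoderModel(num_layers=4, local_layers=[0, 1])
    if not fixed:
        model.__pp_init__()  # old order: runs before any weights exist
    lm = ToyCausalLM(model)
    lm.__post_init__()
    if fixed:
        lm.__pp_init__()  # new order: runs after create_weights
    return lm


def allocated_layers(lm):
    return [i for i, layer in enumerate(lm.model.layers)
            if layer.weights is not None]


print(allocated_layers(build(fixed=False)))  # [0, 1, 2, 3]
print(allocated_layers(build(fixed=True)))   # [0, 1]
```

In the toy model, the old order leaves all four layers allocated on a rank that owns only two of them, whereas the new order keeps only the owned layers.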

Signed-off-by: Yuxian Qiu <[email protected]>

yuxianq · 2025-05-19T14:17:33Z

/bot run --disable-fail-fast --add-multi-gpu-test

tensorrt-cicd · 2025-05-19T14:23:33Z

PR_Github #5739 [ run ] triggered by Bot

Barry-Delaney

LGTM. Local tests passed.

amukkara

does any CI test fail before this change?

tensorrt-cicd · 2025-05-20T01:43:30Z

PR_Github #5739 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #4195 completed with status: 'FAILURE'

yuxianq · 2025-05-20T03:03:03Z

does any CI test fail before this change?

@amukkara No. @Barry-Delaney hit an OOM issue when running python examples/pytorch/quickstart_advanced.py --model_dir /llm-models/DeepSeek-R1/DeepSeek-R1-W4AFP8 --tp_size 2 --pp_size 2 --moe_ep_size 1 --moe_tp_size 2 on H200x4. With this PR, that test passes. Our CI does not currently contain any DeepSeek-R1 test.

yuxianq · 2025-05-20T03:33:14Z

/bot run --disable-fail-fast --add-multi-gpu-test

tensorrt-cicd · 2025-05-20T03:38:35Z

PR_Github #5812 [ run ] triggered by Bot

tensorrt-cicd · 2025-05-20T11:18:39Z

PR_Github #5812 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #4257 completed with status: 'FAILURE'

yuxianq · 2025-05-20T11:43:24Z

/bot run --disable-fail-fast --add-multi-gpu-test

tensorrt-cicd · 2025-05-20T11:48:54Z

PR_Github #5871 [ run ] triggered by Bot

tensorrt-cicd · 2025-05-21T01:31:15Z

PR_Github #5871 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #4302 completed with status: 'SUCCESS'

fix: skip weights defined in create_weights for pp. (#4447) Signed-off-by: Yuxian Qiu <[email protected]>

fix: skip weights defined in create_weights for pp.

f2ad7c9

Signed-off-by: Yuxian Qiu <[email protected]>

yuxianq requested review from hlu1, Barry-Delaney and amukkara May 19, 2025 14:15

Barry-Delaney approved these changes May 19, 2025

amukkara approved these changes May 19, 2025

amukkara reviewed May 19, 2025

Merge branch 'main' into fix-skip-weights

b24af21

Barry-Delaney merged commit 62c16b6 into NVIDIA:main May 21, 2025
3 checks passed

yuxianq added a commit to yuxianq/TensorRT-LLM that referenced this pull request May 21, 2025

fix: skip weights defined in create_weights for pp. (NVIDIA#4447)

1ad7fec

Signed-off-by: Yuxian Qiu <[email protected]>

yuxianq added a commit that referenced this pull request May 21, 2025

Cherry pick #4447 (#4517)

f8bd372

fix: skip weights defined in create_weights for pp. (#4447) Signed-off-by: Yuxian Qiu <[email protected]>
