[TRTLLM-4971]: Use safe deserialization in ParallelConfig #4630

yibinl-nvidia · 2025-05-23T20:46:32Z

Use safe deserialization in ParallelConfig

Description

Remove the unsafe pickle.load usage in ParallelConfig by switching to a safe deserialization method. During testing, I found that keeping serialization.py under the executor directory causes a circular import in the files that import ParallelConfig, so I moved serialization.py out of the executor directory.
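For readers unfamiliar with the approach, here is a minimal sketch of allowlist-based deserialization, following the restricted-unpickler pattern from the Python `pickle` documentation. It is not the code in this PR: `ALLOWED_GLOBALS`, `RestrictedUnpickler`, `safe_loads`, and `Exploit` are hypothetical names chosen for illustration, and the actual helpers in serialization.py may look different.

```python
import io
import os
import pickle
from collections import OrderedDict

# Hypothetical allowlist of (module, name) pairs the unpickler may resolve.
ALLOWED_GLOBALS = {("collections", "OrderedDict")}


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that refuses to resolve any global outside the allowlist."""

    def find_class(self, module, name):
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is not allowed")


def safe_loads(data: bytes):
    """Deserialize bytes with the restricted unpickler instead of pickle.loads."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


class Exploit:
    """A payload that would run a shell command under plain pickle.loads."""

    def __reduce__(self):
        return (os.system, ("echo unsafe",))


# An allowlisted type round-trips normally.
restored = safe_loads(pickle.dumps(OrderedDict(tp_size=2)))

# The malicious payload is rejected before os.system is ever looked up.
try:
    safe_loads(pickle.dumps(Exploit()))
    blocked = False
except pickle.UnpicklingError:
    blocked = True
```

The key point is that `find_class` is the only place where pickle resolves importable names, so rejecting unknown names there prevents a crafted payload from reaching arbitrary callables.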

Test Coverage

Added a new test test_parallel_config_serialization to check the correctness of safe serialization / deserialization of ParallelConfig.
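A round-trip test of this kind could be sketched as follows. This is an assumption-laden illustration rather than the test added in this PR: `StandInConfig` is a hypothetical stand-in for ParallelConfig with made-up fields, and `APPROVED`, `round_trip`, and both test function names are invented here.

```python
import io
import pickle
from dataclasses import dataclass, field


@dataclass
class StandInConfig:
    """Hypothetical stand-in for ParallelConfig; the fields are illustrative."""
    world_size: int = 1
    devices: list = field(default_factory=list)


@dataclass
class NotApproved:
    """A class deliberately left out of the allowlist."""
    value: int = 0


# Hypothetical allowlist containing only the stand-in config class.
APPROVED = {(StandInConfig.__module__, StandInConfig.__qualname__)}


class AllowlistUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if (module, name) not in APPROVED:
            raise pickle.UnpicklingError(f"{module}.{name} is not approved")
        return super().find_class(module, name)


def round_trip(obj):
    """Serialize with pickle.dumps, then load through the allowlist."""
    return AllowlistUnpickler(io.BytesIO(pickle.dumps(obj))).load()


def test_stand_in_config_round_trip():
    original = StandInConfig(world_size=4, devices=[0, 1, 2, 3])
    assert round_trip(original) == original


def test_unapproved_class_is_rejected():
    try:
        round_trip(NotApproved(1))
    except pickle.UnpicklingError:
        return
    raise AssertionError("expected UnpicklingError")
```

Checking the rejection path alongside the round trip guards against an allowlist that silently accepts everything.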

yibinl-nvidia · 2025-05-23T20:47:26Z

/bot run --add-multi-gpu-test --disable-fail-fast

tensorrt-cicd · 2025-05-23T20:52:34Z

PR_Github #6334 [ run ] triggered by Bot

tensorrt-cicd · 2025-05-23T22:21:40Z

PR_Github #6334 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #4624 completed with status: 'FAILURE'

yibinl-nvidia · 2025-05-24T03:42:46Z

/bot run --add-multi-gpu-test --disable-fail-fast

tensorrt-cicd · 2025-05-24T03:47:57Z

PR_Github #6350 [ run ] triggered by Bot

tensorrt-cicd · 2025-05-24T11:38:55Z

PR_Github #6350 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #4637 completed with status: 'FAILURE'

yibinl-nvidia · 2025-05-26T00:54:16Z

/bot run --add-multi-gpu-test --disable-fail-fast

tensorrt-cicd · 2025-05-26T00:59:24Z

PR_Github #6397 [ run ] triggered by Bot

tensorrt-cicd · 2025-05-26T07:04:29Z

PR_Github #6397 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #4677 completed with status: 'FAILURE'

yibinl-nvidia · 2025-05-26T18:44:55Z

/bot run --add-multi-gpu-test --disable-fail-fast

tensorrt-cicd · 2025-05-26T18:50:53Z

PR_Github #6501 [ run ] triggered by Bot

tensorrt-cicd · 2025-05-26T20:54:16Z

PR_Github #6501 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #4757 completed with status: 'SUCCESS'

yibinl-nvidia · 2025-05-28T05:32:48Z

@kaiyux Who do you think would be the best reviewer for this PR? I checked contribution history to ParallelConfig and your name pops up :)

kaiyux · 2025-05-29T06:38:34Z

@yuxianq Since it's related to auto parallel, can you help take a look?

yibinl-nvidia · 2025-06-02T22:36:02Z

/bot run --add-multi-gpu-test --disable-fail-fast

tensorrt-cicd · 2025-06-02T22:42:04Z

PR_Github #7243 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-03T14:37:30Z

PR_Github #7243 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5245 completed with status: 'SUCCESS'

yibinl-nvidia · 2025-06-05T19:25:24Z

/bot run --add-multi-gpu-test --disable-fail-fast

tensorrt-cicd · 2025-06-16T05:57:18Z

PR_Github #8981 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-16T07:32:50Z

PR_Github #8981 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #6556 completed with status: 'FAILURE'

yibinl-nvidia · 2025-06-16T14:51:11Z

/bot run --add-multi-gpu-test --disable-fail-fast

tensorrt-cicd · 2025-06-16T14:56:38Z

PR_Github #9043 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-16T19:35:08Z

PR_Github #9043 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #6610 completed with status: 'FAILURE'

Signed-off-by: Yibin Li <[email protected]>

yibinl-nvidia · 2025-06-17T15:35:15Z

/bot run --add-multi-gpu-test --disable-fail-fast

tensorrt-cicd · 2025-06-17T15:40:34Z

PR_Github #9224 [ run ] triggered by Bot

yibinl-nvidia · 2025-06-18T00:48:41Z

/bot kill

tensorrt-cicd · 2025-06-18T00:55:14Z

PR_Github #9258 [ kill ] triggered by Bot

tensorrt-cicd · 2025-06-18T00:55:15Z

PR_Github #9258 [ kill ] completed with state SUCCESS
Successfully killed previous jobs for commit d0caf72

yibinl-nvidia · 2025-06-18T01:05:13Z

/bot run --add-multi-gpu-test

tensorrt-cicd · 2025-06-18T01:11:34Z

PR_Github #9261 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-18T07:07:22Z

PR_Github #9261 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #6795 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

yibinl-nvidia · 2025-06-18T15:33:08Z

@kaiyux @Superjomn This MR relocates serialization.py to a better location and applies safe deserialization to ParallelConfig. I expect very few updates to BASE_AUTOPP_CLASSES (unlike IPC), so this change should be safe. Could you review and merge this PR? Thanks.

yibinl-nvidia self-assigned this May 23, 2025

yibinl-nvidia requested a review from kaiyux May 23, 2025 20:47

yibinl-nvidia mentioned this pull request May 23, 2025

[Draft] [TRTLLM-4971]: Use safe deserialization in ParallelConfig #4274

Closed

yibinl-nvidia force-pushed the dev-yibinl-remove-pickle-from-ParallelConfig-v2 branch from 7e2e768 to deaac01 Compare May 24, 2025 03:42

yibinl-nvidia force-pushed the dev-yibinl-remove-pickle-from-ParallelConfig-v2 branch from deaac01 to e0bf861 Compare May 26, 2025 00:54

kaiyux requested a review from yuxianq May 29, 2025 06:37

yuxianq reviewed May 29, 2025

tensorrt_llm/executor/ipc.py (outdated, resolved)

yuxianq reviewed May 29, 2025

tensorrt_llm/serialization.py (outdated, resolved)

yibinl-nvidia force-pushed the dev-yibinl-remove-pickle-from-ParallelConfig-v2 branch from e0bf861 to 6aaf52b Compare June 2, 2025 22:35

yuxianq reviewed Jun 3, 2025

tensorrt_llm/auto_parallel/parallelization.py (outdated, resolved)

yuxianq approved these changes Jun 3, 2025

yibinl-nvidia force-pushed the dev-yibinl-remove-pickle-from-ParallelConfig-v2 branch from 6aaf52b to 6fad474 Compare June 5, 2025 19:24

yibinl-nvidia force-pushed the dev-yibinl-remove-pickle-from-ParallelConfig-v2 branch from 2f1ab6b to e522c01 Compare June 16, 2025 14:50

yibinl-nvidia added 3 commits June 17, 2025 15:34

initial change and new tests

8750e6c

Signed-off-by: Yibin Li <[email protected]>

relocate serialization.py to avoid circular import

62272b9

Signed-off-by: Yibin Li <[email protected]>

adress comments

d0caf72

Signed-off-by: Yibin Li <[email protected]>

yibinl-nvidia force-pushed the dev-yibinl-remove-pickle-from-ParallelConfig-v2 branch from e522c01 to d0caf72 Compare June 17, 2025 15:34

yibinl-nvidia requested review from Superjomn June 18, 2025 16:59

yuxianq merged commit 0f3bd78 into NVIDIA:main Jun 27, 2025
3 checks passed

dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 9, 2025

[TRTLLM-4971]: Use safe deserialization in ParallelConfig (NVIDIA#4630)

984c75d

Signed-off-by: Yibin Li <[email protected]>

dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 10, 2025

[TRTLLM-4971]: Use safe deserialization in ParallelConfig (NVIDIA#4630)

946d333

Signed-off-by: Yibin Li <[email protected]>

dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 10, 2025

[TRTLLM-4971]: Use safe deserialization in ParallelConfig (NVIDIA#4630)

bc23316

Signed-off-by: Yibin Li <[email protected]>

dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 10, 2025

[TRTLLM-4971]: Use safe deserialization in ParallelConfig (NVIDIA#4630)

dd553d7

Signed-off-by: Yibin Li <[email protected]>

dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 10, 2025

[TRTLLM-4971]: Use safe deserialization in ParallelConfig (NVIDIA#4630)

a109c46

Signed-off-by: Yibin Li <[email protected]>

dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 11, 2025

[TRTLLM-4971]: Use safe deserialization in ParallelConfig (NVIDIA#4630)

6ada41a

Signed-off-by: Yibin Li <[email protected]>

dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 11, 2025

[TRTLLM-4971]: Use safe deserialization in ParallelConfig (NVIDIA#4630)

1386ea4

Signed-off-by: Yibin Li <[email protected]>

dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 11, 2025

[TRTLLM-4971]: Use safe deserialization in ParallelConfig (NVIDIA#4630)

4bcb0f9

Signed-off-by: Yibin Li <[email protected]>
