Skip to content

Conversation

yibinl-nvidia
Copy link
Collaborator

@yibinl-nvidia yibinl-nvidia commented May 23, 2025

Use safe deserialization in ParallelConfig

Description

Remove unsafe pickle.load usage in ParallelConfig by using a safe deserialization method. During the tests, I realized that the current location of serialization.py under executor directory will cause a circular import of the files that import ParallelConfig, so I move the serialization.py outside of executor directory.

Test Coverage

Added a new test test_parallel_config_serialization to check the correctness of safe serialization / deserialization of ParallelConfig.

@yibinl-nvidia
Copy link
Collaborator Author

/bot run --add-multi-gpu-test --disable-fail-fast

@tensorrt-cicd
Copy link
Collaborator

PR_Github #6334 [ run ] triggered by Bot

@tensorrt-cicd
Copy link
Collaborator

PR_Github #6334 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #4624 completed with status: 'FAILURE'

@yibinl-nvidia yibinl-nvidia force-pushed the dev-yibinl-remove-pickle-from-ParallelConfig-v2 branch from 7e2e768 to deaac01 Compare May 24, 2025 03:42
@yibinl-nvidia
Copy link
Collaborator Author

/bot run --add-multi-gpu-test --disable-fail-fast

@tensorrt-cicd
Copy link
Collaborator

PR_Github #6350 [ run ] triggered by Bot

@tensorrt-cicd
Copy link
Collaborator

PR_Github #6350 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #4637 completed with status: 'FAILURE'

@yibinl-nvidia yibinl-nvidia force-pushed the dev-yibinl-remove-pickle-from-ParallelConfig-v2 branch from deaac01 to e0bf861 Compare May 26, 2025 00:54
@yibinl-nvidia
Copy link
Collaborator Author

/bot run --add-multi-gpu-test --disable-fail-fast

@tensorrt-cicd
Copy link
Collaborator

PR_Github #6397 [ run ] triggered by Bot

@tensorrt-cicd
Copy link
Collaborator

PR_Github #6397 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #4677 completed with status: 'FAILURE'

@yibinl-nvidia
Copy link
Collaborator Author

/bot run --add-multi-gpu-test --disable-fail-fast

@tensorrt-cicd
Copy link
Collaborator

PR_Github #6501 [ run ] triggered by Bot

@tensorrt-cicd
Copy link
Collaborator

PR_Github #6501 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #4757 completed with status: 'SUCCESS'

@yibinl-nvidia
Copy link
Collaborator Author

@kaiyux Who do you think would be the best reviewer for this PR? I checked contribution history to ParallelConfig and your name pops up :)

@kaiyux kaiyux requested a review from yuxianq May 29, 2025 06:37
@kaiyux
Copy link
Member

kaiyux commented May 29, 2025

@yuxianq Since it's related to auto parallel, can you help take a look?

@yibinl-nvidia yibinl-nvidia force-pushed the dev-yibinl-remove-pickle-from-ParallelConfig-v2 branch from e0bf861 to 6aaf52b Compare June 2, 2025 22:35
@yibinl-nvidia
Copy link
Collaborator Author

/bot run --add-multi-gpu-test --disable-fail-fast

@tensorrt-cicd
Copy link
Collaborator

PR_Github #7243 [ run ] triggered by Bot

@tensorrt-cicd
Copy link
Collaborator

PR_Github #7243 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5245 completed with status: 'SUCCESS'

@yibinl-nvidia yibinl-nvidia force-pushed the dev-yibinl-remove-pickle-from-ParallelConfig-v2 branch from 6aaf52b to 6fad474 Compare June 5, 2025 19:24
@yibinl-nvidia
Copy link
Collaborator Author

/bot run --add-multi-gpu-test --disable-fail-fast

@tensorrt-cicd
Copy link
Collaborator

PR_Github #8981 [ run ] triggered by Bot

@tensorrt-cicd
Copy link
Collaborator

PR_Github #8981 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #6556 completed with status: 'FAILURE'

@yibinl-nvidia yibinl-nvidia force-pushed the dev-yibinl-remove-pickle-from-ParallelConfig-v2 branch from 2f1ab6b to e522c01 Compare June 16, 2025 14:50
@yibinl-nvidia
Copy link
Collaborator Author

/bot run --add-multi-gpu-test --disable-fail-fast

@tensorrt-cicd
Copy link
Collaborator

PR_Github #9043 [ run ] triggered by Bot

@tensorrt-cicd
Copy link
Collaborator

PR_Github #9043 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #6610 completed with status: 'FAILURE'

@yibinl-nvidia yibinl-nvidia force-pushed the dev-yibinl-remove-pickle-from-ParallelConfig-v2 branch from e522c01 to d0caf72 Compare June 17, 2025 15:34
@yibinl-nvidia
Copy link
Collaborator Author

/bot run --add-multi-gpu-test --disable-fail-fast

@tensorrt-cicd
Copy link
Collaborator

PR_Github #9224 [ run ] triggered by Bot

@yibinl-nvidia
Copy link
Collaborator Author

/bot kill

@tensorrt-cicd
Copy link
Collaborator

PR_Github #9258 [ kill ] triggered by Bot

@tensorrt-cicd
Copy link
Collaborator

PR_Github #9258 [ kill ] completed with state SUCCESS
Successfully killed previous jobs for commit d0caf72

@yibinl-nvidia
Copy link
Collaborator Author

/bot run --add-multi-gpu-test

@tensorrt-cicd
Copy link
Collaborator

PR_Github #9261 [ run ] triggered by Bot

@tensorrt-cicd
Copy link
Collaborator

PR_Github #9261 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #6795 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

@yibinl-nvidia
Copy link
Collaborator Author

yibinl-nvidia commented Jun 18, 2025

@kaiyux @Superjomn This MR relocates serialization.py to a better location and apply safe deserialization to ParallelConfig. I would expect very few updates to the BASE_AUTOPP_CLASSES (unlike IPC) so we should be safe to do this change. Could you review and merge this PR? Thanks.

@yuxianq yuxianq merged commit 0f3bd78 into NVIDIA:main Jun 27, 2025
3 checks passed
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 9, 2025
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 10, 2025
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 10, 2025
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 10, 2025
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 10, 2025
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 11, 2025
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 11, 2025
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 11, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants