-
Notifications
You must be signed in to change notification settings - Fork 1.8k
fix: [nvbugs/5066257] serialization improvments #3869
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Thanks a lot for making this PR! Several general comments:
|
|
@kaiyux This is the "approved list approach" I discussed with you offline. Let me know your thoughts. |
90a7915 to
743fa7b
Compare
|
@yibinl-nvidia I fixed the DCO and ran the pre-commit hooks, added the tests, and made some improvements based on the comments. let me know if there is anything else I should address. |
|
/bot run |
|
PR_Github #3654 [ run ] triggered by Bot |
|
PR_Github #3654 [ run ] completed with state |
|
I did a micro-benchmark comparing the custom serialization and deserialization against Pickle. The first result is based on a simple GenerationResult object (same one as in the |
|
The last commit should resolve the issue reported by blossom. |
|
/bot run |
|
PR_Github #3744 [ run ] triggered by Bot |
|
PR_Github #3744 [ run ] completed with state |
9653b63 to
fa5a912
Compare
|
/bot run --add-multi-gpu-test --disable-fail-fast |
|
PR_Github #3766 [ run ] triggered by Bot |
|
the tests that were failed in the basic bot run should be fixed with those new commits. |
|
PR_Github #3766 [ run ] completed with state |
b422fec to
dbbce8c
Compare
|
/bot run --add-multi-gpu-test --disable-fail-fast |
|
PR_Github #3873 [ run ] triggered by Bot |
|
PR_Github #3873 [ run ] completed with state |
|
PR_Github #6102 [ run ] triggered by Bot |
|
PR_Github #6102 [ run ] completed with state |
|
/bot run --add-multi-gpu-test --disable-fail-fast |
|
PR_Github #6123 [ run ] triggered by Bot |
|
PR_Github #6123 [ run ] completed with state |
… function. Signed-off-by: [email protected] <[email protected]>
…ction because it didn't work for all objects that made debugging harder, added tests. Signed-off-by: [email protected] <[email protected]>
…e function. Also added missing classes to approved list. Signed-off-by: coldwaterq <[email protected]>
Signed-off-by: coldwaterq <[email protected]>
Signed-off-by: Yibin Li <[email protected]>
Signed-off-by: Yibin Li <[email protected]>
Signed-off-by: Yibin Li <[email protected]>
Signed-off-by: Yibin Li <[email protected]>
…spawned child processes Signed-off-by: coldwaterq <[email protected]>
Signed-off-by: Yibin Li <[email protected]>
Signed-off-by: Yibin Li <[email protected]>
Signed-off-by: Yibin Li <[email protected]>
567c059 to
9b9df25
Compare
|
/bot reuse-pipeline |
|
PR_Github #6217 [ reuse-pipeline ] triggered by Bot |
|
PR_Github #6217 [ reuse-pipeline ] completed with state |
* added a restricted pcikler and depickler in a sepparate serialization function. Signed-off-by: [email protected] <[email protected]> * updated IPC to remove approved classes, removed the serialization function because it didn't work for all objects that made debugging harder, added tests. Signed-off-by: [email protected] <[email protected]> * removed LLM arg and moved class registration to a serialization module function. Also added missing classes to approved list. Signed-off-by: coldwaterq <[email protected]> * cleaned up a couple files to reduce conflicts with main. Signed-off-by: coldwaterq <[email protected]> * fix unit tests Signed-off-by: Yibin Li <[email protected]> * reorder BASE_ZMQ_CLASSES list alphabetically Signed-off-by: Yibin Li <[email protected]> * fix tests and move LogitsProcessor registration to base class Signed-off-by: Yibin Li <[email protected]> * revert changes to import log of tensorrt_llm._torch.models Signed-off-by: Yibin Li <[email protected]> * added comments to explain why BASE_ZMQ_CLASSES has to be passed into spawned child processes Signed-off-by: coldwaterq <[email protected]> * fix tests and move LogitsProcessor registration to base class Signed-off-by: Yibin Li <[email protected]> * additional comments for multiprocess approved list sync Signed-off-by: Yibin Li <[email protected]> * add dataclass from tests Signed-off-by: Yibin Li <[email protected]> --------- Signed-off-by: [email protected] <[email protected]> Signed-off-by: coldwaterq <[email protected]> Signed-off-by: Yibin Li <[email protected]> Co-authored-by: Yibin Li <[email protected]>
* added a restricted pcikler and depickler in a sepparate serialization function. Signed-off-by: [email protected] <[email protected]> * updated IPC to remove approved classes, removed the serialization function because it didn't work for all objects that made debugging harder, added tests. Signed-off-by: [email protected] <[email protected]> * removed LLM arg and moved class registration to a serialization module function. Also added missing classes to approved list. Signed-off-by: coldwaterq <[email protected]> * cleaned up a couple files to reduce conflicts with main. Signed-off-by: coldwaterq <[email protected]> * fix unit tests Signed-off-by: Yibin Li <[email protected]> * reorder BASE_ZMQ_CLASSES list alphabetically Signed-off-by: Yibin Li <[email protected]> * fix tests and move LogitsProcessor registration to base class Signed-off-by: Yibin Li <[email protected]> * revert changes to import log of tensorrt_llm._torch.models Signed-off-by: Yibin Li <[email protected]> * added comments to explain why BASE_ZMQ_CLASSES has to be passed into spawned child processes Signed-off-by: coldwaterq <[email protected]> * fix tests and move LogitsProcessor registration to base class Signed-off-by: Yibin Li <[email protected]> * additional comments for multiprocess approved list sync Signed-off-by: Yibin Li <[email protected]> * add dataclass from tests Signed-off-by: Yibin Li <[email protected]> --------- Signed-off-by: [email protected] <[email protected]> Signed-off-by: coldwaterq <[email protected]> Signed-off-by: Yibin Li <[email protected]> Co-authored-by: Yibin Li <[email protected]> Signed-off-by: darraghdog <[email protected]>
Description
Add an approve list into the pickle deserialization process to reduce the attack surface of using pickle to a subset of supported objects.
Test Coverage
if a new object is added this PR will cause a ValueError to be raised that clearly states what module is not supported that is being serialized/deserialized.
Fully utilizing the zmq interfaces would test this code.