chore: Mass integration of release/0.20. #4732
Conversation
* Restore per-channel pre-quant. Signed-off-by: Barry Kang <[email protected]>
* Update TRT test script. Signed-off-by: Barry Kang <[email protected]>
* Fix pre-commit. Signed-off-by: Barry Kang <[email protected]>
* Signed-off-by: Ivy Zhang <[email protected]>
* Signed-off-by: Yiqing Yan <[email protected]>
* …e memory and log more memory information (NVIDIA#4660). Signed-off-by: Hui Gao <[email protected]>
* Signed-off-by: nv-guomingz <[email protected]>
* …d weight loading in fused moe. (NVIDIA#4699). Signed-off-by: Yuxian Qiu <[email protected]>
* Signed-off-by: Balaram Buddharaju <[email protected]>
/bot run
PR_Github #6768 [ run ] triggered by Bot
LGTM for my part.
PR_Github #6768 [ run ] completed with state
LGTM. Please remember to update the internal commit ID before merging this PR.
# expert_idx is the local slot index of current rank
expert_idx = local_slot_id
max_workers = min(
    (self.expert_end - self.expert_start) * 2,
Please use self.expert_size_per_partition instead of (self.expert_end - self.expert_start).
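For illustration only, here is a minimal sketch of the suggested change, assuming the loader already exposes an expert_size_per_partition attribute equal to expert_end - expert_start. The class name, the worker cap of 16, and the ThreadPoolExecutor placeholder work are assumptions, not code from this PR or from the fused MoE module.

```python
from concurrent.futures import ThreadPoolExecutor

# Sketch only: _WeightLoaderSketch, the cap of 16 workers, and the submitted
# placeholder work are illustrative, not the actual fused MoE loader.
class _WeightLoaderSketch:

    def __init__(self, expert_start: int, expert_end: int):
        self.expert_start = expert_start
        self.expert_end = expert_end
        # Number of experts owned by this rank's partition.
        self.expert_size_per_partition = expert_end - expert_start

    def load_weights(self):
        # Before: max_workers = min((self.expert_end - self.expert_start) * 2, ...)
        # After: reuse the precomputed attribute instead of recomputing the width.
        max_workers = min(self.expert_size_per_partition * 2, 16)
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            for local_slot_id in range(self.expert_size_per_partition):
                # expert_idx is the local slot index of the current rank.
                expert_idx = local_slot_id
                pool.submit(lambda i=expert_idx: i)  # placeholder per-expert work


# Usage example with a partition that owns experts 8..11.
loader = _WeightLoaderSketch(expert_start=8, expert_end=12)
loader.load_weights()
```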
@hlu1 My PR #4699 may conflict with your PR #4790 in this mass integration.
@amirkl94 If Hao's PR is merged first, we should cherry-pick this change into tensorrt_llm/_torch/modules/fused_moe/quantization.py at https://github.com/NVIDIA/TensorRT-LLM/pull/4790/files#diff-19b05de4a4dd136814f3e04d4ed51c2e4f2389c7b0b2a6bca49195150ebadd66R87 instead.
Mass integration of release/0.20 to main.