
Conversation

@shaharmor98 (Collaborator) commented on May 28, 2025

Mass integration of release/0.20 to main.

Barry-Delaney and others added 7 commits May 28, 2025 09:08

* Restore per-channel pre-quant

Signed-off-by: Barry Kang <[email protected]>

* Update TRT test script

Signed-off-by: Barry Kang <[email protected]>

* Fix pre-commit

Signed-off-by: Barry Kang <[email protected]>

---------

Signed-off-by: Barry Kang <[email protected]>

…e memory and log more memory information (NVIDIA#4660)

Signed-off-by: Hui Gao <[email protected]>
@shaharmor98 requested review from a team as code owners on May 28, 2025 11:13
@shaharmor98 requested review from juney-nvidia and hyukn on May 28, 2025 11:13
@shaharmor98 (Collaborator, Author) commented:

/bot run

@amirkl94 requested a review from Barry-Delaney on May 28, 2025 11:19
@tensorrt-cicd (Collaborator) commented:

PR_Github #6768 [ run ] triggered by Bot

@shaharmor98 changed the title from "Release 0.20 to main" to "chore: Mass integration of release/0.20." on May 28, 2025
@yuxianq (Collaborator) left a comment:

LGTM for my part.

@tensorrt-cicd (Collaborator) commented:

PR_Github #6768 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #4933 completed with status: 'FAILURE'

@Barry-Delaney (Collaborator) left a comment:

LGTM. Please remember to update the internal commit ID before merging this PR.

# expert_idx is the local slot index of current rank
expert_idx = local_slot_id
max_workers = min(
(self.expert_end - self.expert_start) * 2,
A collaborator commented:

@yuxianq the pipeline fails because self.expert_end was removed by a PR on main. How do you suggest we solve this?
#4495

A collaborator replied:

Please use self.expert_size_per_partition instead of (self.expert_end - self.expert_start)
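
A minimal sketch of the suggested change, under stated assumptions: the class name, its constructor, and the os.cpu_count() cap are hypothetical placeholders (the second argument to min() is truncated in the excerpt above); only expert_start, expert_end, and expert_size_per_partition mirror names from this thread.

import os

class MoEWeightLoaderSketch:
    """Hypothetical stand-in for the fused-MoE weight loader; not TensorRT-LLM code."""

    def __init__(self, expert_start: int, expert_end: int):
        self.expert_start = expert_start
        # On main, self.expert_end no longer exists; the per-rank expert
        # count is carried by expert_size_per_partition instead.
        self.expert_size_per_partition = expert_end - expert_start

    def choose_max_workers(self) -> int:
        # Before: min((self.expert_end - self.expert_start) * 2, <cap>)
        # After:  min(self.expert_size_per_partition * 2, <cap>)
        # os.cpu_count() stands in for the elided cap from the original snippet.
        return min(self.expert_size_per_partition * 2, os.cpu_count() or 1)

For example, MoEWeightLoaderSketch(expert_start=0, expert_end=4).choose_max_workers() returns 8 on a machine with at least 8 CPUs; the point of the review comment is only that the left-hand term should come from expert_size_per_partition.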

@yuxianq (Collaborator) commented on May 30, 2025:

@hlu1 My PR #4699 may conflict with your PR #4790 in this mass integration.
@amirkl94 If Hao's PR is merged first, we should cherry-pick this change into tensorrt_llm/_torch/modules/fused_moe/quantization.py in https://github.com/NVIDIA/TensorRT-LLM/pull/4790/files#diff-19b05de4a4dd136814f3e04d4ed51c2e4f2389c7b0b2a6bca49195150ebadd66R87 instead.

@shaharmor98 closed this on Aug 3, 2025