
Conversation

@HuiGao-NV
Collaborator

Use runtime total GPU memory to calculate KV cache memory and log more memory information.
This avoids under-reporting the available KV cache memory.
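
A minimal sketch of the approach, assuming a PyTorch runtime; the function name `estimate_kv_cache_bytes` and the `free_gpu_memory_fraction` parameter are illustrative, not the actual TensorRT-LLM API:

```python
import torch

def estimate_kv_cache_bytes(free_gpu_memory_fraction: float = 0.9) -> int:
    """Illustrative sketch: size the KV cache from the GPU memory that is
    actually free at runtime, rather than from the device's nominal total."""
    # (free_bytes, total_bytes) as reported by the CUDA runtime; memory already
    # held by other allocations is excluded from free_bytes.
    free_bytes, total_bytes = torch.cuda.mem_get_info()

    # Memory used outside the torch caching allocator (CUDA context, NCCL,
    # other libraries) is the gap between device-reported usage and what
    # torch itself has reserved.
    non_torch_bytes = (total_bytes - free_bytes) - torch.cuda.memory_reserved()

    kv_cache_bytes = int(free_bytes * free_gpu_memory_fraction)

    # Log the full breakdown so an undersized KV cache is easy to diagnose.
    print(f"total={total_bytes >> 20} MiB, free={free_bytes >> 20} MiB, "
          f"non_torch={non_torch_bytes >> 20} MiB, kv_cache={kv_cache_bytes >> 20} MiB")
    return kv_cache_bytes
```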

@HuiGao-NV HuiGao-NV requested review from a team as code owners May 26, 2025 08:51
@HuiGao-NV HuiGao-NV changed the base branch from main to release/0.20 May 26, 2025 08:51
@HuiGao-NV HuiGao-NV requested a review from a team as a code owner May 26, 2025 08:51
@HuiGao-NV HuiGao-NV requested a review from litaotju May 26, 2025 08:51
@HuiGao-NV
Collaborator Author

/bot run

1 similar comment
@HuiGao-NV
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #6467 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #6467 [ run ] completed with state SUCCESS
/LLM/release-0.20/L0_MergeRequest_PR pipeline #70 completed with status: 'FAILURE'

@HuiGao-NV
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #6510 [ run ] triggered by Bot

@HuiGao-NV
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #6514 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #6510 [ run ] completed with state ABORTED
/LLM/release-0.20/L0_MergeRequest_PR pipeline #74 completed with status: 'FAILURE'

@HuiGao-NV HuiGao-NV changed the title from "Use runtime total gpu memory to calculate kv cache memory and log more memory information" to "fix: [nvbug5300494] Use runtime total gpu memory to calculate kv cache memory and log more memory information" May 27, 2025
@tensorrt-cicd
Collaborator

PR_Github #6514 [ run ] completed with state SUCCESS
/LLM/release-0.20/L0_MergeRequest_PR pipeline #75 completed with status: 'FAILURE'

@HuiGao-NV
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #6586 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #6586 [ run ] completed with state SUCCESS
/LLM/release-0.20/L0_MergeRequest_PR pipeline #82 completed with status: 'FAILURE'

@HuiGao-NV
Collaborator Author

/bot run --stage-list="H100_PCIe-PyTorch-3"

@tensorrt-cicd
Collaborator

PR_Github #6651 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #6651 [ run ] completed with state SUCCESS
/LLM/release-0.20/L0_MergeRequest_PR pipeline #87 (Partly Tested) completed with status: 'SUCCESS'

@HuiGao-NV HuiGao-NV enabled auto-merge (squash) May 27, 2025 23:41
Change the method used to compute peak memory
Set a new peak memory threshold for the test_ptq_quickstart_advanced_mtp case
Get non-torch memory at the start of KV cache memory estimation

Signed-off-by: Hui Gao <[email protected]>
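
A hedged sketch of the two memory-accounting changes listed above, assuming a PyTorch runtime; the helper names `snapshot_non_torch_bytes` and `estimate_peak_bytes` are hypothetical, not the actual implementation:

```python
import torch

def snapshot_non_torch_bytes() -> int:
    # Non-torch memory at the start of KV cache estimation: everything the
    # device reports as used minus what the torch caching allocator reserves.
    free_bytes, total_bytes = torch.cuda.mem_get_info()
    return (total_bytes - free_bytes) - torch.cuda.memory_reserved()

def estimate_peak_bytes(non_torch_baseline: int) -> int:
    # Peak memory = torch's peak allocation during the profiling run plus the
    # non-torch baseline captured beforehand, so the estimate reflects
    # everything resident on the GPU, not just torch tensors.
    return torch.cuda.max_memory_allocated() + non_torch_baseline
```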
@HuiGao-NV
Collaborator Author

/bot skip --comment="CI has passed."

@tensorrt-cicd
Collaborator

PR_Github #6687 [ skip ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #6687 [ skip ] completed with state SUCCESS
Skipping testing for commit d97ee2e

@HuiGao-NV HuiGao-NV merged commit 1bfc7d4 into NVIDIA:release/0.20 May 28, 2025
3 checks passed
shaharmor98 pushed a commit to shaharmor98/tekit that referenced this pull request May 28, 2025
…e memory and log more memory information (NVIDIA#4660)

Signed-off-by: Hui Gao <[email protected]>
@HuiGao-NV HuiGao-NV deleted the oom_testing branch June 3, 2025 00:58
omera-nv pushed a commit to omera-nv/TensorRT-LLM that referenced this pull request Jun 7, 2025
…e memory and log more memory information (NVIDIA#4660)

Signed-off-by: Hui Gao <[email protected]>