test(perf): Extend the Llama-Nemotron-Nano-8B perf-integration-tests (pyt) #4407
Conversation
Pull Request Overview
This PR expands end-to-end performance test coverage for the llama_v3.1_nemotron_nano_8b
engine on the PyTorch backend, evaluating low and high concurrency patterns across various input/output lengths.
- Adds 8 new pytest entries under a new “torch backend” section for both low (concurrency=1, requests=8) and high (concurrency=250, requests=500) loads.
- Removes two outdated PyTorch backend tests using default input lengths.
- Ensures max batch size is set to 512 in all new scenarios.
Comments suppressed due to low confidence (1)
tests/integration/test_lists/qa/trt_llm_release_perf_test.yml:26
- [nitpick] Consider renaming the section comment '# torch backend' to '# pytorch backend' for consistency and clarity in labeling.
# torch backend
This is because the test harness defaults to no prefill chunking, which means the ISL specified is the true context length. When left unspecified in the test harness, the `maxnt` passed down to `trtllm-bench` is 2048. As a result, `trtllm-bench` receives conflicting inputs whenever ISL > 2048 but maxnt = 2048; `maxnt` is therefore overridden to be consistent with the ISL in those cases. Signed-off-by: Venky <[email protected]>
Expand PyT llama_v3.1_nemotron_nano_8b perf tests coverage
Description
This PR adds end-to-end performance results for the llama_v3.1_nemotron_nano_8b bfloat16 engine on 1 H100.
Two broad load patterns were evaluated on the PyT backend for various ISL/OSL combinations:
- concurrency = 1, requests = 8
- concurrency = 250, requests = 500
All tests use max_batch_size = 512.
Performance Summary
[Performance summary table omitted: columns included request throughput (req/s), per-GPU token throughput (tps/gpu), and latency (ms).]
NOTE: the above numbers were generated with prefill chunking disabled (which is the default behavior)
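The scenario matrix described above (two load patterns crossed with several ISL/OSL combinations, all at max_batch_size = 512) could be built along these lines. This is a sketch under assumptions: the ISL/OSL values and all names are illustrative and do not mirror the actual YAML test-list entries.

```python
# Hypothetical sketch of the load-pattern matrix described in this PR.
# The ISL/OSL combos below are illustrative placeholders, not the
# actual values from the test list.
import itertools

LOAD_PATTERNS = [
    {"concurrency": 1, "requests": 8},      # low-load pattern
    {"concurrency": 250, "requests": 500},  # high-load pattern
]
ISL_OSL_COMBOS = [(128, 128), (512, 32), (5000, 500), (500, 2000)]  # illustrative

def build_scenarios():
    """Cross every load pattern with every ISL/OSL combo; all scenarios
    share max_batch_size = 512, as stated in the PR description."""
    scenarios = []
    for load, (isl, osl) in itertools.product(LOAD_PATTERNS, ISL_OSL_COMBOS):
        scenarios.append({**load, "isl": isl, "osl": osl, "max_batch_size": 512})
    return scenarios

print(len(build_scenarios()))  # 2 patterns x 4 combos = 8 scenarios
```

Crossing two load patterns with four length combinations yields the eight new test entries the PR overview mentions.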