test(perf): Add some `Llama-3_3-Nemotron-Super-49B-v1` integration-perf-tests (TRT flow, trtllm-bench) #4128

venkywonka · 2025-05-07T16:39:26Z

Description

Add some `Llama-3_3-Nemotron-Super-49B-v1` integration perf tests (cpp backend, trtllm-bench).
This also exposes a `--trust_remote_code` flag in the `trtllm-bench build` subcommand. The `transformers` library needs this flag to load DeciLM-based models (Llama-Nemotron-Super is one of them) through its Auto classes.
This PR also changes `config.py` and `model.py` for the `DeciLMForCausalLM` classes so that `trust_remote_code` defaults to `True` (it previously defaulted to `False`). Runs launched from the top-level `trtllm-bench` then work without extra parameters.
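A minimal sketch of the flag/default resolution described above, assuming a stdlib `argparse` front end (the real `trtllm-bench` CLI is not reproduced here). `REMOTE_CODE_ARCHS`, `build_parser`, `effective_trust_remote_code`, and `hf_load_kwargs` are hypothetical names; the resulting keyword is what one would forward to a `transformers` Auto class such as `AutoConfig.from_pretrained(model_dir, trust_remote_code=...)`.

```python
import argparse

# Hypothetical: architectures whose Hugging Face checkpoints ship custom
# modeling code, so transformers' Auto classes need trust_remote_code=True.
REMOTE_CODE_ARCHS = {"DeciLMForCausalLM"}


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical stand-in for the `trtllm-bench build` argument parser."""
    parser = argparse.ArgumentParser(prog="bench-build-sketch")
    parser.add_argument("--model", required=True)
    parser.add_argument(
        "--trust_remote_code",
        action="store_true",
        help="Let transformers execute modeling code bundled with the checkpoint.",
    )
    return parser


def effective_trust_remote_code(architecture: str, cli_flag: bool) -> bool:
    # An explicit flag always enables it; otherwise DeciLM-style architectures
    # opt in by default, mirroring the new default described in this PR.
    return cli_flag or architecture in REMOTE_CODE_ARCHS


def hf_load_kwargs(args: argparse.Namespace, architecture: str) -> dict:
    # Keyword arguments one would pass to e.g.
    # transformers.AutoConfig.from_pretrained(args.model, **kwargs).
    return {"trust_remote_code": effective_trust_remote_code(architecture, args.trust_remote_code)}


if __name__ == "__main__":
    args = build_parser().parse_args(["--model", "path/to/checkpoint"])
    print(hf_load_kwargs(args, "DeciLMForCausalLM"))  # {'trust_remote_code': True}
```

Keeping the architecture check separate from the CLI flag means a user can still force the behaviour on for other custom-code checkpoints without the default widening to every model.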

Performance Summary – `llama_v3.3_nemotron_super_49b`

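Column key: isl / osl = input / output sequence length (tokens); quant `none` = BF16 baseline; con = concurrency (concurrent requests).

| isl | osl | quant | con | backend | req/s | tokens/s per GPU | avg latency (ms) | p50 latency (ms) |
|---|---|---|---|---|---|---|---|---|
| 5000 | 500 | none | 1 | cpp | 0.1075 | 13.4317 | 9,306.1785 | 9,305.0552 |
| 5000 | 500 | fp8 | 1 | cpp | 0.1485 | 18.5636 | 6,733.4385 | 6,730.6310 |
| 5000 | 500 | none | 250 | cpp | 0.6116 | 76.4499 | 317,885.8769 | 401,171.5739 |
| 5000 | 500 | fp8 | 250 | cpp | 0.7220 | 90.2495 | 269,376.7776 | 340,154.1910 |
| 500 | 2000 | none | 1 | cpp | 0.0304 | 15.2075 | 32,878.3526 | 32,877.1050 |
| 500 | 2000 | fp8 | 1 | cpp | 0.0435 | 21.7563 | 22,981.6188 | 22,975.2227 |
| 500 | 2000 | none | 250 | cpp | 0.3274 | 163.7098 | 589,062.8547 | 733,682.4485 |
| 500 | 2000 | fp8 | 250 | cpp | 0.4158 | 207.8830 | 463,903.2804 | 577,812.6816 |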
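The per-GPU throughput column can be reproduced from the other columns: in every row it matches req/s × osl ÷ 4 GPUs (generated tokens per second spread over the four GPUs), and for the concurrency-1 rows req/s is roughly the reciprocal of the average latency. A quick sanity check using values copied from the table; the helper names are illustrative, and the reading of the column as output tokens is inferred from the arithmetic rather than stated by the tool.

```python
import math

NUM_GPUS = 4  # from the run invariants

# (isl, osl, quant, con, req_per_s, tps_per_gpu, avg_latency_ms), copied from the table
ROWS = [
    (5000, 500, "none", 1, 0.1075, 13.4317, 9306.1785),
    (5000, 500, "fp8", 1, 0.1485, 18.5636, 6733.4385),
    (5000, 500, "none", 250, 0.6116, 76.4499, 317885.8769),
    (5000, 500, "fp8", 250, 0.7220, 90.2495, 269376.7776),
    (500, 2000, "none", 1, 0.0304, 15.2075, 32878.3526),
    (500, 2000, "fp8", 1, 0.0435, 21.7563, 22981.6188),
    (500, 2000, "none", 250, 0.3274, 163.7098, 589062.8547),
    (500, 2000, "fp8", 250, 0.4158, 207.8830, 463903.2804),
]


def tokens_per_s_per_gpu(req_per_s: float, osl: int, num_gpus: int = NUM_GPUS) -> float:
    # Each finished request produced `osl` output tokens.
    return req_per_s * osl / num_gpus


def req_per_s_single_stream(avg_latency_ms: float) -> float:
    # With concurrency 1, requests run back to back, so throughput ~ 1 / latency.
    return 1000.0 / avg_latency_ms


# req/s is rounded to 4 decimals, so allow ~0.2% relative slack.
REL_TOL = 2e-3

for isl, osl, quant, con, rps, tps, lat in ROWS:
    assert math.isclose(tokens_per_s_per_gpu(rps, osl), tps, rel_tol=REL_TOL)
    if con == 1:
        assert math.isclose(req_per_s_single_stream(lat), rps, rel_tol=REL_TOL)

print("all rows consistent")
```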

Run Invariants

- Model: `llama_v3.3_nemotron_super_49b`
- Backend: cpp (builds TensorRT engines)
- Precision: BF16 baseline, FP8 quantized variants
- Max batch size: 16
- GPUs: 4 (per-GPU throughput shown above)
- Benchmark tool: trtllm-bench
- Synthetic dataset: 512 sequences per run

Execution Status Matrix

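| backend | isl | osl | quant | con | status |
|---|---|---|---|---|---|
| cpp | 5000 | 500 | none | 1 | TIMEOUT |
| cpp | 5000 | 500 | fp8 | 1 | TIMEOUT |
| cpp | 5000 | 500 | none | 250 | PASS |
| cpp | 5000 | 500 | fp8 | 250 | PASS |
| cpp | 500 | 2000 | none | 1 | TIMEOUT |
| cpp | 500 | 2000 | fp8 | 1 | TIMEOUT |
| cpp | 500 | 2000 | none | 250 | PASS |
| cpp | 500 | 2000 | fp8 | 250 | PASS |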
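The eight runs form the full Cartesian product of two ISL/OSL pairs, two concurrency levels, and two precisions, and every TIMEOUT falls on a concurrency-1 case. A small sketch that rebuilds the grid and checks that pattern; the status mapping is transcribed from the matrix, while the tuple layout is purely illustrative and is not the test-ID format used in `trt_llm_release_perf_test.yml`.

```python
from itertools import product

ISL_OSL = [(5000, 500), (500, 2000)]
CONCURRENCY = [1, 250]
QUANTS = ["none", "fp8"]

# Status per (isl, osl, quant, con), transcribed from the matrix above.
STATUS = {
    (5000, 500, "none", 1): "TIMEOUT",
    (5000, 500, "fp8", 1): "TIMEOUT",
    (5000, 500, "none", 250): "PASS",
    (5000, 500, "fp8", 250): "PASS",
    (500, 2000, "none", 1): "TIMEOUT",
    (500, 2000, "fp8", 1): "TIMEOUT",
    (500, 2000, "none", 250): "PASS",
    (500, 2000, "fp8", 250): "PASS",
}

# Enumerate the sweep in the same order as the matrix rows:
# ISL/OSL outermost, then concurrency, then precision.
grid = [(isl, osl, q, c) for (isl, osl), c, q in product(ISL_OSL, CONCURRENCY, QUANTS)]

timeouts = [case for case in grid if STATUS[case] == "TIMEOUT"]
assert set(grid) == set(STATUS)  # the matrix covers the full sweep
assert all(con == 1 for (_, _, _, con) in timeouts)
print(f"{len(grid)} cases, {len(timeouts)} timeouts, all at concurrency 1")
# prints "8 cases, 4 timeouts, all at concurrency 1"
```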

venkywonka · 2025-05-07T16:51:35Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-05-07T16:59:52Z

PR_Github #4410 [ run ] triggered by Bot

venkywonka · 2025-05-07T19:29:11Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-05-07T19:35:00Z

PR_Github #4421 [ run ] triggered by Bot

tensorrt-cicd · 2025-05-07T19:35:02Z

PR_Github #4410 [ run ] completed with state ABORTED

tensorrt-cicd · 2025-05-07T22:28:39Z

PR_Github #4421 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3183 completed with status: 'FAILURE'

venkywonka · 2025-05-08T04:07:30Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-05-08T04:13:16Z

PR_Github #4472 [ run ] triggered by Bot


tensorrt-cicd · 2025-05-08T10:06:19Z

PR_Github #4472 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3209 completed with status: 'FAILURE'

venkywonka · 2025-05-08T14:43:22Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-05-08T14:49:12Z

PR_Github #4584 [ run ] triggered by Bot

tensorrt-cicd · 2025-05-08T21:35:22Z

PR_Github #4584 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3298 completed with status: 'SUCCESS'

venkywonka · 2025-05-09T22:49:18Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-05-09T22:54:52Z

PR_Github #4737 [ run ] triggered by Bot

tensorrt-cicd · 2025-05-10T02:08:55Z

PR_Github #4737 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3419 completed with status: 'FAILURE'

venkywonka · 2025-05-12T20:13:03Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-05-14T04:39:11Z

PR_Github #5109 [ run ] triggered by Bot

tensorrt-cicd · 2025-05-14T09:32:29Z

PR_Github #5109 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3721 completed with status: 'FAILURE'

venkywonka · 2025-05-14T22:48:09Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-05-14T22:54:01Z

PR_Github #5216 [ run ] triggered by Bot

tensorrt-cicd · 2025-05-15T02:31:48Z

PR_Github #5216 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3808 completed with status: 'SUCCESS'


venkywonka · 2025-05-16T19:42:56Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-05-16T19:49:26Z

PR_Github #5534 [ run ] triggered by Bot

tensorrt-cicd · 2025-05-16T22:59:10Z

PR_Github #5534 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #4035 completed with status: 'FAILURE'

venkywonka · 2025-05-16T23:13:41Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-05-16T23:19:32Z

PR_Github #5544 [ run ] triggered by Bot

tensorrt-cicd · 2025-05-17T01:49:00Z

PR_Github #5544 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #4045 completed with status: 'FAILURE'

Signed-off-by: Venky Ganesh <[email protected]>

Signed-off-by: Venky <[email protected]>

venkywonka · 2025-05-19T11:36:44Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-05-19T12:20:04Z

PR_Github #5723 [ run ] triggered by Bot

tensorrt-cicd · 2025-05-19T17:55:38Z

PR_Github #5723 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #4181 completed with status: 'SUCCESS'

venkywonka marked this pull request as ready for review May 7, 2025 16:51

venkywonka requested review from LarryXFly, Naveassaf, ruodil, schetlur-nv and tijyojwad May 7, 2025 16:54

venkywonka force-pushed the user/venkywonka/ll-nemo-super-trt-perf-tests branch from 238c866 to af359d2 May 7, 2025 16:58

venkywonka self-assigned this May 7, 2025

venkywonka mentioned this pull request May 7, 2025

fix: Set trust_remote_code=True when verifying config.json load #4068

Closed

venkywonka force-pushed the user/venkywonka/ll-nemo-super-trt-perf-tests branch from 8330f99 to 4a95b91 May 8, 2025 04:07

ruodil reviewed May 8, 2025

tests/integration/defs/perf/test_perf.py (outdated, resolved)

ruodil reviewed May 8, 2025

tests/integration/test_lists/qa/trt_llm_release_perf_test.yml (outdated, resolved)

ruodil reviewed May 8, 2025

tests/integration/test_lists/qa/trt_llm_release_perf_test.yml (outdated, resolved)

venkywonka force-pushed the user/venkywonka/ll-nemo-super-trt-perf-tests branch from 4a95b91 to 28b41d8 May 8, 2025 14:19

This was referenced May 14, 2025

test: update test filter in perf test yml file to select cases by gpu name and add cases for RTX 6000 pro #4282

Merged

test: add llama_v4_scout_instruct and llama_v4_maverick_instruct into perf test #4296

Closed

venkywonka requested a review from kaiyux May 14, 2025 13:59

venkywonka force-pushed the user/venkywonka/ll-nemo-super-trt-perf-tests branch from 36c87dc to 9b7eb3b May 14, 2025 20:30

venkywonka force-pushed the user/venkywonka/ll-nemo-super-trt-perf-tests branch from 9b7eb3b to 721533d May 14, 2025 23:10

venkywonka force-pushed the user/venkywonka/ll-nemo-super-trt-perf-tests branch from 721533d to 85aeba4 May 15, 2025 13:10

venkywonka commented May 15, 2025

tests/integration/test_lists/qa/trt_llm_release_perf_test.yml (outdated, resolved)

LarryXFly approved these changes May 16, 2025

venkywonka force-pushed the user/venkywonka/ll-nemo-super-trt-perf-tests branch from e61f001 to abb4c42 May 16, 2025 19:42

venkywonka added 4 commits May 19, 2025 04:36

changes to run llama-v3.3-nemotron-super-49b

e2c9e25

Signed-off-by: Venky Ganesh <[email protected]>

yapf

7a4ad31

Signed-off-by: Venky Ganesh <[email protected]>

address review comments pt 1

3d67ed8

Signed-off-by: Venky Ganesh <[email protected]>

re-add cpp super tests

9db8469

Signed-off-by: Venky <[email protected]>

venkywonka force-pushed the user/venkywonka/ll-nemo-super-trt-perf-tests branch from abb4c42 to 9db8469 May 19, 2025 11:36

schetlur-nv merged commit bb02d86 into NVIDIA:main May 19, 2025
3 checks passed

venkywonka mentioned this pull request Jun 2, 2025

test: shorten reqs in con:1 cases and add streaming cases, add l2 perf test #4796

Merged
