@venkywonka
Collaborator

@venkywonka venkywonka commented May 7, 2025

Description

  • Adds integration perf tests for Llama-3_3-Nemotron-Super-49B-v1 (cpp backend, run through trtllm-bench).
  • Exposes a --trust_remote_code flag in the build subcommand of trtllm-bench. The flag is required for the transformers library's Auto classes to load DeciLM-based models, of which Llama-Nemotron-Super is one.
  • Changes config.py and model.py for the DeciLMForCausalLM classes so that trust_remote_code defaults to True (it previously defaulted to False). This lets the models load without extra parameters when run from top-level trtllm-bench.
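A minimal sketch of how an opt-in CLI flag and a DeciLM-specific default of True might fit together is shown below. It is hypothetical: it uses `argparse` rather than whatever CLI library trtllm-bench actually uses, and `build_parser`, `decilm_load_kwargs`, and `load_config` are names invented for illustration. The only real library call is `transformers.AutoConfig.from_pretrained(..., trust_remote_code=...)`.

```python
import argparse
from typing import Any, Dict


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical stand-in for a benchmark `build` subcommand parser."""
    parser = argparse.ArgumentParser(prog="build-sketch")
    parser.add_argument("--model", required=True,
                        help="Hugging Face model id or local checkpoint directory")
    # Boolean opt-in flag: absent -> False, present -> True.
    parser.add_argument("--trust_remote_code", action="store_true",
                        help="Allow transformers to run the checkpoint's custom modeling code")
    return parser


def decilm_load_kwargs(trust_remote_code: bool = True) -> Dict[str, Any]:
    """Keyword arguments for a transformers Auto-class call.

    The default of True mirrors the PR's new default for the DeciLM classes.
    """
    return {"trust_remote_code": trust_remote_code}


def load_config(model: str, trust_remote_code: bool = True):
    """Load a DeciLM-style config (needs `transformers` installed and model access)."""
    from transformers import AutoConfig  # lazy import so the rest of the sketch runs without it
    return AutoConfig.from_pretrained(model, **decilm_load_kwargs(trust_remote_code))
```

For example, `build_parser().parse_args(["--model", "path/to/checkpoint", "--trust_remote_code"]).trust_remote_code` evaluates to `True`. Note that `trust_remote_code=True` makes transformers execute Python code shipped with the checkpoint, which is why it is an explicit opt-in in transformers itself.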

Performance Summary – llama_v3.3_nemotron_super_49b

ISL   OSL   Quant  Concurrency  Backend  Req/s   Tokens/s/GPU  Avg latency (ms)  P50 latency (ms)
5000  500   none   1            cpp      0.1075  13.4317       9,306.1785        9,305.0552
5000  500   fp8    1            cpp      0.1485  18.5636       6,733.4385        6,730.6310
5000  500   none   250          cpp      0.6116  76.4499       317,885.8769      401,171.5739
5000  500   fp8    250          cpp      0.7220  90.2495       269,376.7776      340,154.1910
500   2000  none   1            cpp      0.0304  15.2075       32,878.3526       32,877.1050
500   2000  fp8    1            cpp      0.0435  21.7563       22,981.6188       22,975.2227
500   2000  none   250          cpp      0.3274  163.7098      589,062.8547      733,682.4485
500   2000  fp8    250          cpp      0.4158  207.8830      463,903.2804      577,812.6816

ISL/OSL: input/output sequence length (tokens). Concurrency: number of concurrent requests.
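The throughput and latency columns of the performance table can be cross-checked with simple arithmetic, sketched below. The sketch assumes the per-GPU token throughput (tps /gpu) counts generated (output) tokens only and is divided by the 4 GPUs listed under Run Invariants, so tps/gpu ≈ req/s × OSL / 4. For the concurrency-1 rows it also applies Little's law (average latency ≈ concurrency / request throughput). Row values are copied from the table; the helper names are illustrative, not part of trtllm-bench.

```python
from typing import List, NamedTuple


class Row(NamedTuple):
    isl: int               # input sequence length (tokens)
    osl: int               # output sequence length (tokens)
    quant: str             # "none" (BF16 baseline) or "fp8"
    concurrency: int       # concurrent requests ("con")
    req_per_s: float       # request throughput
    tps_per_gpu: float     # reported token throughput per GPU
    avg_latency_ms: float  # reported average request latency


NUM_GPUS = 4  # from "Run Invariants"

# Values copied from the performance summary table.
ROWS: List[Row] = [
    Row(5000, 500, "none", 1, 0.1075, 13.4317, 9306.1785),
    Row(5000, 500, "fp8", 1, 0.1485, 18.5636, 6733.4385),
    Row(5000, 500, "none", 250, 0.6116, 76.4499, 317885.8769),
    Row(5000, 500, "fp8", 250, 0.7220, 90.2495, 269376.7776),
    Row(500, 2000, "none", 1, 0.0304, 15.2075, 32878.3526),
    Row(500, 2000, "fp8", 1, 0.0435, 21.7563, 22981.6188),
    Row(500, 2000, "none", 250, 0.3274, 163.7098, 589062.8547),
    Row(500, 2000, "fp8", 250, 0.4158, 207.8830, 463903.2804),
]


def expected_tps_per_gpu(row: Row, num_gpus: int = NUM_GPUS) -> float:
    # Assumption: only generated (output) tokens are counted, split across GPUs.
    return row.req_per_s * row.osl / num_gpus


def littles_law_latency_ms(row: Row) -> float:
    # Little's law: requests in flight = throughput * latency.
    return 1000.0 * row.concurrency / row.req_per_s


def rel_err(estimate: float, reported: float) -> float:
    return abs(estimate - reported) / abs(reported)


for r in ROWS:
    line = f"isl={r.isl} osl={r.osl} quant={r.quant} con={r.concurrency}: "
    line += f"tps/gpu err={rel_err(expected_tps_per_gpu(r), r.tps_per_gpu):.3%}"
    if r.concurrency == 1:
        # The latency check is only applied to the concurrency-1 rows.
        line += f", latency err={rel_err(littles_law_latency_ms(r), r.avg_latency_ms):.3%}"
    print(line)
```

With the published four-decimal req/s values, both relations agree with the reported figures to within about 0.1% for every row checked.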

Run Invariants

  • Model: llama_v3.3_nemotron_super_49b
  • Backend: cpp (builds TensorRT engines)
  • Precision: BF16 baseline, FP8 quantized variants
  • Max batch size: 16
  • GPUs: 4 (throughput in the table above is reported per GPU)
  • Benchmark tool: trtllm-bench
  • Synthetic dataset: 512 sequences per run

Execution Status Matrix

Backend  ISL   OSL   Quant  Concurrency  Status
cpp      5000  500   none   1            TIMEOUT
cpp      5000  500   fp8    1            TIMEOUT
cpp      5000  500   none   250          PASS
cpp      5000  500   fp8    250          PASS
cpp      500   2000  none   1            TIMEOUT
cpp      500   2000  fp8    1            TIMEOUT
cpp      500   2000  none   250          PASS
cpp      500   2000  fp8    250          PASS

@venkywonka venkywonka marked this pull request as ready for review May 7, 2025 16:51
@venkywonka
Collaborator Author

/bot run --disable-fail-fast

@venkywonka venkywonka force-pushed the user/venkywonka/ll-nemo-super-trt-perf-tests branch from 238c866 to af359d2
@tensorrt-cicd
Collaborator

PR_Github #4410 [ run ] triggered by Bot

@venkywonka
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #4421 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #4410 [ run ] completed with state ABORTED

@tensorrt-cicd
Collaborator

PR_Github #4421 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3183 completed with status: 'FAILURE'

@venkywonka venkywonka force-pushed the user/venkywonka/ll-nemo-super-trt-perf-tests branch from 8330f99 to 4a95b91 May 8, 2025 04:07
@venkywonka
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #4472 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #4472 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3209 completed with status: 'FAILURE'

@venkywonka venkywonka force-pushed the user/venkywonka/ll-nemo-super-trt-perf-tests branch from 4a95b91 to 28b41d8 May 8, 2025 14:19
@venkywonka
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #4584 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #4584 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3298 completed with status: 'SUCCESS'

@venkywonka
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #4737 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #4737 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3419 completed with status: 'FAILURE'

@venkywonka
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #5109 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #5109 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3721 completed with status: 'FAILURE'

@venkywonka
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #5216 [ run ] triggered by Bot

@venkywonka venkywonka force-pushed the user/venkywonka/ll-nemo-super-trt-perf-tests branch from 9b7eb3b to 721533d May 14, 2025 23:10
@tensorrt-cicd
Collaborator

PR_Github #5216 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3808 completed with status: 'SUCCESS'

@venkywonka venkywonka force-pushed the user/venkywonka/ll-nemo-super-trt-perf-tests branch from 721533d to 85aeba4 May 15, 2025 13:10
@venkywonka venkywonka force-pushed the user/venkywonka/ll-nemo-super-trt-perf-tests branch from e61f001 to abb4c42 May 16, 2025 19:42
@venkywonka
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #5534 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #5534 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #4035 completed with status: 'FAILURE'

@venkywonka
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #5544 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #5544 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #4045 completed with status: 'FAILURE'

Signed-off-by: Venky Ganesh <[email protected]>
@venkywonka venkywonka force-pushed the user/venkywonka/ll-nemo-super-trt-perf-tests branch from abb4c42 to 9db8469 May 19, 2025 11:36
@venkywonka
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #5723 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #5723 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #4181 completed with status: 'SUCCESS'

@schetlur-nv schetlur-nv merged commit bb02d86 into NVIDIA:main May 19, 2025
3 checks passed

Labels

None yet

Projects

None yet

5 participants