
venkywonka
Collaborator

Description

Add the remaining Phi-4-mini-instruct BF16 / FP8 perf results.
The two single-GPU tests that timed out earlier in #4267 now finish well within the CI limits.
Both native BF16 and post-quantised FP8 variants are included.

Performance Summary
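| backend | in len | out len | quant | concurrency | req throughput (req/s) | output TPS (tok/s) | avg latency (ms) | result |
|---|---|---|---|---|---|---|---|---|
| cpp | 500 | 2000 | none (BF16) | 1 | 0.0892 | 178.31 | 11 216 | PASS |
| cpp | 500 | 2000 | fp8 | 1 | 0.1315 | 263.07 | 7 602 | PASS |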

Raw metrics

| metric | BF16 | FP8 |
|---|---|---|
| Request Throughput (req/s) | 0.0892 | 0.1315 |
| Total Output Throughput (tok/s) | 178.31 | 263.07 |
| Total Token Throughput (tok/s) | 222.88 | 328.84 |
| Average Latency (ms) | 11 216.49 | 7 602.42 |
| P50 Latency (ms) | 11 218.60 | 7 602.73 |
| P90 Latency (ms) | 11 222.94 | 7 614.00 |
| P95 Latency (ms) | 11 222.94 | 7 614.00 |
| P99 Latency (ms) | 11 222.94 | 7 614.00 |

All runs use max_batch_size = 32, requests = 8, concurrency = 1 on a single H100-80GB GPU.
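
As a rough sanity check on the numbers above: with concurrency = 1, requests run one at a time, so request throughput should be close to the reciprocal of the average latency, and token throughput should follow from the input and output lengths. The sketch below is not part of the benchmark tooling; `derived_throughput` is a hypothetical helper, and it assumes each request produces exactly 2000 output tokens from 500 input tokens with negligible gaps between requests.

```python
def derived_throughput(avg_latency_ms, input_len, output_len):
    """Estimate throughput for a concurrency-1 run from average latency.

    Assumption (not from the benchmark code): requests are served strictly
    back to back, so requests per second is about 1 / average latency.
    """
    req_per_s = 1000.0 / avg_latency_ms
    return {
        "req_per_s": req_per_s,
        "output_tps": req_per_s * output_len,
        "total_tps": req_per_s * (input_len + output_len),
    }


# Average latencies taken from the raw metrics table.
bf16 = derived_throughput(11216.49, input_len=500, output_len=2000)
fp8 = derived_throughput(7602.42, input_len=500, output_len=2000)

# Relative gain of FP8 over BF16 in output tokens per second.
speedup = fp8["output_tps"] / bf16["output_tps"]

for name, result in (("BF16", bf16), ("FP8", fp8)):
    print(
        f"{name}: {result['req_per_s']:.4f} req/s, "
        f"{result['output_tps']:.2f} out tok/s, "
        f"{result['total_tps']:.2f} total tok/s"
    )
print(f"FP8 / BF16 output throughput ratio: {speedup:.2f}")
```

Under these assumptions the derived values agree with the reported throughput figures to within about 0.01, and the FP8 run comes out at about 1.48x the BF16 output throughput.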

@venkywonka venkywonka marked this pull request as ready for review May 19, 2025 12:56
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

Add two remaining performance tests for the phi_4_mini_instruct model in BF16 and FP8 precision, using lower request counts and concurrency to fit CI timeouts.

  • Introduced BF16 and FP8 variants with reqs:8 and con:1.
  • Added a comment clarifying why request counts were reduced.

@LarryXFly LarryXFly enabled auto-merge (squash) May 21, 2025 01:17
@LarryXFly LarryXFly disabled auto-merge May 21, 2025 01:26
@LarryXFly LarryXFly merged commit 9a8c3ec into NVIDIA:main May 21, 2025
2 checks passed
venkywonka added a commit to venkywonka/TensorRT-LLM that referenced this pull request May 22, 2025
add remaining 2 phi cpp perf tests

Signed-off-by: Venky <[email protected]>
Co-authored-by: Larry <[email protected]>
chzblych pushed a commit that referenced this pull request May 28, 2025
3 participants