test(perf): Add remaining `Phi-4-mini-instruct` perf tests #4443

venkywonka · 2025-05-19T12:56:14Z

Description

Add the remaining Phi-4-mini-instruct BF16 / FP8 perf results.
The two single-GPU tests that timed out earlier in #4267 now finish well within the CI limits.
Both native BF16 and post-quantised FP8 variants are included.

Performance Summary

| backend | in len | out len | quant | concurrency | req throughput (req/s) | output TPS (tok/s) | avg latency (ms) | result |
|---|---|---|---|---|---|---|---|---|
| cpp | 500 | 2000 | – | 1 | 0.0892 | 178.31 | 11 216 | PASS |
| cpp | 500 | 2000 | fp8 | 1 | 0.1315 | 263.07 | 7 602 | PASS |

Raw metrics

| metric | BF16 | FP8 |
|---|---|---|
| Request Throughput (req/s) | 0.0892 | 0.1315 |
| Total Output Throughput (tok/s) | 178.31 | 263.07 |
| Total Token Throughput (tok/s) | 222.88 | 328.84 |
| Average Latency (ms) | 11 216.49 | 7 602.42 |
| P50 Latency (ms) | 11 218.60 | 7 602.73 |
| P90 Latency (ms) | 11 222.94 | 7 614.00 |
| P95 Latency (ms) | 11 222.94 | 7 614.00 |
| P99 Latency (ms) | 11 222.94 | 7 614.00 |

All runs use max_batch_size = 32, requests = 8, concurrency = 1 on a single H100-80GB GPU.
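For reviewers who want to sanity-check the figures, here is a minimal sketch that cross-checks the reported metrics against each other. It assumes that with concurrency = 1 the request throughput is roughly the inverse of the average latency, and that token throughput is request throughput times tokens per request. The names `runs`, `derived` and `max_rel_error` are illustrative only and are not part of the perf harness.

```python
# Figures copied from the "Raw metrics" table in this PR description.
IN_LEN, OUT_LEN, CONCURRENCY = 500, 2000, 1

runs = {
    "bf16": {"req_per_s": 0.0892, "out_tps": 178.31,
             "total_tps": 222.88, "avg_latency_ms": 11216.49},
    "fp8": {"req_per_s": 0.1315, "out_tps": 263.07,
            "total_tps": 328.84, "avg_latency_ms": 7602.42},
}


def derived(avg_latency_ms):
    """Estimate throughput from average latency alone.

    Assumption (not stated in the PR): requests run back to back with no
    idle gaps, so throughput = concurrency / latency.
    """
    req_per_s = CONCURRENCY / (avg_latency_ms / 1000.0)
    return {
        "req_per_s": req_per_s,
        "out_tps": req_per_s * OUT_LEN,
        "total_tps": req_per_s * (IN_LEN + OUT_LEN),
    }


def max_rel_error(name):
    """Largest relative gap between the estimated and reported metrics."""
    est = derived(runs[name]["avg_latency_ms"])
    return max(abs(est[k] - runs[name][k]) / runs[name][k] for k in est)


# Relative gain of the FP8 run over the BF16 run.
fp8_speedup = runs["fp8"]["out_tps"] / runs["bf16"]["out_tps"]
latency_reduction = 1 - runs["fp8"]["avg_latency_ms"] / runs["bf16"]["avg_latency_ms"]

if __name__ == "__main__":
    for name in runs:
        print(f"{name}: max relative error vs. reported = {max_rel_error(name):.4%}")
    print(f"FP8 output-throughput speedup: {fp8_speedup:.2f}x")
    print(f"FP8 average-latency reduction: {latency_reduction:.1%}")
```

Under these assumptions the reported numbers are internally consistent to well under 1%, and FP8 comes out at about 1.48x the BF16 output throughput with roughly 32% lower average latency.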

Signed-off-by: Venky <[email protected]>

Copilot

Pull Request Overview

Add two remaining performance tests for the phi_4_mini_instruct model in BF16 and FP8 precision, using lower request counts and concurrency to fit CI timeouts.

Introduced BF16 and FP8 variants with reqs:8 and con:1.
Added a comment clarifying why request counts were reduced.

tests/integration/test_lists/qa/trt_llm_release_perf_test.yml

add remaining 2 phi cpp perf tests

ff86b8a

Signed-off-by: Venky <[email protected]>

venkywonka requested review from Copilot, ruodil, tijyojwad, schetlur-nv and LarryXFly May 19, 2025 12:56

venkywonka marked this pull request as ready for review May 19, 2025 12:56

Copilot AI reviewed May 19, 2025

Review comment on tests/integration/test_lists/qa/trt_llm_release_perf_test.yml: resolved

tijyojwad approved these changes May 19, 2025

LarryXFly approved these changes May 21, 2025

LarryXFly enabled auto-merge (squash) May 21, 2025 01:17

Merge branch 'main' into user/venky/phi-4-mini-instruct-cpp-perf-test…

7560cb6

…s-ext

LarryXFly disabled auto-merge May 21, 2025 01:26

LarryXFly merged commit 9a8c3ec into NVIDIA:main May 21, 2025
2 checks passed

venkywonka mentioned this pull request May 22, 2025

[cherry-pick] test(perf): Add remaining Phi-4-mini-instruct perf tests (#4443) #4589

Merged

chzblych pushed a commit that referenced this pull request May 28, 2025

[cherry-pick] test(perf): Add remaining Phi-4-mini-instruct perf te…

42e622a

…sts (#4443) (#4589) Signed-off-by: Venky <[email protected]> Co-authored-by: Larry <[email protected]>
