test(perf): Add remaining Phi-4-mini-instruct perf tests #4443
Description
Add the remaining Phi-4-mini-instruct BF16 / FP8 perf results. The two single-GPU tests that timed out earlier in #4267 now finish well within the CI limits.
Both native BF16 and post-quantised FP8 variants are included.
Performance Summary
Reported in req/s, tok/s, and ms.
Raw metrics
All runs use max_batch_size = 32, requests = 8, concurrency = 1 on a single H100-80GB GPU.
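For readers unfamiliar with the three units, the following is a minimal sketch of how per-request measurements could be reduced to req/s, tok/s, and ms. It is not the benchmark harness used in this PR: `RequestRecord`, `summarize`, and the sample numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class RequestRecord:
    """Hypothetical per-request measurement (not the PR's actual schema)."""
    latency_s: float    # wall-clock time for one request, in seconds
    output_tokens: int  # tokens generated for that request


def summarize(records, wall_time_s):
    """Reduce per-request records to the three reported units."""
    n = len(records)
    total_tokens = sum(r.output_tokens for r in records)
    return {
        "req_per_s": n / wall_time_s,
        "tok_per_s": total_tokens / wall_time_s,
        "avg_latency_ms": 1000.0 * sum(r.latency_s for r in records) / n,
    }


# Illustrative numbers only, NOT measured results: 8 requests run one at a
# time (concurrency = 1), each taking 0.5 s and producing 100 tokens.
records = [RequestRecord(latency_s=0.5, output_tokens=100) for _ in range(8)]
summary = summarize(records, wall_time_s=4.0)
print(summary)
```

With concurrency = 1 the requests run back to back, so the total wall time is roughly the sum of the individual latencies, which is why the sample uses 8 × 0.5 s = 4.0 s.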