Support fp8 with static scales #725

lburzawa · 2025-10-03T04:13:30Z

Purpose

Add support for the gpt-oss model with fp8 static scales.
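
For readers unfamiliar with the term, here is a minimal sketch of what quantizing with a static scale generally means: the scale is fixed ahead of time (for example from calibration data) instead of being recomputed from each tensor at runtime. This is not the code from this PR. The e4m3fn format with a maximum of 448, the pure-Python rounding helper, and the names `round_to_e4m3`, `quantize_static`, and `dequantize` are all assumptions made for illustration; some hardware uses a different fp8 variant with a different maximum.

```python
import math

# Assumption: OCP e4m3fn format (4 exponent bits, 3 mantissa bits, max 448).
FP8_E4M3_MAX = 448.0


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest e4m3fn-representable value, saturating at +/-448."""
    if x == 0.0 or math.isnan(x):
        return x
    x = max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, x))
    # frexp gives |x| = m * 2**e with 0.5 <= m < 1, so floor(log2|x|) = e - 1.
    _, e = math.frexp(abs(x))
    exp = max(e - 1, -6)        # below 2**-6 the format is subnormal
    step = 2.0 ** (exp - 3)     # 3 mantissa bits: 8 steps per binade
    return round(x / step) * step  # Python's round() is round-half-to-even


def quantize_static(values, scale):
    """Quantize with a precomputed (static) scale; nothing is measured at runtime."""
    return [round_to_e4m3(v / scale) for v in values]


def dequantize(quantized, scale):
    """Map quantized values back to the original range."""
    return [q * scale for q in quantized]


# Hypothetical calibration result: largest magnitude seen offline was 8.0.
calib_absmax = 8.0
scale = calib_absmax / FP8_E4M3_MAX

activations = [0.1, -2.5, 8.0, 100.0]  # 100.0 exceeds the calibrated range
q = quantize_static(activations, scale)
restored = dequantize(q, scale)
```

Because the scale is fixed, a value outside the calibrated range (100.0 here) saturates at the fp8 maximum instead of triggering a new scale. That trade-off is the usual reason accuracy is checked alongside performance.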

Check accuracy and performance.

Accuracy:

Tasks	Version	Filter	n-shot	Metric		Value		Stderr
gsm8k	3	flexible-extract	5	exact_match	↑	0.9447	±	0.0063
		strict-match	5	exact_match	↑	0.7574	±	0.0118

Performance:

concurrency	ISL	OSL	bf16 TTFT (ms)	fp8 TTFT (ms)	TTFT speedup	bf16 TPOT (ms)	fp8 TPOT (ms)	TPOT speedup	bf16 tput (tok/s)	fp8 tput (tok/s)	tput speedup
4	1024	1024	79.5	75.6	1.05	4.75	4.44	1.07	1657	1784	1.08
4	8192	1024	445.37	333.77	1.33	4.93	4.65	1.06	6719	7282	1.08
4	1024	8192	96	75.97	1.26	4.73	4.57	1.04	949	994	1.05
64	1024	1024	413.82	255.58	1.62	11.79	11.15	1.06	10490	11238	1.07
64	8192	1024	632.4	465.98	1.36	19.95	17.64	1.13	27878	31586	1.13
64	1024	8192	382.39	278.08	1.38	12.42	12.13	1.02	5763	5926	1.03
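
The speedup columns are not defined explicitly. The reported values are consistent with bf16/fp8 for the latency metrics (TTFT, TPOT) and fp8/bf16 for throughput, so a value above 1 always favors fp8. A short sketch that recomputes them from the table follows; the tuple layout and the `speedups` helper are illustrative, not part of the PR.

```python
# Each row: (concurrency, ISL, OSL,
#            bf16 TTFT ms, fp8 TTFT ms, bf16 TPOT ms, fp8 TPOT ms,
#            bf16 tok/s, fp8 tok/s), copied from the table above.
ROWS = [
    (4, 1024, 1024, 79.5, 75.6, 4.75, 4.44, 1657, 1784),
    (4, 8192, 1024, 445.37, 333.77, 4.93, 4.65, 6719, 7282),
    (4, 1024, 8192, 96, 75.97, 4.73, 4.57, 949, 994),
    (64, 1024, 1024, 413.82, 255.58, 11.79, 11.15, 10490, 11238),
    (64, 8192, 1024, 632.4, 465.98, 19.95, 17.64, 27878, 31586),
    (64, 1024, 8192, 382.39, 278.08, 12.42, 12.13, 5763, 5926),
]


def speedups(row):
    """Return (TTFT, TPOT, throughput) speedups of fp8 over bf16 for one row."""
    _, _, _, ttft_bf16, ttft_fp8, tpot_bf16, tpot_fp8, tput_bf16, tput_fp8 = row
    return (
        ttft_bf16 / ttft_fp8,   # latency: lower is better, so bf16 / fp8
        tpot_bf16 / tpot_fp8,   # latency: lower is better, so bf16 / fp8
        tput_fp8 / tput_bf16,   # throughput: higher is better, so fp8 / bf16
    )


for row in ROWS:
    ttft, tpot, tput = speedups(row)
    print(f"con={row[0]:>2} ISL={row[1]:>4} OSL={row[2]:>4} "
          f"TTFT x{ttft:.2f}  TPOT x{tpot:.2f}  tput x{tput:.2f}")
```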

add support for fp8 with static scales

cb18575

lburzawa assigned vgokhale, azaidy and dllehr-amd Oct 3, 2025