@eliotwang eliotwang commented Aug 22, 2025

Accuracy verification of deepseek-671B on the CEval dataset:
With the Gemm-fp8-blockquant-ck_tile kernel integrated, there is no accuracy loss on the CEval dataset: 88.7% (Triton GEMM) vs. 89.07% (ck_tile GEMM).
End-to-end performance of deepseek-671B on MI300X:
With the Gemm-fp8-blockquant-ck_tile kernel integrated, two performance-boost ranges against the Triton kernel are reported on MI300X: 107%~120% and 113%~123%.
image
image

@maleksan85

Please target this PR against the 355_wip branch. The change LGTM! Please land!

maleksan85 previously approved these changes Aug 22, 2025
Mcirino1 and others added 11 commits September 1, 2025 05:15
* Updated README.md for June 10 release

* Added Docker Manifest git hash
* Updated README.md for June 24 Docker release

* Added additional throughput results

* Fixed some throughput results
* Minor changes to command line examples

* README changes and added throughput results

Still waiting on latency

* Added latency results

* Update README.md

* Update README.md
* Update test-pipeline.yaml

Disabling the "Tensorizer Test".

The test is seen to generate exceptions while still reporting as successful. That needs to be verified before re-enabling the test in the production environment.

Signed-off-by: Alexei V. Ivanov <[email protected]>

* Fixing pre-commit complaints.

Signed-off-by: Alexei V. Ivanov <[email protected]>

* .

Signed-off-by: Alexei V. Ivanov <[email protected]>

---------

Signed-off-by: Alexei V. Ivanov <[email protected]>
@eliotwang eliotwang changed the base branch from main to 355_wip September 2, 2025 07:38
@charyang-ai charyang-ai merged commit 176244a into ROCm:355_wip Sep 5, 2025
1 check failed

6 participants