@Isotr0py Isotr0py marked this pull request as ready for review August 7, 2025 05:46
Signed-off-by: Isotr0py <[email protected]>
Signed-off-by: Isotr0py <[email protected]>

Isotr0py commented Aug 9, 2025

Benchmark results

Hardware: NVIDIA GeForce RTX 3090 24GB
Hidden size=1024, num_tokens=4096

| Quantization | Main Time (ms) | PR Time (ms) |
| --- | --- | --- |
| Q2_K | 21.37505244 | 9.757954925 |
| Q3_K | 54.36249449 | 8.700753478 |
| Q4_K | 32.7376706 | 6.185650527 |
| Q5_K | 42.49698077 | 6.589026637 |
| Q6_K | 45.83637421 | 6.769261481 |
| Q4_0 | 23.81569172 | 5.485869246 |
| Q5_0 | 37.47556383 | 5.97816959 |
| Q8_0 | 40.68143068 | 5.784195224 |
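
For readers who want the relative improvement instead of raw timings, here is a minimal sketch that derives the per-format speedup (main time divided by PR time) from the numbers above. The names `timings_ms`, `speedup`, and `speedups` are illustrative only and are not part of the PR.

```python
# Timings in milliseconds, copied from the benchmark results above
# (RTX 3090, hidden size=1024, num_tokens=4096).
# Each entry is (main_time_ms, pr_time_ms).
timings_ms = {
    "Q2_K": (21.37505244, 9.757954925),
    "Q3_K": (54.36249449, 8.700753478),
    "Q4_K": (32.7376706, 6.185650527),
    "Q5_K": (42.49698077, 6.589026637),
    "Q6_K": (45.83637421, 6.769261481),
    "Q4_0": (23.81569172, 5.485869246),
    "Q5_0": (37.47556383, 5.97816959),
    "Q8_0": (40.68143068, 5.784195224),
}


def speedup(main_ms: float, pr_ms: float) -> float:
    """Return how many times faster the PR is than main."""
    return main_ms / pr_ms


# Hypothetical helper dict: quantization type -> speedup factor.
speedups = {quant: speedup(main, pr) for quant, (main, pr) in timings_ms.items()}

for quant, ratio in speedups.items():
    print(f"{quant}: {ratio:.2f}x")
```

On these numbers every quantization type is faster with the PR, with Q2_K showing the smallest gain and Q8_0 the largest.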

Signed-off-by: Isotr0py <[email protected]>
@Isotr0py Isotr0py merged commit 239c8fd into main Aug 9, 2025
@Isotr0py Isotr0py deleted the mmq-update2 branch August 9, 2025 14:18