ggerganov
Member

The AMX backend produces garbage with Q4_0 and Q4_1 quantizations:

https://github.com/ggml-org/llama.cpp/actions/runs/18070620247/job/51419855639#step:3:13646

Repro:

./bin/llama-perplexity -hf ggml-org/Qwen3-0.6B-GGUF:Q4_0 -f wikitext-2-raw/wiki.test.raw -c 2048 -b 2048 --chunks 2

The only problem I was able to find using the address sanitizer is an unaligned access of the quants. However, fixing it does not correct the results, so some other issue remains.
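For readers unfamiliar with this class of bug, here is a minimal sketch of one common way to read a value from a possibly unaligned address without undefined behavior. It is not the change made in this PR; the helper name `load_i32_unaligned` is hypothetical and only illustrates the general technique of copying through `memcpy` instead of dereferencing a cast pointer.

```c
#include <stdint.h>
#include <string.h>

// Hypothetical helper (not part of ggml): read a 32-bit integer from an
// address that may not be 4-byte aligned. Dereferencing a cast pointer such
// as *(const int32_t *) p is undefined behavior when p is misaligned, and
// sanitizers can flag it; memcpy has no alignment requirement and compilers
// typically lower it to a single unaligned load.
static inline int32_t load_i32_unaligned(const void * p) {
    int32_t v;
    memcpy(&v, p, sizeof(v));
    return v;
}
```

Quantized blocks are a plausible place for this to occur because the packed quants usually follow a small scale field, so they need not start on a 4-byte boundary.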

cc @mingfeima

@github-actions github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) Sep 28, 2025
if (op->op == GGML_OP_MUL_MAT && is_contiguous_2d(op->src[0]) && // src0 must be contiguous
    is_contiguous_2d(op->src[1]) && // src1 must be contiguous
    op->src[0]->buffer && op->src[0]->buffer->buft == ggml_backend_amx_buffer_type() &&
    op->src[0]->ne[0] % (TILE_K * 2 * 32) == 0 && // TODO: not sure if correct (https://github.com/ggml-org/llama.cpp/pull/16315)
Member Author

Adding this check fixes the CI workflows, though I'm not sure about the reasoning. It effectively requires the row size to be a multiple of 2048.
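The arithmetic behind the "multiple of 2048" remark can be sketched as follows. This assumes `TILE_K` is 32, which is what the stated 2048 implies (32 * 2 * 32 = 2048); the function name `amx_row_size_ok` is hypothetical and only mirrors the added condition on `op->src[0]->ne[0]`.

```c
#include <stdbool.h>
#include <stdint.h>

// Assumption: TILE_K is 32, inferred from the comment that the check
// requires the row size to be a multiple of 2048 (32 * 2 * 32).
#define TILE_K 32

// Hypothetical stand-in for the added condition on op->src[0]->ne[0]:
// the row size is accepted only if it is a multiple of TILE_K * 2 * 32.
static bool amx_row_size_ok(int64_t ne0) {
    return ne0 % (TILE_K * 2 * 32) == 0;
}
```

Under that assumption, row sizes of 2048 and 4096 pass the check, while 1024 and 3072 fail it, so matrices with such rows would presumably not be handled by the AMX path.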

@ggerganov ggerganov marked this pull request as ready for review September 29, 2025 07:38
@ggerganov ggerganov requested a review from slaren as a code owner September 29, 2025 07:38

Gadflyii commented Oct 2, 2025

I could not reproduce this error with:

build/bin/llama-perplexity -m /mnt/ssd2/AI/Qwen3_30B/Q4_0/Qwen3-30B-A3B-Thinking-2507-Q4_0.gguf -f /tmp/wikitext-2-raw/wiki.test.raw -c 2048 -b 2048 --chunks 2

Any suggestions?

What garbage are you getting out of the AMX backend?

@ggerganov ggerganov merged commit a23b9bd into master Oct 6, 2025
65 of 67 checks passed