Conversation

@ggerganov (Member) commented:

fix #12433 (comment)

It seems that the attention output accumulator `lo` overflows F16 at large contexts (more than 32k). This fixes Gemma 3 27B at large contexts with Metal.
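
For illustration, here is a minimal standalone sketch of the failure mode. This is not the actual Metal kernel; the context length and per-token value are made up (the value is chosen so every F16 addition is exact until overflow), and `_Float16` requires a compiler that supports it:

```cpp
// Minimal standalone sketch (not the actual Metal kernel): shows how a long
// running sum overflows an F16 accumulator while an F32 accumulator does not.
// Assumes a compiler with _Float16 support (e.g. recent Clang/GCC on AArch64).
#include <cstdio>

int main() {
    const int   n_ctx = 40000; // illustrative context length > 32k
    const float v     = 32.0f; // illustrative per-token contribution, chosen so
                               // every F16 addition is exact until overflow

    _Float16 acc_f16 = 0;      // F16 max finite value is 65504
    float    acc_f32 = 0.0f;

    for (int i = 0; i < n_ctx; ++i) {
        acc_f16 += (_Float16) v; // becomes +inf once the sum passes 65504
        acc_f32 += v;
    }

    printf("F16 accumulator: %f\n", (float) acc_f16); // inf
    printf("F32 accumulator: %f\n", acc_f32);         // 1280000.000000
    return 0;
}
```

This is the same idea the branch name gg/metal-fa-acc-f32 suggests: accumulate in F32 instead of F16, converting only the final result back to the lower precision.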

@github-actions bot added the ggml (changes relating to the ggml tensor library for machine learning) and Apple Metal (https://en.wikipedia.org/wiki/Metal_(API)) labels on Jun 2, 2025.
@ggerganov merged commit ea394d7 into master on Jun 2, 2025 (53 checks passed).
@ggerganov deleted the gg/metal-fa-acc-f32 branch on Jun 2, 2025 at 18:33.
furyhawk pushed a commit to furyhawk/llama.cpp that referenced this pull request on Jun 6, 2025.
@Animaxx commented on Jun 8, 2025:

I think this change causes an issue on M1 and M2 iOS devices (#14055).

Development

Successfully merging this pull request may close the following issue: Eval bug: Gemma3 &lt;unused32&gt; spam.
