Conversation

@ggerganov (Member) commented:

fix #12433 (comment)

It seems that the attention output accumulator `lo` overflows F16 at large contexts (more than 32k). This fixes Gemma 3 27B at large contexts with Metal.
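
For illustration, here is a minimal standalone sketch of the failure mode. This is not the actual Metal kernel; the context length and per-token value are made up (the value is chosen so every F16 addition is exact until overflow), and `_Float16` requires a compiler that supports it:

```cpp
// Minimal standalone sketch (not the actual Metal kernel): shows how a long
// running sum overflows an F16 accumulator while an F32 accumulator does not.
// Assumes a compiler with _Float16 support (e.g. recent Clang/GCC on AArch64).
#include <cstdio>

int main() {
    const int   n_ctx = 40000; // illustrative context length > 32k
    const float v     = 32.0f; // illustrative per-token contribution, chosen so
                               // every F16 addition is exact until overflow

    _Float16 acc_f16 = 0;      // F16 max finite value is 65504
    float    acc_f32 = 0.0f;

    for (int i = 0; i < n_ctx; ++i) {
        acc_f16 += (_Float16) v; // becomes +inf once the sum passes 65504
        acc_f32 += v;
    }

    printf("F16 accumulator: %f\n", (float) acc_f16); // inf
    printf("F32 accumulator: %f\n", acc_f32);         // 1280000.000000
    return 0;
}
```

This is the same idea the branch name gg/metal-fa-acc-f32 suggests: accumulate in F32 instead of F16, converting only the final result back to the lower precision.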

@github-actions bot added the ggml (changes relating to the ggml tensor library for machine learning) and Apple Metal (https://en.wikipedia.org/wiki/Metal_(API)) labels on Jun 2, 2025.
@ggerganov merged commit ea394d7 into master on Jun 2, 2025 (53 checks passed).
@ggerganov deleted the gg/metal-fa-acc-f32 branch on Jun 2, 2025 at 18:33.
furyhawk pushed a commit to furyhawk/llama.cpp that referenced this pull request on Jun 6, 2025.
@Animaxx commented on Jun 8, 2025:

I think this change causes an issue on M1 and M2 iOS devices (#14055).

Development

Successfully merging this pull request may close the following issue: Eval bug: Gemma3 &lt;unused32&gt; spam.
