llama : (mrope) allow using normal 1D position for text token #13138
Conversation
src/llama-graph.cpp (outdated)

  ggml_tensor * llm_graph_context::build_inp_attn_scale() const {
-     auto inp = std::make_unique<llm_graph_input_attn_temp>(n_pos_per_token(), hparams.n_attn_temp_floor_scale, hparams.f_attn_temp_scale);
+     auto inp = std::make_unique<llm_graph_input_attn_temp>(n_pos_per_embd(), hparams.n_attn_temp_floor_scale, hparams.f_attn_temp_scale);
@ggerganov Because build_inp_attn_scale is currently used exclusively by llama 4, do you think we should get rid of n_pos_per_embd and replace it with a GGML_ASSERT(n_pos_per_embd() == 1)?
The main motivation is to make this code look less complicated, as there is ~0% chance a Qwen model is going to use this.
Yes, we can do that.
On second thought, build_inp_attn_scale should work well even in the case of N positions per token.
That's because the scale is applied per embedding, and the number of embeddings is independent of the number of positions per token.
In any case, I removed n_pos_per_embd in 9cd16a3; merging this PR once the CI is green.
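To make the per-embedding argument concrete, here is a small standalone illustration (not code from this PR, and the scale formula below is a placeholder, not the exact llama 4 one): the attention-temperature scale has one value per token embedding, so its size depends only on the number of tokens, while the position buffer is what grows with the number of position components per embedding (1 for plain text, 4 for M-RoPE's t/h/w/e sections).

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>
#include <vector>

int main() {
    const int n_tokens       = 4;
    const int n_pos_per_embd = 4; // M-RoPE-style layout, purely illustrative

    std::vector<int>   pos(n_pos_per_embd * n_tokens); // positions grow with n_pos_per_embd
    std::vector<float> attn_scale(n_tokens);           // one scale value per token embedding

    for (int i = 0; i < n_tokens; ++i) {
        const int p = i; // 1D position of token i
        // placeholder temperature-style scale, not the actual llama 4 formula
        attn_scale[i] = 1.0f + 0.1f * std::log1p((float) (p / 8192));
    }

    // the scale buffer is independent of n_pos_per_embd
    assert(attn_scale.size() == (size_t) n_tokens);
    assert(pos.size()        == (size_t) (n_pos_per_embd * n_tokens));

    std::printf("pos: %zu entries, attn_scale: %zu entries\n", pos.size(), attn_scale.size());
    return 0;
}
```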
PR description

For M-RoPE, we want to use a normal 1D position for text tokens.

This is done to simplify the use case of llama_decode() with text tokens, which is needed for adding Qwen2VL to libmtmd and to server.cpp.

This should also align with #11875, because in the future we want the text position to be tracked internally by libllama.
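For illustration, a minimal sketch of what this enables on the caller side, using the public llama_batch API (the decode_text helper and its error handling are hypothetical, not part of this PR; a loaded model and context are assumed): text-only batches can be submitted with plain 1D positions even for an M-RoPE model such as Qwen2VL, without the caller building the 4-component position layout.

```cpp
#include "llama.h"

#include <vector>

// Hypothetical helper: decode a batch of text tokens with plain 1D positions.
static int decode_text(llama_context * ctx, const std::vector<llama_token> & tokens) {
    llama_batch batch = llama_batch_init((int32_t) tokens.size(), /*embd*/ 0, /*n_seq_max*/ 1);

    for (size_t i = 0; i < tokens.size(); ++i) {
        batch.token   [i]    = tokens[i];
        batch.pos     [i]    = (llama_pos) i;              // normal 1D position
        batch.n_seq_id[i]    = 1;
        batch.seq_id  [i][0] = 0;
        batch.logits  [i]    = (i == tokens.size() - 1);   // logits only for the last token
    }
    batch.n_tokens = (int32_t) tokens.size();

    const int ret = llama_decode(ctx, batch);
    llama_batch_free(batch);
    return ret;
}
```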