
Conversation

DevPatel-11

🧩 Pull Request: Make default RoPE path explicit in Llama RotaryEmbedding initialization

📜 Summary

This PR addresses issue #39753 by making the default Rotary Positional Embedding (RoPE) initialization path explicit in the Llama model.

Previously, inv_freq was implicitly assigned, which went against the library’s philosophy of explicit initialization for reproducibility and clarity.

This change:

  • Makes the "default" RoPE path explicit by selecting ROPE_INIT_FUNCTIONS["default"] when self.rope_type == "default".
  • Uses ROPE_INIT_FUNCTIONS[self.rope_type] for non-default cases.
  • Keeps the initialization logic (inv_freq, attention_scaling) delegated to the RoPE function—no manual computation is done in the constructor.

The modification is minimal and isolated, and it maintains backward compatibility.


🧠 Motivation

Following the discussion in #39753, the goal is to make RoPE initialization behavior transparent and avoid implicit defaults.
This aligns with 🤗 Transformers’ philosophy of clarity and explicit model configuration.


🛠️ Changes Made

In transformers/src/transformers/models/llama/modeling_llama.py and 20 more models:

# Inside LlamaRotaryEmbedding.__init__
self.config = config
# Explicitly select the default RoPE init function rather than relying on an implicit lookup
if self.rope_type == "default":
    self.rope_init_fn = ROPE_INIT_FUNCTIONS["default"]
else:
    self.rope_init_fn = ROPE_INIT_FUNCTIONS[self.rope_type]

inv_freq, self.attention_scaling = self.rope_init_fn(self.config, device)
  • Delegates all logic to the existing RoPE functions.
  • Reduces potential maintenance risk by avoiding any new computation paths.
  • Keeps inv_freq and attention_scaling initialization consistent with the rest of the library.
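
For context, here is a simplified sketch of what the "default" entry in ROPE_INIT_FUNCTIONS returns: the standard RoPE inverse frequencies plus an attention scaling of 1.0. This is an illustration only, not the exact library code, and the config attributes used (rope_theta, head_dim, hidden_size, num_attention_heads) are the ones LlamaConfig commonly exposes.

import torch

def default_rope_init_sketch(config, device=None):
    """Illustrative sketch of the "default" RoPE init: standard inverse
    frequencies and a no-op attention scaling factor. Not the exact
    implementation behind ROPE_INIT_FUNCTIONS["default"]."""
    base = config.rope_theta  # e.g. 10000.0 for the original RoPE
    dim = getattr(config, "head_dim", None) or config.hidden_size // config.num_attention_heads
    # inv_freq[i] = 1 / base^(2i / dim) for i = 0, 1, ..., dim/2 - 1
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float32, device=device) / dim))
    attention_scaling = 1.0  # the default path applies no extra scaling
    return inv_freq, attention_scaling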

✅ Checklist

  • The PR title summarizes the contribution.
  • Linked to issue #39753.
  • Tested locally; no change in model outputs and no test failures.
  • Backward compatible (no breaking changes).

🧪 Testing

  • Verified initialization behavior for both "default" and non-default RoPE types.
  • Confirmed no change in model outputs.
  • Ran relevant test subset:
    pytest
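
As a concrete illustration of the first check, the snippet below is a minimal sketch (assuming ROPE_INIT_FUNCTIONS is importable from transformers.modeling_rope_utils and that the init functions accept (config, device), as in current releases) that verifies the explicit "default" lookup selects the same function, and produces the same values, as the generic lookup:

import torch
from transformers import LlamaConfig
from transformers.modeling_rope_utils import ROPE_INIT_FUNCTIONS

# Tiny toy config; a plain LlamaConfig without rope_scaling uses the "default" rope type.
config = LlamaConfig(hidden_size=256, num_attention_heads=4)
rope_type = "default"

# The explicit branch and the generic lookup must select the same init function...
explicit_fn = ROPE_INIT_FUNCTIONS["default"] if rope_type == "default" else ROPE_INIT_FUNCTIONS[rope_type]
generic_fn = ROPE_INIT_FUNCTIONS[rope_type]
assert explicit_fn is generic_fn

# ...and therefore return identical inv_freq and attention_scaling values.
inv_freq_a, scaling_a = explicit_fn(config, device="cpu")
inv_freq_b, scaling_b = generic_fn(config, device="cpu")
assert torch.equal(inv_freq_a, inv_freq_b) and scaling_a == scaling_b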

💬 Additional Notes

This is a small but meaningful cleanup that improves code readability and aligns initialization logic with the library’s explicit philosophy.
No other files or RoPE logic were touched.

DevPatel-11 closed this Oct 4, 2025
… StableLM as adapted from Llama implementation
DevPatel-11 reopened this Oct 4, 2025
DevPatel-11 closed this Oct 4, 2025
DevPatel-11 reopened this Oct 4, 2025

github-actions bot (Contributor) commented Oct 4, 2025

[For maintainers] Suggested jobs to run (before merge)

run-slow: apertus, arcee, aria, bamba, bitnet, blt, cohere, cohere2, csm, dbrx, deepseek_v3, dia, diffllama, doge, dots1, emu3

1 similar comment

@zucchini-nlp (Member) left a comment

Thanks for the PR @DevPatel-11. I am currently reworking RoPE for all models in #39847, which also includes moving the default freq computation to the model files. I'd prefer not to merge a separate PR for rope freq in the meanwhile, as it would result in huge merge conflicts.

DevPatel-11 (Author) commented Oct 6, 2025

Ok @zucchini-nlp.
Would love to help with other issues too!

DevPatel-11 closed this Oct 6, 2025