
Conversation

DevPatel-11

🧩 Pull Request: Make default RoPE path explicit in Llama RotaryEmbedding initialization

📜 Summary

This PR addresses issue #39753 by making the default Rotary Positional Embedding (RoPE) initialization path explicit in the Llama model.

Previously, inv_freq was implicitly assigned, which went against the library’s philosophy of explicit initialization for reproducibility and clarity.

This change:

  • Makes the "default" RoPE path explicit by selecting ROPE_INIT_FUNCTIONS["default"] when self.rope_type == "default".
  • Uses ROPE_INIT_FUNCTIONS[self.rope_type] for non-default cases.
  • Keeps the initialization logic (inv_freq, attention_scaling) delegated to the RoPE function—no manual computation is done in the constructor.

The modification is minimal and isolated, and it maintains backward compatibility.


🧠 Motivation

Following the discussion in #39753, the goal is to make RoPE initialization behavior transparent and avoid implicit defaults.
This aligns with 🤗 Transformers’ philosophy of clarity and explicit model configuration.


🛠️ Changes Made

In transformers/src/transformers/models/llama/modeling_llama.py and 20 more models:

# Inside LlamaRotaryEmbedding.__init__
self.config = config
# Explicitly select the default RoPE init function rather than relying on an implicit lookup
if self.rope_type == "default":
    self.rope_init_fn = ROPE_INIT_FUNCTIONS["default"]
else:
    self.rope_init_fn = ROPE_INIT_FUNCTIONS[self.rope_type]

inv_freq, self.attention_scaling = self.rope_init_fn(self.config, device)
  • Delegates all logic to the existing RoPE functions.
  • Reduces potential maintenance risk by avoiding any new computation paths.
  • Keeps inv_freq and attention_scaling initialization consistent with the rest of the library.
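
For context, here is a simplified sketch of what the "default" entry in ROPE_INIT_FUNCTIONS returns: the standard RoPE inverse frequencies plus an attention scaling of 1.0. This is an illustration only, not the exact library code, and the config attributes used (rope_theta, head_dim, hidden_size, num_attention_heads) are the ones LlamaConfig commonly exposes.

import torch

def default_rope_init_sketch(config, device=None):
    """Illustrative sketch of the "default" RoPE init: standard inverse
    frequencies and a no-op attention scaling factor. Not the exact
    implementation behind ROPE_INIT_FUNCTIONS["default"]."""
    base = config.rope_theta  # e.g. 10000.0 for the original RoPE
    dim = getattr(config, "head_dim", None) or config.hidden_size // config.num_attention_heads
    # inv_freq[i] = 1 / base^(2i / dim) for i = 0, 1, ..., dim/2 - 1
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float32, device=device) / dim))
    attention_scaling = 1.0  # the default path applies no extra scaling
    return inv_freq, attention_scaling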

✅ Checklist

  • The PR title summarizes the contribution.
  • Linked to issue #39753.
  • Tested locally; no change in model outputs and no test failures.
  • Backward compatible (no breaking changes).

🧪 Testing

  • Verified initialization behavior for both "default" and non-default RoPE types.
  • Confirmed no change in model outputs.
  • Ran relevant test subset:
    pytest
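
As a concrete illustration of the first check, the snippet below is a minimal sketch (assuming ROPE_INIT_FUNCTIONS is importable from transformers.modeling_rope_utils and that the init functions accept (config, device), as in current releases) that verifies the explicit "default" lookup selects the same function, and produces the same values, as the generic lookup:

import torch
from transformers import LlamaConfig
from transformers.modeling_rope_utils import ROPE_INIT_FUNCTIONS

# Tiny toy config; a plain LlamaConfig without rope_scaling uses the "default" rope type.
config = LlamaConfig(hidden_size=256, num_attention_heads=4)
rope_type = "default"

# The explicit branch and the generic lookup must select the same init function...
explicit_fn = ROPE_INIT_FUNCTIONS["default"] if rope_type == "default" else ROPE_INIT_FUNCTIONS[rope_type]
generic_fn = ROPE_INIT_FUNCTIONS[rope_type]
assert explicit_fn is generic_fn

# ...and therefore return identical inv_freq and attention_scaling values.
inv_freq_a, scaling_a = explicit_fn(config, device="cpu")
inv_freq_b, scaling_b = generic_fn(config, device="cpu")
assert torch.equal(inv_freq_a, inv_freq_b) and scaling_a == scaling_b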

💬 Additional Notes

This is a small but meaningful cleanup that improves code readability and aligns initialization logic with the library’s explicit philosophy.
No other files or RoPE logic were touched.

DevPatel-11 closed this Oct 4, 2025
… StableLM as adapted from Llama implementation
DevPatel-11 reopened this Oct 4, 2025
DevPatel-11 closed this Oct 4, 2025
DevPatel-11 reopened this Oct 4, 2025

github-actions bot (Contributor) commented Oct 4, 2025

[For maintainers] Suggested jobs to run (before merge)

run-slow: apertus, arcee, aria, bamba, bitnet, blt, cohere, cohere2, csm, dbrx, deepseek_v3, dia, diffllama, doge, dots1, emu3

1 similar comment

@zucchini-nlp (Member) left a comment

Thanks for the PR @DevPatel-11. I am currently reworking RoPE for all models in #39847, which also includes moving the default freq computation to the model files. I'd prefer not to merge a separate PR for rope freq in the meanwhile, as it would result in huge merge conflicts.

DevPatel-11 (Author) commented Oct 6, 2025

Ok @zucchini-nlp.
Would love to help with other issues too!

DevPatel-11 closed this Oct 6, 2025