EduardDurech
Contributor

Pre-release of Apertus from the Swiss AI Initiative

Main modifications from Llama

  • xIELU Activation
  • QK-norm
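
For readers unfamiliar with the two modifications listed above, here is a minimal pure-Python sketch. It is not the code from this PR. The xIELU form (quadratic for positive inputs, ELU-like for non-positive inputs) follows my reading of the xIELU paper, and the constants `alpha_p`, `alpha_n`, `beta`, and `eps` are illustrative assumptions; in a trainable module they would normally be learned. QK-norm is shown as RMS normalization of each query and key vector before the scaled dot product, with the learned per-dimension weight left out by default. All function names are hypothetical.

```python
import math


def xielu(x, alpha_p=0.8, alpha_n=0.8, beta=0.5, eps=-1e-6):
    """Scalar xIELU sketch (assumed form; constants are illustrative)."""
    if x > 0:
        # Positive branch: quadratic term plus a linear term.
        return alpha_p * x * x + beta * x
    # Non-positive branch: ELU-like, with the input to expm1 clamped at a
    # tiny negative eps.
    return alpha_n * (math.expm1(min(x, eps)) - x) + beta * x


def rms_norm(vec, weight=None, eps=1e-6):
    """RMS-normalize one vector; `weight` is an optional per-dimension gain."""
    mean_sq = sum(v * v for v in vec) / len(vec)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    if weight is None:
        weight = [1.0] * len(vec)
    return [w * v * scale for w, v in zip(weight, vec)]


def qk_norm_scores(queries, keys):
    """Attention logits for a single head with normalized queries and keys."""
    head_dim = len(queries[0])
    q = [rms_norm(row) for row in queries]
    k = [rms_norm(row) for row in keys]
    return [
        [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(head_dim) for kj in k]
        for qi in q
    ]
```

Because both queries and keys are normalized, the logits depend on the direction of each vector rather than its magnitude, which is the usual motivation given for QK-norm.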

@ArthurZucker

@EduardDurech
Contributor Author

@dhia680

@chiffa

chiffa commented Jul 14, 2025

@Cyrilvallez - this is part 1 of the PR from the Swiss AI Initiative

Collaborator

@ArthurZucker ArthurZucker left a comment

very nice and very transformers like! Do you mind using modular to isolate the changes?

@EduardDurech
Contributor Author

> very nice and very transformers like! Do you mind using modular to isolate the changes?

Yep, I was planning to, and Cyril suggested it as well. I built on Alex's original implementation, but I'll refactor.

@ArthurZucker
Collaborator

It should not require too many changes, don't worry. It's already in an excellent state!

EduardDurech and others added 7 commits August 27, 2025 21:40
As the tests are written for a config without default scaling (which is not the case in Apertus) - besides, rope scaling is tested in other models so it's all safe.
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: apertus, auto

Member

@Cyrilvallez Cyrilvallez left a comment

Alright! Very very nice, congrats! Super efficient work! Merging it now! 🤗

@ArthurZucker
Collaborator

Hey @EduardDurech do you think we can add integration tests now? 🤗

@EduardDurech
Contributor Author

@ArthurZucker yeah, that should be possible, the models are hosted on HF now. @dhia680, would you be able to? I'm too busy with RL for a bit.

@ArthurZucker
Collaborator

Nice! 🤗 We can also have a go if neither of you can!

vermouth1992 pushed a commit to volcengine/verl that referenced this pull request Sep 13, 2025
Pre-release of Apertus from the Swiss AI Initiative

Main modifications from Llama

- xIELU Activation
- QK-norm

Associated Transformers PR: huggingface/transformers#39381
Associated vLLM PR: vllm-project/vllm#23068
Associated SGLang PR: sgl-project/sglang#9774

GSM8K (two result images, not reproduced here):
https://github.com/user-attachments/assets/8b2d5188-834b-4a8c-828e-2d0aa2ccffed
https://github.com/user-attachments/assets/57241a73-3150-474a-a4fb-222e33a0de08

wlf-darkmatter pushed a commit to wlf-darkmatter/verl that referenced this pull request Sep 13, 2025
@EduardDurech
Contributor Author

@andresnowak has a draft PR for tests, #41037, if you want to check it. In the meantime, there are xIELU CUDA parity issues (a known issue) that I asked the group about; we'll see whether that gets fixed beforehand and included.

VocabVictor pushed a commit to VocabVictor/verl-plus that referenced this pull request Sep 24, 2025