Add Apertus #39381
Conversation
Force-pushed from 1ea6373 to 353c6c0
Force-pushed from 1f20c58 to 1f4e715
@Cyrilvallez - this is part 1 of the PR from the Swiss AI Initiative
Force-pushed from 2728d3c to b53417c
Very nice and very transformers-like! Do you mind using modular to isolate the changes?
Yep, I was planning to, and Cyril suggested it. I built from Alex's original implementation, but I'll refactor.
It should not require too many changes, don't worry, it's already in an excellent state!
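For context, a rough sketch of what the requested modular refactor could look like, following the modular-transformers convention where a modular_<model>.py re-uses another model's classes and only spells out the differences, with the full modeling file generated from it. Class names and override points below are illustrative assumptions, not the actual code of this PR.

```python
# Hypothetical modular_apertus.py sketch (assumed, not the PR's actual file):
# inherit the Llama building blocks and only override the Apertus deltas.
from transformers.models.llama.modeling_llama import (
    LlamaAttention,
    LlamaDecoderLayer,
    LlamaForCausalLM,
    LlamaModel,
)


class ApertusAttention(LlamaAttention):
    # Apertus-specific changes (e.g. QK-norm) would be overridden here
    pass


class ApertusDecoderLayer(LlamaDecoderLayer):
    # e.g. swap the MLP activation for xIELU
    pass


class ApertusModel(LlamaModel):
    pass


class ApertusForCausalLM(LlamaForCausalLM):
    pass
```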
Co-authored-by: Cyril Vallez <[email protected]>
Force-pushed from c66e7b4 to 792b7de
The tests are written for a config without default rope scaling, which is not the case for Apertus. Besides, rope scaling is tested in other models, so it's all safe.
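A minimal illustration of the situation described above; ApertusConfig is the config class this PR adds, and the exact default rope_scaling contents are an assumption here.

```python
from transformers import ApertusConfig  # config class added by this PR

# Apertus ships with a rope_scaling entry by default (assumed), unlike the
# configs the common rope tests are written for, so a test reusing them
# would first have to clear it:
config = ApertusConfig()
config.rope_scaling = None  # what the common rope tests effectively expect
```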
Not needed (for now)
Following this: huggingface#39782
[For maintainers] Suggested jobs to run (before merge): run-slow: apertus, auto
Alright! Very very nice, congrats! Super efficient work! Merging it now! 🤗
Hey @EduardDurech, do you think we can add integration tests now? 🤗
@ArthurZucker yeah, we should be able to, the models are hosted on HF now. @dhia680 would you be able to? I'm too busy with RL for a bit.
Nice! 🤗 We can also have a go if neither of you can!!
Pre-release of Apertus from the Swiss AI Initiative.

Main modifications from Llama:
- xIELU activation
- QK-norm

Associated Transformers PR: huggingface/transformers#39381
Associated vLLM PR: vllm-project/vllm#23068
Associated SGLang PR: sgl-project/sglang#9774

GSM8K: (two result screenshots attached in the original PR description)
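Of the two deltas listed above, QK-norm is the more mechanical one. Below is a minimal sketch of the idea, not the code from this PR; the per-head RMSNorm placement before RoPE/attention is an assumption. xIELU is a trainable activation whose exact parameterization is best checked in the merged activation code.

```python
import torch
from torch import nn


class RMSNorm(nn.Module):
    """Standard RMSNorm, applied per attention head for QK-norm."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        input_dtype = x.dtype
        variance = x.float().pow(2).mean(-1, keepdim=True)
        x = x.float() * torch.rsqrt(variance + self.eps)
        return (self.weight * x).to(input_dtype)


head_dim = 128  # illustrative head dimension
q_norm, k_norm = RMSNorm(head_dim), RMSNorm(head_dim)

# query/key states shaped (batch, num_heads, seq_len, head_dim)
q = torch.randn(2, 8, 16, head_dim)
k = torch.randn(2, 8, 16, head_dim)
q, k = q_norm(q), k_norm(k)  # normalize q/k before applying RoPE / attention
```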
@andresnowak has a draft PR for tests, #41037, if you want to check. In the meantime there are xIELU CUDA parity issues (an already known issue) that I asked the group about; we'll see if that's fixed beforehand and included.
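For reference, a hedged sketch of what such a slow integration test could look like now that checkpoints are on the Hub. The repo id below is a placeholder, not the actual hosted checkpoint, and the expected output would only be pinned once the xIELU parity question is settled.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

CHECKPOINT = "swiss-ai/Apertus-8B"  # placeholder repo id, check the Hub for the real one


def test_apertus_generation():
    tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
    model = AutoModelForCausalLM.from_pretrained(CHECKPOINT, torch_dtype=torch.bfloat16)
    inputs = tokenizer("The Swiss AI Initiative", return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=20, do_sample=False)
    text = tokenizer.decode(output[0], skip_special_tokens=True)
    # The exact expected continuation would be pinned here once parity is verified
    assert text.startswith("The Swiss AI Initiative")
```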