Add Apertus #39381
Conversation
Force-pushed from 1ea6373 to 353c6c0
Force-pushed from 1f20c58 to 1f4e715
@Cyrilvallez - this is part 1 of the PR from the Swiss AI Initiative
Force-pushed from 2728d3c to b53417c
Very nice and very transformers-like! Do you mind using modular to isolate the changes?
Yep, I was planning to, and Cyril suggested it. I built from Alex's original implementation, but I'll refactor.
It should not require too many changes, don't worry, it's already in an excellent state!
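For context, a rough sketch of what the requested modular refactor could look like, following the modular-transformers convention where a modular_<model>.py re-uses another model's classes and only spells out the differences, with the full modeling file generated from it. Class names and override points below are illustrative assumptions, not the actual code of this PR.

```python
# Hypothetical modular_apertus.py sketch (assumed, not the PR's actual file):
# inherit the Llama building blocks and only override the Apertus deltas.
from transformers.models.llama.modeling_llama import (
    LlamaAttention,
    LlamaDecoderLayer,
    LlamaForCausalLM,
    LlamaModel,
)


class ApertusAttention(LlamaAttention):
    # Apertus-specific changes (e.g. QK-norm) would be overridden here
    pass


class ApertusDecoderLayer(LlamaDecoderLayer):
    # e.g. swap the MLP activation for xIELU
    pass


class ApertusModel(LlamaModel):
    pass


class ApertusForCausalLM(LlamaForCausalLM):
    pass
```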
Co-authored-by: Cyril Vallez <[email protected]>
Force-pushed from c66e7b4 to 792b7de
The tests are written for a config without default rope scaling, which is not the case for Apertus. Besides, rope scaling is tested in other models, so it's all safe.
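A minimal illustration of the situation described above; ApertusConfig is the config class this PR adds, and the exact default rope_scaling contents are an assumption here.

```python
from transformers import ApertusConfig  # config class added by this PR

# Apertus ships with a rope_scaling entry by default (assumed), unlike the
# configs the common rope tests are written for, so a test reusing them
# would first have to clear it:
config = ApertusConfig()
config.rope_scaling = None  # what the common rope tests effectively expect
```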
Not needed (for now)
Following this: huggingface#39782
[For maintainers] Suggested jobs to run (before merge): run-slow: apertus, auto
Alright! Very very nice, congrats! Super efficient work! Merging it now! 🤗
Hey @EduardDurech, do you think we can add integration tests now? 🤗
@ArthurZucker yeah, we should be able to, the models are hosted on HF now. @dhia680 would you be able to? I'm too busy with RL for a bit.
Nice! 🤗 We can also have a go if neither of you can!!
Pre-release of Apertus from the Swiss AI Initiative.

Main modifications from Llama:
- xIELU activation
- QK-norm

Associated Transformers PR: huggingface/transformers#39381
Associated vLLM PR: vllm-project/vllm#23068
Associated SGLang PR: sgl-project/sglang#9774

GSM8K: (two result screenshots attached in the original PR description)
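Of the two deltas listed above, QK-norm is the more mechanical one. Below is a minimal sketch of the idea, not the code from this PR; the per-head RMSNorm placement before RoPE/attention is an assumption. xIELU is a trainable activation whose exact parameterization is best checked in the merged activation code.

```python
import torch
from torch import nn


class RMSNorm(nn.Module):
    """Standard RMSNorm, applied per attention head for QK-norm."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        input_dtype = x.dtype
        variance = x.float().pow(2).mean(-1, keepdim=True)
        x = x.float() * torch.rsqrt(variance + self.eps)
        return (self.weight * x).to(input_dtype)


head_dim = 128  # illustrative head dimension
q_norm, k_norm = RMSNorm(head_dim), RMSNorm(head_dim)

# query/key states shaped (batch, num_heads, seq_len, head_dim)
q = torch.randn(2, 8, 16, head_dim)
k = torch.randn(2, 8, 16, head_dim)
q, k = q_norm(q), k_norm(k)  # normalize q/k before applying RoPE / attention
```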
@andresnowak has a draft PR for tests, #41037, if you want to check. In the meantime there are xIELU CUDA parity issues (an already known issue) that I asked the group about; we'll see if that's fixed beforehand and included.
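For reference, a hedged sketch of what such a slow integration test could look like now that checkpoints are on the Hub. The repo id below is a placeholder, not the actual hosted checkpoint, and the expected output would only be pinned once the xIELU parity question is settled.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

CHECKPOINT = "swiss-ai/Apertus-8B"  # placeholder repo id, check the Hub for the real one


def test_apertus_generation():
    tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
    model = AutoModelForCausalLM.from_pretrained(CHECKPOINT, torch_dtype=torch.bfloat16)
    inputs = tokenizer("The Swiss AI Initiative", return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=20, do_sample=False)
    text = tokenizer.decode(output[0], skip_special_tokens=True)
    # The exact expected continuation would be pinned here once parity is verified
    assert text.startswith("The Swiss AI Initiative")
```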