
Conversation

@RyanMullins (Contributor)

What does this PR do?

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

xenova and others added 30 commits March 5, 2025 16:45
@github-actions github-actions bot marked this pull request as draft March 12, 2025 07:36
@github-actions (bot)

Hi 👋, thank you for opening this pull request! The pull request has been converted to draft by default. When it is ready for review, please click the Ready for review button (at the bottom of the PR page).

@ArthurZucker (Collaborator) left a comment

As reviewed before, LGTM!

@LysandreJik LysandreJik marked this pull request as ready for review March 12, 2025 07:40
@LysandreJik LysandreJik merged commit 50d3530 into huggingface:main Mar 12, 2025
17 of 21 checks passed
("fnet", "FNetForPreTraining"),
("fsmt", "FSMTForConditionalGeneration"),
("funnel", "FunnelForPreTraining"),
("gemma3", "Gemma3ForConditionalGeneration"),
@DarkLight1337 commented on Mar 12, 2025

I think this should be included in MODEL_FOR_VISION_2_SEQ_MAPPING_NAMES as well? I can't load the multimodal variant using AutoModelForVision2Seq, which works for most multimodal models.

A repository member replied

VLMs should be loaded with AutoModelForImageTextToText, which is a new mapping we added for multimodal models. The old AutoModelForVision2Seq is supposed to work only for models like BLIP, which take bare images without instructions and caption them.

Since we didn't have a specific mapping for VLMs earlier, everything got dumped into Vision2Seq, sorry if it was confusing. All new releases will come under ImageTextToText, and all older models support this mapping as well.
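
For reference, a minimal sketch of the loading path described above, using the transformers Auto classes named in this thread; the checkpoint id below is only an illustrative placeholder, not taken from the PR:

# Minimal sketch (not part of the PR): loading a Gemma 3 multimodal checkpoint
# through the ImageTextToText auto class. The model id is an assumed example.
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "google/gemma-3-4b-it"  # assumed example checkpoint id
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(model_id)

# AutoModelForVision2Seq remains the mapping for caption-style models such as BLIP;
# instruction-following VLMs resolve through AutoModelForImageTextToText instead.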

@DarkLight1337 replied

I see, thanks for the explanation!

