Gemma3 #36658 (Merged)

126 commits
5faeae3
Fix converter
xenova Feb 28, 2025
b21634b
[Broken] Adds Gemma 3 to Hugging Face Transformers
RyanMullins Feb 12, 2025
8e27cd7
Consolidating Config and Processor params across impls
RyanMullins Feb 14, 2025
72b60b0
Sorting out configuration parameters. Adds qk_norm before RoPE. Still…
RyanMullins Feb 15, 2025
e01056e
Additional plumbing for CausalLM and ConditionalGeneration variants
RyanMullins Feb 17, 2025
9a08450
incomplete draft of Orbax conversion script
RyanMullins Feb 17, 2025
a7d7fb2
More complete checkpoint conversion
RyanMullins Feb 19, 2025
6d5b637
Supporting Gemma 3 1B checkpoints
RyanMullins Feb 19, 2025
2699ec6
Updating RoPE for multiple frequencies
RyanMullins Feb 20, 2025
49c8658
Adjustments to rotary embedder
RyanMullins Feb 20, 2025
146822a
Proof of life for text-only operation
RyanMullins Feb 20, 2025
74f4acb
Updating the conversion script to handle multimodal projection weights
RyanMullins Feb 26, 2025
bfcc303
Fixing tet-only conversions
RyanMullins Feb 26, 2025
88897b2
Cleaner conversion script with multimodal support and a simpler proce…
RyanMullins Feb 26, 2025
0548c26
Additional refatcors to the Gemma3Processor
RyanMullins Feb 27, 2025
1a860c7
Simplified Processor to work over text representations
RyanMullins Feb 27, 2025
f9036cd
Updated conversion script to join text and vision embeddings at conve…
RyanMullins Feb 27, 2025
61f0b58
Logging for debugging
RyanMullins Feb 27, 2025
3f282c9
Update src/transformers/models/gemma2/modeling_gemma2.py
RyanMullins Feb 28, 2025
8b41347
Removed extraneous Config params
RyanMullins Feb 28, 2025
daacc1d
Switching to fast tokenizer for checkpoint conversions
RyanMullins Feb 28, 2025
4338957
isolating siglip for performance tetsing
RyanMullins Feb 28, 2025
14c443c
Minor changes for debugging tests against baselines
RyanMullins Feb 28, 2025
d45be31
Adding average pooling for soft tokens
RyanMullins Mar 2, 2025
cdbd03f
Updating processor code to enable simpler embedding interleaving for …
RyanMullins Mar 3, 2025
ec2a7df
Updating conversion script for ShieldGemma 2 conversion compatibility
RyanMullins Mar 3, 2025
85d1181
Allow disable_compile to be provided as a kwarg
pcuenca Mar 4, 2025
6922438
Refresh from modular
pcuenca Mar 4, 2025
f47afe2
Updated conversion script and corrected sliding window
RyanMullins Mar 4, 2025
c40f6e2
Fix type mismatch in cache_position (#4)
pcuenca Mar 5, 2025
5ebdcb8
Fix dtype (#5)
pcuenca Mar 5, 2025
432c645
fixes for embedding table overflow and missing image_soft_token_mask …
RyanMullins Mar 5, 2025
65350cf
Adding 2D pooling for image embeddings
MayankChaturvedi Mar 5, 2025
00af9a7
Revert "Adding 2D pooling for image embeddings"
MayankChaturvedi Mar 5, 2025
1a36187
Gemma3 average pooling changed from 1D to 2D
MayankChaturvedi Mar 5, 2025
88030d1
Merge pull request #8 from RyanMullins/gemma3pooling
RyanMullins Mar 6, 2025
e23b2ba
Major refactor to Gemma3MultimodalInputProjection
RyanMullins Mar 6, 2025
6670e1b
Updating Gemm 3 Auto* registrations
RyanMullins Mar 6, 2025
7907bf0
Add option to save Gemma 3 chat template with tokenizer during weight…
RyanMullins Mar 6, 2025
6d0dd5a
Removing unused imports
RyanMullins Mar 6, 2025
c042cd0
Moving out-of-vocab handling from Gemma3Processor to Gemma3ForConditi…
RyanMullins Mar 6, 2025
10a6185
Removing duplicate config property
RyanMullins Mar 6, 2025
21fc682
Removing final logit softcapping and 1-indexing of position ids
RyanMullins Mar 6, 2025
fa28a8c
Fixing image processor config and none --> None typo
RyanMullins Mar 6, 2025
0f148d1
Fixing sliding window size for 1B
RyanMullins Mar 6, 2025
48bca47
Updating image_mean and image_std in Image Processor
RyanMullins Mar 6, 2025
576f065
Attention masking changed to lower triangular
MayankChaturvedi Mar 6, 2025
f137065
Merge pull request #9 from RyanMullins/gemma3attention
RyanMullins Mar 6, 2025
e9e41bb
Moving image special tokens to conversion script
RyanMullins Mar 6, 2025
ed3813d
Mirror image processor defaults from conversion script into Gemma3Pro…
RyanMullins Mar 6, 2025
f25309c
Remove special token variables from symbol space
RyanMullins Mar 6, 2025
2ad61ba
Moving image soft token mask computation from Gemma3Processor to Gemm…
RyanMullins Mar 6, 2025
a45b01c
tie lm_head and embedding weights
RyanMullins Mar 6, 2025
dae9277
Correct tied weights in Gemma3CausalLM
RyanMullins Mar 6, 2025
c5f8446
iterative bidirectional attention
MayankChaturvedi Mar 7, 2025
c7c8468
resolving merge conflicts
MayankChaturvedi Mar 7, 2025
a7cb4af
Reverting to Gemma 2 HybridCache with sldiing window support and a sl…
RyanMullins Mar 7, 2025
9bb66a2
Correcting RoPE scaling
RyanMullins Mar 7, 2025
bd5b5e5
clean up first pass, dummy model geenration works
zucchini-nlp Mar 7, 2025
e1d448c
final clean up before fixing tests
zucchini-nlp Mar 7, 2025
4b9e8b4
causal lm test works, so fine
zucchini-nlp Mar 7, 2025
ee837ca
Fix conversion
pcuenca Mar 7, 2025
875c104
Update src/transformers/models/gemma3/processing_gemma3.py
pcuenca Mar 7, 2025
536d5b8
Merge remote-tracking branch 'origin/gemma3' into gemma3-convert
pcuenca Mar 8, 2025
ae6f71d
model tests are happy
zucchini-nlp Mar 8, 2025
de52bb5
processor tests are happy
zucchini-nlp Mar 8, 2025
d0e0b00
image processing tests added
zucchini-nlp Mar 8, 2025
240c695
fixup
zucchini-nlp Mar 8, 2025
4269332
Fix pre-processing in conversion
pcuenca Mar 8, 2025
b89faaf
Inputs merging
pcuenca Mar 8, 2025
21f15c1
Do not normalize vision embeddings
pcuenca Mar 8, 2025
abde03a
Apply Ryan's (and team) changes to attention
pcuenca Mar 8, 2025
613ccb3
token type ids + mask
zucchini-nlp Mar 10, 2025
0c5f50c
Merge branch 'gemma3-convert' into gemma3
zucchini-nlp Mar 10, 2025
f6f07d7
template
zucchini-nlp Mar 10, 2025
50e1799
Merge remote-tracking branch 'upstream/main' into gemma3
zucchini-nlp Mar 10, 2025
0d91458
move embed scale, add rope scale, fix tests
zucchini-nlp Mar 10, 2025
f19907c
Add chat template to tokenizer
pcuenca Mar 10, 2025
daf6fea
Use prefix for causal model loading
pcuenca Mar 10, 2025
b03ef67
use existing code for sliding mask from gemma2
zucchini-nlp Mar 10, 2025
402c7af
Merge remote-tracking branch 'origin/gemma3' into multimodals-are-causal
pcuenca Mar 10, 2025
d36921d
self.embed_tokens already normalizes
pcuenca Mar 10, 2025
b089958
Correcting Gemma3TextConfig parameters in conversion script
RyanMullins Mar 10, 2025
54ebbb7
typo, modular overwrites my fixes
zucchini-nlp Mar 10, 2025
50492ba
Merge branch 'gemma3' into multimodals-are-causal
pcuenca Mar 10, 2025
a99de0c
enable device map for text model
zucchini-nlp Mar 10, 2025
f71762f
Conversion updates
pcuenca Mar 10, 2025
e2c50bc
Merge pull request #7 from huggingface/multimodals-are-causal
pcuenca Mar 10, 2025
e9f46fd
ultra nit: no einsums
zucchini-nlp Mar 10, 2025
42b7a0a
update image token
zucchini-nlp Mar 10, 2025
d542591
copy deepcopy config + some docs
zucchini-nlp Mar 10, 2025
faecbac
add some test, still WIP
zucchini-nlp Mar 10, 2025
de4ae31
Refactoring --include_chat_tempalte logic in converter
RyanMullins Mar 10, 2025
03ea332
Update src/transformers/models/gemma3/modular_gemma3.py
zucchini-nlp Mar 11, 2025
6ed3b7d
Add eos tokens for instruct models
pcuenca Mar 11, 2025
d9b6541
Merge pull request #8 from huggingface/convert-with-eos
pcuenca Mar 11, 2025
a407829
dump so i can work on dgx
zucchini-nlp Mar 11, 2025
1436ae8
Removing add_bos by default
RyanMullins Mar 11, 2025
69f3748
dump
zucchini-nlp Mar 11, 2025
fbd8a27
add fast im proc
zucchini-nlp Mar 11, 2025
af8081b
docs for PaS + fixup
zucchini-nlp Mar 11, 2025
2190484
another fixup
zucchini-nlp Mar 11, 2025
49524d2
one more fixup
zucchini-nlp Mar 11, 2025
1c57c1e
fix tests
zucchini-nlp Mar 11, 2025
8ab84bb
Inverting prior BOS change
RyanMullins Mar 11, 2025
6dd1aef
ultra nit
zucchini-nlp Mar 11, 2025
ae80685
Reverting to Tokenizer saved with add_bos_token=True and chat templat…
RyanMullins Mar 11, 2025
ba77bc5
resize embeds, remove sqrt, add slow test outputs
zucchini-nlp Mar 11, 2025
aa9d141
FA2 but quality is meh
zucchini-nlp Mar 11, 2025
35ff071
Merge pull request #9 from huggingface/raushan-working
zucchini-nlp Mar 11, 2025
ca82ebc
nit
zucchini-nlp Mar 11, 2025
74da721
skip FA2, no idea what happened
zucchini-nlp Mar 11, 2025
123402a
last bit for green CI
zucchini-nlp Mar 11, 2025
d541fe4
please, green CI for docs
zucchini-nlp Mar 11, 2025
1280714
T_T
zucchini-nlp Mar 11, 2025
4914133
Fix for Gemma3 logits
RyanMullins Mar 12, 2025
4c48f13
Support both options for system prompt
xenova Mar 12, 2025
1711942
Add support for both forms of system prompts
xenova Mar 12, 2025
5ad5b27
Update src/transformers/models/gemma3/image_processing_gemma3_fast.py
RyanMullins Mar 12, 2025
c3b0213
Update docs/source/en/model_doc/gemma3.md
RyanMullins Mar 12, 2025
2dd948b
Update docs/source/en/model_doc/gemma3.md
RyanMullins Mar 12, 2025
5f8f8a6
Update docs/source/en/model_doc/gemma3.md
RyanMullins Mar 12, 2025
cd14f3f
Update docs/source/en/model_doc/gemma3.md
RyanMullins Mar 12, 2025
782bb92
Update docs/source/en/model_doc/gemma3.md
RyanMullins Mar 12, 2025
a334121
Docs updates now that assets are live
RyanMullins Mar 12, 2025
95435e9
Style fixes
LysandreJik Mar 12, 2025
203 changes: 203 additions & 0 deletions docs/source/en/model_doc/gemma3.md
@@ -0,0 +1,203 @@

<!--Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be rendered properly in your Markdown viewer.

-->

# Gemma3

## Overview

The Gemma 3 model was proposed in the [Gemma 3 Technical Report](https://goo.gle/Gemma3Report) by Google. It is a vision-language model composed of a [SigLIP](siglip) vision encoder and a [Gemma 2](gemma_2) language decoder, linked by a multimodal linear projection. The model encodes an image into a fixed number of tokens, in the same way as SigLIP, as long as the image does not exceed a certain aspect ratio. For images that exceed this aspect ratio, it crops the image into multiple smaller patches and concatenates them with the base image embedding. One particularity is that the model uses bidirectional attention on all the image tokens. In addition, the model interleaves sliding-window local attention with full causal attention in the language backbone, where every sixth layer is a full causal attention layer.
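
The local/global interleaving can be inspected directly from the configuration. A minimal sketch, assuming the `sliding_window` and `sliding_window_pattern` fields exposed by `Gemma3TextConfig` in this PR:

```python
from transformers import Gemma3TextConfig

config = Gemma3TextConfig.from_pretrained("google/gemma-3-1b-it")
for layer_idx in range(config.num_hidden_layers):
    # Every `sliding_window_pattern`-th layer is a full causal attention layer;
    # the remaining layers use sliding-window local attention.
    if (layer_idx + 1) % config.sliding_window_pattern == 0:
        kind = "full causal"
    else:
        kind = f"sliding window ({config.sliding_window} tokens)"
    print(f"layer {layer_idx:02d}: {kind}")
```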

This model was contributed by [Ryan Mullins](https://huggingface.co/RyanMullins), [Raushan Turganbay](https://huggingface.co/RaushanTurganbay), [Arthur Zucker](https://huggingface.co/ArthurZ), and [Pedro Cuenca](https://huggingface.co/pcuenq).


## Usage tips


- For image+text and image-only inputs use `Gemma3ForConditionalGeneration`.
- For text-only inputs use `Gemma3ForCausalLM` for generation to avoid loading the vision tower.
- Each sample can contain multiple images, and the number of images can vary between samples. However, make sure to pass correctly batched images to the processor, where each batch is a list of one or more images.
- The text passed to the processor should have a `<start_of_image>` token wherever an image should be inserted (see the sketch following this list).
- The processor has its own `apply_chat_template` method to convert chat messages to model inputs. See the examples below for more details on how to use it.
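
If you build prompts by hand instead of using the chat template, place the `<start_of_image>` token yourself and pass the images alongside the text. A minimal sketch, reusing a checkpoint and image URL from the examples below:

```python
import requests
from PIL import Image
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("google/gemma-3-4b-it")

url = "https://www.ilankelman.org/stopsigns/australia.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# One <start_of_image> placeholder per image; the processor expands it
# into the model's full image token sequence.
prompt = "<start_of_image> What is shown in this image?"
inputs = processor(images=image, text=prompt, return_tensors="pt")
```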


### Image cropping for high resolution images

The model supports cropping images into smaller patches when the image aspect ratio exceeds a certain value. By default, images are not cropped and only the base image is forwarded to the model. Users can set `do_pan_and_scan=True` to obtain several crops per image along with the base image, improving quality on DocVQA and similar tasks that require higher-resolution images.

Pan and scan is an inference-time optimization for handling images with skewed aspect ratios. When enabled, it improves performance on tasks related to document understanding, infographics, OCR, etc.

```python
from transformers import AutoProcessor, Gemma3ForConditionalGeneration

model_id = "google/gemma-3-4b-it"
model = Gemma3ForConditionalGeneration.from_pretrained(model_id, device_map="auto")
processor = AutoProcessor.from_pretrained(model_id, padding_side="left")

url = "https://media.istockphoto.com/id/1192867753/photo/cow-in-berchida-beach-siniscola.jpg?s=612x612&w=0&k=20&c=v0hjjniwsMNfJSuKWZuIn8pssmD5h5bSN1peBd1CmH4="
messages = [
    {
        "role": "system",
        "content": [
            {"type": "text", "text": "You are a helpful assistant."}
        ]
    },
    {
        "role": "user",
        "content": [
            {"type": "image", "url": url},
            {"type": "text", "text": "What is shown in this image?"},
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
    add_generation_prompt=True,
    do_pan_and_scan=True,
).to(model.device)
```
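
To confirm that pan and scan produced crops, a quick sanity check (a sketch; the exact tensor layout is an assumption) is to inspect the processed image batch:

```python
# With do_pan_and_scan=True, the crops are stacked together with the base
# image, so the first dimension of pixel_values grows beyond one per image.
print(inputs["pixel_values"].shape)
```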


## Usage Example

### Single-image Inference

```python
from transformers import AutoProcessor, Gemma3ForConditionalGeneration

model_id = "google/gemma-3-4b-it"
model = Gemma3ForConditionalGeneration.from_pretrained(model_id, device_map="auto")
processor = AutoProcessor.from_pretrained(model_id, padding_side="left")

url = "https://media.istockphoto.com/id/1192867753/photo/cow-in-berchida-beach-siniscola.jpg?s=612x612&w=0&k=20&c=v0hjjniwsMNfJSuKWZuIn8pssmD5h5bSN1peBd1CmH4="
messages = [
    {
        "role": "system",
        "content": [
            {"type": "text", "text": "You are a helpful assistant."}
        ]
    },
    {
        "role": "user",
        "content": [
            {"type": "image", "url": url},
            {"type": "text", "text": "What is shown in this image?"},
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
    add_generation_prompt=True,
).to(model.device)

output = model.generate(**inputs, max_new_tokens=50)
# Decode only the newly generated tokens, skipping the prompt.
print(processor.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

### Multi-image Inference

```python
from transformers import AutoProcessor, Gemma3ForConditionalGeneration

model_id = "google/gemma-3-4b-it"
model = Gemma3ForConditionalGeneration.from_pretrained(model_id, device_map="auto")
processor = AutoProcessor.from_pretrained(model_id, padding_side="left")

url_cow = "https://media.istockphoto.com/id/1192867753/photo/cow-in-berchida-beach-siniscola.jpg?s=612x612&w=0&k=20&c=v0hjjniwsMNfJSuKWZuIn8pssmD5h5bSN1peBd1CmH4="
url_stop = "https://www.ilankelman.org/stopsigns/australia.jpg"
messages = [
    {
        "role": "system",
        "content": [
            {"type": "text", "text": "You are a helpful assistant."}
        ]
    },
    {
        "role": "user",
        "content": [
            {"type": "image", "url": url_cow},
            {"type": "image", "url": url_stop},
            {"type": "text", "text": "Are these two images identical?"},
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
    add_generation_prompt=True,
).to(model.device)

output = model.generate(**inputs, max_new_tokens=50)
# Decode only the newly generated tokens, skipping the prompt.
print(processor.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```
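
As noted in the usage tips, samples in a batch may contain different numbers of images. A sketch of such a batched call, reusing the model, processor, and URLs from the example above; `padding=True` left-pads the shorter sample:

```python
conversations = [
    # Sample 1: a single image.
    [
        {
            "role": "user",
            "content": [
                {"type": "image", "url": url_cow},
                {"type": "text", "text": "Describe this image."},
            ]
        },
    ],
    # Sample 2: two images.
    [
        {
            "role": "user",
            "content": [
                {"type": "image", "url": url_cow},
                {"type": "image", "url": url_stop},
                {"type": "text", "text": "Are these two images identical?"},
            ]
        },
    ],
]
inputs = processor.apply_chat_template(
    conversations,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
    padding=True,
    add_generation_prompt=True,
).to(model.device)

output = model.generate(**inputs, max_new_tokens=50)
for i in range(output.shape[0]):
    print(processor.decode(output[i][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```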

### Text-only inference

You can use the VLM checkpoints for text-only generation by omitting images from your input. Alternatively, you can load the model in text-only mode as shown below; this skips loading the vision tower and saves resources when you only need the LLM capabilities.
```python
from transformers import AutoTokenizer, Gemma3ForCausalLM

model_id = "google/gemma-3-1b-it"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = Gemma3ForCausalLM.from_pretrained(model_id, device_map="auto")

input_ids = tokenizer("Write me a poem about Machine Learning.", return_tensors="pt").to(model.device)

outputs = model.generate(**input_ids, max_new_tokens=100)
text = tokenizer.batch_decode(outputs, skip_special_tokens=True)

print(text)

```


## Gemma3ImageProcessor

[[autodoc]] Gemma3ImageProcessor

## Gemma3ImageProcessorFast

[[autodoc]] Gemma3ImageProcessorFast

## Gemma3Processor

[[autodoc]] Gemma3Processor

## Gemma3TextConfig

[[autodoc]] Gemma3TextConfig

## Gemma3Config

[[autodoc]] Gemma3Config

## Gemma3TextModel

[[autodoc]] Gemma3TextModel
- forward

## Gemma3ForCausalLM

[[autodoc]] Gemma3ForCausalLM
- forward

## Gemma3ForConditionalGeneration

[[autodoc]] Gemma3ForConditionalGeneration
- forward
22 changes: 21 additions & 1 deletion src/transformers/__init__.py
@@ -474,6 +474,7 @@
"models.fuyu": ["FuyuConfig"],
"models.gemma": ["GemmaConfig"],
"models.gemma2": ["Gemma2Config"],
"models.gemma3": ["Gemma3Config", "Gemma3Processor", "Gemma3TextConfig"],
"models.git": [
"GitConfig",
"GitProcessor",
@@ -1259,6 +1260,7 @@
_import_structure["models.emu3"].append("Emu3ImageProcessor")
_import_structure["models.flava"].extend(["FlavaFeatureExtractor", "FlavaImageProcessor", "FlavaProcessor"])
_import_structure["models.fuyu"].extend(["FuyuImageProcessor", "FuyuProcessor"])
_import_structure["models.gemma3"].append("Gemma3ImageProcessor")
_import_structure["models.glpn"].extend(["GLPNFeatureExtractor", "GLPNImageProcessor"])
_import_structure["models.got_ocr2"].extend(["GotOcr2ImageProcessor"])
_import_structure["models.grounding_dino"].extend(["GroundingDinoImageProcessor"])
@@ -1332,6 +1334,7 @@
_import_structure["models.deit"].append("DeiTImageProcessorFast")
_import_structure["models.depth_pro"].append("DepthProImageProcessorFast")
_import_structure["models.detr"].append("DetrImageProcessorFast")
_import_structure["models.gemma3"].append("Gemma3ImageProcessorFast")
_import_structure["models.got_ocr2"].append("GotOcr2ImageProcessorFast")
_import_structure["models.llava"].append("LlavaImageProcessorFast")
_import_structure["models.llava_next"].append("LlavaNextImageProcessorFast")
@@ -2452,6 +2455,14 @@
"Gemma2PreTrainedModel",
]
)
_import_structure["models.gemma3"].extend(
[
"Gemma3ForCausalLM",
"Gemma3ForConditionalGeneration",
"Gemma3PreTrainedModel",
"Gemma3TextModel",
]
)
_import_structure["models.git"].extend(
[
"GitForCausalLM",
@@ -2554,14 +2565,14 @@
"GraniteMoePreTrainedModel",
]
)

_import_structure["models.granitemoeshared"].extend(
[
"GraniteMoeSharedForCausalLM",
"GraniteMoeSharedModel",
"GraniteMoeSharedPreTrainedModel",
]
)

_import_structure["models.grounding_dino"].extend(
[
"GroundingDinoForObjectDetection",
@@ -5629,6 +5640,7 @@
from .models.fuyu import FuyuConfig
from .models.gemma import GemmaConfig
from .models.gemma2 import Gemma2Config
from .models.gemma3 import Gemma3Config, Gemma3Processor, Gemma3TextConfig
from .models.git import (
GitConfig,
GitProcessor,
@@ -6450,6 +6462,7 @@
FlavaProcessor,
)
from .models.fuyu import FuyuImageProcessor, FuyuProcessor
from .models.gemma3 import Gemma3ImageProcessor
from .models.glpn import GLPNFeatureExtractor, GLPNImageProcessor
from .models.got_ocr2 import GotOcr2ImageProcessor
from .models.grounding_dino import GroundingDinoImageProcessor
@@ -6535,6 +6548,7 @@
from .models.deit import DeiTImageProcessorFast
from .models.depth_pro import DepthProImageProcessorFast
from .models.detr import DetrImageProcessorFast
from .models.gemma3 import Gemma3ImageProcessorFast
from .models.got_ocr2 import GotOcr2ImageProcessorFast
from .models.llava import LlavaImageProcessorFast
from .models.llava_next import LlavaNextImageProcessorFast
@@ -7461,6 +7475,12 @@
Gemma2Model,
Gemma2PreTrainedModel,
)
from .models.gemma3 import (
Gemma3ForCausalLM,
Gemma3ForConditionalGeneration,
Gemma3PreTrainedModel,
Gemma3TextModel,
)
from .models.git import (
GitForCausalLM,
GitModel,
20 changes: 11 additions & 9 deletions src/transformers/convert_slow_tokenizer.py
@@ -113,10 +113,10 @@ def extract(self, vocab_scores=None) -> Tuple[Dict[str, int], List[Tuple]]:
sp = self.sp
vocab = {sp.id_to_piece(index): index for index in range(sp.GetPieceSize())}

# there is a missing token in the vocab. We have to do this to support merges
# If "\t" is missing in the vocab, we have to do this to support merges
# "<0x09>" is the bytefallback for `\t`
vocab["\t"] = vocab.get("<0x09>")

if "\t" not in vocab:
vocab["\t"] = vocab.get("<0x09>")
merges = generate_merges(vocab, vocab_scores)
return vocab, merges

@@ -1296,12 +1296,14 @@ def vocab(self, proto):
(self.original_tokenizer.eos_token, 0.0),
(self.original_tokenizer.bos_token, 0.0),
]
for piece in proto.pieces[3:]:
if piece.piece == "<0x09>":
vocab += [("\t", piece.score)]
else:
vocab += [(piece.piece, piece.score)]
# vocab += [(piece.piece, piece.score) for piece in proto.pieces[3:]]
vocab += [(piece.piece, piece.score) for piece in proto.pieces[3:]]

# Older gemma tokenizers had a missing tab token, so we fix that here
if not any(x[0] == "\t" for x in vocab):
override_index = next((i for i, x in enumerate(vocab) if x[0] == "<0x09>"), None)
if override_index is not None:
vocab[override_index] = ("\t", 0.0)

return vocab

def pre_tokenizer(self, replacement, add_prefix_space):
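
The tokenizer converter change above guards the tab-token override: `<0x09>` is the byte-fallback piece for `\t`, so it is only remapped when the vocab lacks a literal tab entry. A toy illustration of the idea (hypothetical vocab, not the converter's real data):

```python
# Hypothetical toy vocab for illustration only.
vocab = {"<0x09>": 9, "hello": 100}

# "<0x09>" is the byte-fallback for "\t"; alias it only when "\t" is absent.
if "\t" not in vocab:
    vocab["\t"] = vocab.get("<0x09>")

assert vocab["\t"] == 9
```
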
6 changes: 3 additions & 3 deletions src/transformers/modeling_utils.py
@@ -821,13 +821,13 @@ def _load_state_dict_into_meta_model(
is_torch_e4m3fn_available = hasattr(torch, "float8_e4m3fn")

for serialized_param_name, empty_param in state_dict.items():
if serialized_param_name not in expected_keys:
continue

# serialized_param_name is the raw, serialized name
# fixed_param_name is the model's equivalent
fixed_param_name, _ = model.rename_key(serialized_param_name)

if fixed_param_name not in expected_keys:
continue

# we need to use serialized_param_name as file pointer is untouched
param = (
file_pointer.get_slice(serialized_param_name)
1 change: 1 addition & 0 deletions src/transformers/models/__init__.py
@@ -106,6 +106,7 @@
fuyu,
gemma,
gemma2,
gemma3,
git,
glm,
glpn,
5 changes: 5 additions & 0 deletions src/transformers/models/auto/configuration_auto.py
@@ -124,6 +124,8 @@
("fuyu", "FuyuConfig"),
("gemma", "GemmaConfig"),
("gemma2", "Gemma2Config"),
("gemma3", "Gemma3Config"),
("gemma3_text", "Gemma3TextConfig"),
("git", "GitConfig"),
("glm", "GlmConfig"),
("glpn", "GLPNConfig"),
@@ -459,6 +461,8 @@
("fuyu", "Fuyu"),
("gemma", "Gemma"),
("gemma2", "Gemma2"),
("gemma3", "Gemma3ForConditionalGeneration"),
("gemma3_text", "Gemma3ForCausalLM"),
("git", "GIT"),
("glm", "GLM"),
("glpn", "GLPN"),
@@ -748,6 +752,7 @@
("qwen2_audio_encoder", "qwen2_audio"),
("clip_text_model", "clip"),
("aria_text", "aria"),
("gemma3_text", "gemma3"),
("idefics3_vision", "idefics3"),
("siglip_vision_model", "siglip"),
("smolvlm_vision", "smolvlm"),
1 change: 1 addition & 0 deletions src/transformers/models/auto/image_processing_auto.py
@@ -86,6 +86,7 @@
("flava", ("FlavaImageProcessor",)),
("focalnet", ("BitImageProcessor",)),
("fuyu", ("FuyuImageProcessor",)),
("gemma3", ("Gemma3ImageProcessor", "Gemma3ImageProcessorFast")),
("git", ("CLIPImageProcessor", "CLIPImageProcessorFast")),
("glpn", ("GLPNImageProcessor",)),
("got_ocr2", ("GotOcr2ImageProcessor", "GotOcr2ImageProcessorFast")),