Open
209 commits
dc6fcac
add base convert keys + chat template
molbap Oct 1, 2024
574e01f
Merge branch 'main' into add_molmo
molbap Oct 2, 2024
0bd413b
draft: add up modular files for molmo
molbap Oct 4, 2024
9e454e4
Squashed commit of the following:
molbap Oct 8, 2024
d82c471
sync changes
molbap Oct 8, 2024
339a8d3
push a simple fix
ArthurZucker Oct 8, 2024
c0c25d6
finish fixing
ArthurZucker Oct 8, 2024
5ee6a44
Merge branch 'main' into add_molmo
molbap Oct 8, 2024
33e43ec
suppress diff
molbap Oct 8, 2024
d23e1c1
Merge branch 'main' into add_molmo
molbap Oct 10, 2024
c8c12fe
fix
ArthurZucker Oct 10, 2024
0909c02
style
ArthurZucker Oct 10, 2024
1799d20
add config + 2d pooling
molbap Oct 10, 2024
fb133d4
suppress changes
molbap Oct 10, 2024
5ba4105
Merge branch 'add_molmo' of github.com:molbap/transformers into add_m…
molbap Oct 10, 2024
a2a6a9b
fix
ArthurZucker Oct 10, 2024
8fe7a9f
Merge branch 'add_molmo' of github.com:molbap/transformers into add_m…
ArthurZucker Oct 10, 2024
20681f5
conversion works :raised_hands:
molbap Oct 11, 2024
c85af98
fixup
molbap Oct 11, 2024
35ea3cc
handle missing MOLMO_VISION_ATTENTION_CLASSES
molbap Oct 11, 2024
ab79d0e
fix
molbap Oct 11, 2024
b9bdf99
fix fused keys mismatch
molbap Oct 15, 2024
98d5ccd
fix
molbap Oct 15, 2024
3bca742
[Modular-breaking] add manually vision attention classes list
molbap Oct 15, 2024
a13fe05
finish weight conversion script
molbap Oct 15, 2024
fac8dfd
add more keys
molbap Oct 16, 2024
c1e5f19
flipped the linear layers
molbap Oct 16, 2024
a68e5f5
add pooling forward + draft general forward
molbap Oct 16, 2024
8298b80
modeling file with swiglu, forward(input_ids) passing
molbap Oct 16, 2024
9f69c6b
BIG push of image processor
molbap Oct 23, 2024
0711e08
add missing objects to init
molbap Oct 23, 2024
7efe22e
Merge branch 'main' into add_molmo
molbap Nov 5, 2024
f5bd3b0
fix up wrong channel dimension
molbap Nov 7, 2024
3ae884f
fix typo
molbap Nov 7, 2024
3ef60c0
add missing image token indices used in forward
molbap Nov 19, 2024
cf9d4ab
pad patch orderings
molbap Nov 19, 2024
91a2d3c
clean up conversion script
molbap Nov 19, 2024
0f7904f
remind that tests are TODO
molbap Nov 19, 2024
577e347
merge main
zucchini-nlp Nov 21, 2024
b514041
at least it runs like this
zucchini-nlp Nov 24, 2024
cf6cb5d
add bos token
molbap Nov 27, 2024
26c517d
add bos token in prompt
molbap Nov 27, 2024
35c168d
fix processor, missing batching img_mask
molbap Nov 27, 2024
e7275c7
fix image masks + batching
molbap Nov 27, 2024
3e7530d
working version
zucchini-nlp Nov 27, 2024
4bbc89b
+1 only on non masked indices
zucchini-nlp Nov 27, 2024
54e072b
attemp 1 to make modular work
zucchini-nlp Nov 27, 2024
1e99752
update conversion to fit all ckpt + chat template + clean up a bit
zucchini-nlp Nov 27, 2024
92a1f31
fix processing tests
zucchini-nlp Nov 27, 2024
42330e0
add more tests (failing for now)
zucchini-nlp Nov 27, 2024
932f6d1
fix the conversion
zucchini-nlp Nov 27, 2024
aafb827
done!
zucchini-nlp Nov 27, 2024
36cc6dd
nit
zucchini-nlp Nov 27, 2024
f399c3a
some tests are failing, coming back tomorrow
zucchini-nlp Nov 27, 2024
7322227
adapt to any image format
molbap Nov 27, 2024
e4db50a
Merge branch 'add_molmo' of github.com:molbap/transformers into add_m…
molbap Nov 27, 2024
205a755
try to get batched generation working
molbap Nov 28, 2024
eb61617
fix other tests, should work now
zucchini-nlp Nov 28, 2024
b77d947
adjust test for batching
zucchini-nlp Nov 28, 2024
ba4dd50
little bit of style
zucchini-nlp Nov 28, 2024
0e2d184
docs + imports + automapping
zucchini-nlp Nov 28, 2024
9a83706
remove images kwargs
zucchini-nlp Nov 28, 2024
171eb8e
some unused config attributes
zucchini-nlp Nov 28, 2024
35b517a
remove additional vocab size and pad lm head
zucchini-nlp Nov 28, 2024
6a0cbc5
remove einops dependency
molbap Nov 28, 2024
5c7b141
Merge branch 'add_molmo' of github.com:molbap/transformers into add_m…
molbap Nov 28, 2024
434d4b1
dont skip these tests
zucchini-nlp Nov 28, 2024
4645f97
format + add integration testing
molbap Nov 28, 2024
48f2e21
Merge branch 'add_molmo' of github.com:molbap/transformers into add_m…
molbap Nov 28, 2024
4bb4e48
fix tests + fix 72B conversion
molbap Nov 29, 2024
e676782
fix format
molbap Nov 29, 2024
a74bda2
modualr kinda works but adds extra classes like `VisionVisionModel` :(
zucchini-nlp Nov 29, 2024
2c428ae
accomodate 7B-O version as well (broken)
molbap Nov 29, 2024
d338153
merge, fix conflicts and clean up modular extra code
molbap Nov 29, 2024
00376c4
fix 7B-O
zucchini-nlp Dec 2, 2024
48354fe
remove unused code path
zucchini-nlp Dec 2, 2024
d738493
nit
zucchini-nlp Dec 3, 2024
d0e90d4
make modular work mostly
zucchini-nlp Dec 3, 2024
f06b6d9
fix imports
zucchini-nlp Dec 3, 2024
9fc25c0
update modulat last time
zucchini-nlp Dec 3, 2024
38dc9e8
fix copies
zucchini-nlp Dec 3, 2024
eb77f3c
fix copies
zucchini-nlp Dec 4, 2024
190cc35
fix tests
zucchini-nlp Dec 4, 2024
84ed244
initial push of fast processor
molbap Dec 6, 2024
b4d48d5
Merge branch 'add_molmo' of github.com:molbap/transformers into add_m…
molbap Dec 6, 2024
1298d08
Merge branch 'main' into add_molmo
molbap Dec 10, 2024
6687d43
fix various issues + tests
molbap Dec 10, 2024
5f79577
add Molmo submodules as private
molbap Dec 10, 2024
9e72758
do not test submodules
molbap Dec 10, 2024
439aed6
[run-slow] molmo
molbap Dec 10, 2024
5a6a965
underscore prefixed method is not public
molbap Dec 10, 2024
b9746a8
fix tests
molbap Dec 10, 2024
2090ed6
fix docs
molbap Dec 10, 2024
8ad3a25
[run-slow] molmo
molbap Dec 10, 2024
0d10ee4
Merge branch 'main' into add_molmo
molbap Dec 10, 2024
9bd96f5
fix cache shape
molbap Dec 10, 2024
af5468b
[run-slow] molmo
molbap Dec 10, 2024
c02c6de
trigger CI
molbap Dec 10, 2024
5f35055
mark flaky test
molbap Dec 10, 2024
2b7af87
add missing objects
molbap Dec 10, 2024
9f0f09d
add config to init
molbap Dec 10, 2024
74ebb24
more init fixes
molbap Dec 10, 2024
8b00c44
fix style
molbap Dec 10, 2024
d6403ad
fix?
molbap Dec 10, 2024
eb43cb9
fix
molbap Dec 10, 2024
33f0624
what is this again
molbap Dec 10, 2024
cc59007
Merge branch 'main' into add_molmo
molbap Dec 10, 2024
23ae692
is this real life
molbap Dec 10, 2024
4c456e7
it was real life, fix broken eager
molbap Dec 10, 2024
91f2820
fix attribtues
molbap Dec 10, 2024
e2df6bc
this attention should be fixed
molbap Dec 10, 2024
ae77cc6
set 7b test to bf16
molbap Dec 11, 2024
166b28a
[run-slow] molmo
molbap Dec 11, 2024
50bcb7c
Merge branch 'main' into add_molmo
molbap Dec 11, 2024
bf012d8
[run-slow] molmo
molbap Dec 11, 2024
6e0634b
fix text (variability T4/A100)
molbap Dec 11, 2024
8569fd0
push clean Fast (x3!) image processor
molbap Dec 12, 2024
fd401bc
Merge branch 'main' into add_molmo
molbap Dec 12, 2024
86acf22
fix modular changes from main
molbap Dec 12, 2024
1ebea3c
Merge branch 'main' into add_molmo
molbap Dec 16, 2024
5ebc6f0
push fast image proc with device check
molbap Dec 23, 2024
19d2689
push fast image proc with device check
molbap Dec 23, 2024
c652bb9
format
molbap Dec 23, 2024
50c21e5
images kwargs were missing
molbap Dec 23, 2024
092da76
merge and fix conflicts
molbap Dec 23, 2024
1254eac
style
molbap Dec 23, 2024
bd39143
update with modular conversion
molbap Dec 23, 2024
3efcb13
add torch import
molbap Dec 23, 2024
56ae76f
style
molbap Dec 23, 2024
9417ff7
protect import
molbap Dec 23, 2024
51f9336
fix modular
molbap Dec 23, 2024
3719481
Merge branch 'main' into add_molmo
molbap Jan 7, 2025
f394b02
cherry-pick: cohere (from 67c3fcd4f32c64e07f302f00243be7d54914d78b)
molbap Jan 8, 2025
e418aa3
fix modular with cohere interface
molbap Jan 8, 2025
5af0b57
fixup cohere all imports
molbap Jan 8, 2025
a574b93
fix bf16 test output
molbap Jan 8, 2025
9f3018d
fix
molbap Jan 8, 2025
e2d1ba8
style
molbap Jan 8, 2025
c872095
Merge branch 'main' into add_molmo
molbap Jan 9, 2025
41ab3a7
uniformize fast image processor
molbap Jan 9, 2025
dd74b78
Merge branch 'main' into add_molmo
molbap Jan 9, 2025
d052666
fix merge
molbap Jan 9, 2025
0a822f4
unbloat modular a tad
molbap Jan 9, 2025
8ebf44f
fix import
molbap Jan 9, 2025
4e6070f
fix modular
molbap Jan 9, 2025
a8758bf
remove print :eyes:
molbap Jan 10, 2025
64c2ae8
Merge branch 'main' into add_molmo
molbap Feb 10, 2025
0e69cda
call correct qk norm
molbap Feb 19, 2025
279729d
Merge branch 'main' into add_molmo
molbap Apr 7, 2025
3afdd77
remove forward last hook debug
molbap Apr 7, 2025
4df5c1a
fix qk norms, order of operations, etc
molbap Apr 9, 2025
f16e404
format
molbap Apr 9, 2025
ed891f7
fix modular
molbap Apr 9, 2025
b939817
fixup modular (some rebasing needed)
molbap Apr 9, 2025
6f480be
downstream debugger changes
molbap Apr 9, 2025
4eaff6a
likely rebase errors
molbap Apr 9, 2025
73699ea
format
molbap Apr 9, 2025
638a568
fixup modeling test
molbap Apr 10, 2025
8ff9df1
make sure to process images only when images are present
molbap Apr 10, 2025
0e97e08
fix fused qk norms
molbap Apr 10, 2025
be9b810
broken modular, qknorm was unfused in cohere
molbap Apr 10, 2025
b0213e4
Merge branch 'main' into add_molmo
molbap Apr 14, 2025
1f0fc3e
typo
molbap Apr 14, 2025
562a889
small cleanup
molbap Apr 14, 2025
61d6a4a
Merge branch 'main' into add_molmo
molbap Apr 18, 2025
fe970df
simplify molmo vision with clip refactor
molbap Apr 18, 2025
bf578e3
style
molbap Apr 18, 2025
6332eae
carried over typo after init merging
molbap Apr 18, 2025
73c8233
Merge branch 'main' into add_molmo
molbap Apr 18, 2025
15f7c05
better kv groups
molbap Apr 22, 2025
c0de7ba
refix
molbap Apr 22, 2025
87f069e
Merge branch 'main' into add_molmo
molbap Apr 22, 2025
69929a3
wrong ruff version :no_mouth:
molbap Apr 22, 2025
574e304
ruff again
molbap Apr 22, 2025
324f1be
Update docs/source/en/model_doc/molmo.md
molbap Apr 22, 2025
7d55b7a
Update docs/source/en/model_doc/molmo.md
molbap Apr 22, 2025
5ab00b3
Merge branch 'main' into add_molmo
molbap Apr 22, 2025
51b36c7
Merge branch 'add_molmo' of github.com:molbap/transformers into add_m…
molbap Apr 22, 2025
fe1e2e8
update
molbap Apr 22, 2025
c8f9553
merge issue
molbap Apr 22, 2025
ff1862e
rebase
molbap Apr 22, 2025
4770401
wrong stash pop
molbap Apr 26, 2025
caf6257
left padding, chat template, and wrong pad token
molbap Apr 26, 2025
53a5801
add docs
molbap Apr 26, 2025
fd417ae
remove debug, fix left-padded batched generation :warning_sign:
molbap Apr 28, 2025
cc91650
fixes
molbap Apr 28, 2025
fdcfadd
style
molbap Apr 28, 2025
d159c2a
fixup config
molbap Apr 28, 2025
39a78d7
woops
molbap Apr 28, 2025
025c075
clean up a bit
molbap May 12, 2025
5641b62
clean up
molbap May 12, 2025
886778b
Merge branch 'main' into add_molmo
molbap May 12, 2025
3d2f6d9
separate head from model
molbap May 12, 2025
d7f89a2
happify CI
molbap May 12, 2025
7766001
more prettifying +docs
molbap May 12, 2025
a89dbac
fixups
molbap May 12, 2025
fc9ea4f
update doc
molbap Jun 19, 2025
f642ade
remove vision2seq
molbap Jun 19, 2025
4b62b00
minor changes doc + format
molbap Jun 19, 2025
3ff333e
Merge branch 'main' into add_molmo
molbap Jun 19, 2025
dbd47b4
fixup
molbap Jun 19, 2025
6d536f6
fixes after main merge
molbap Jun 19, 2025
8b54db9
apply remainder of code review
molbap Jun 27, 2025
9f84789
Merge branch 'main' into add_molmo
molbap Oct 10, 2025
a04f709
Merge branch 'main' into add_molmo
molbap Oct 10, 2025
1da9ab5
fixup
molbap Oct 10, 2025
f697e67
blindly upstream
molbap Oct 10, 2025
6f7b6e4
update fast proc
molbap Oct 10, 2025
e1326a1
kickstart
molbap Oct 10, 2025
2 changes: 2 additions & 0 deletions docs/source/en/_toctree.yml
@@ -1128,6 +1128,8 @@
  title: mllama
- local: model_doc/mm-grounding-dino
  title: MM Grounding DINO
- local: model_doc/molmo
  title: molmo
- local: model_doc/nougat
  title: Nougat
- local: model_doc/omdet-turbo
138 changes: 138 additions & 0 deletions docs/source/en/model_doc/molmo.md
@@ -0,0 +1,138 @@
<!--Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->

# Molmo

## Overview

The Molmo model was proposed in [Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models](https://arxiv.org/abs/2409.17146) by Matt Deitke, Christopher Clark, Sangho Lee, Rohun Tripathi, Yue Yang, Jae Sung Park, Mohammadreza Salehi, Niklas Muennighoff, Kyle Lo, Luca Soldaini, Jiasen Lu, Taira Anderson, Erin Bransom, Kiana Ehsani, Huong Ngo, YenSung Chen, Ajay Patel, Mark Yatskar, Chris Callison-Burch, Andrew Head, Rose Hendrix, Favyen Bastani, Eli VanderBilt, Nathan Lambert, Yvonne Chou, Arnavi Chheda, Jenna Sparks, Sam Skjonsberg, Michael Schmitz, Aaron Sarnat, Byron Bischoff, Pete Walsh, Chris Newell, Piper Wolters, Tanmay Gupta, Kuo-Hao Zeng, Jon Borchardt, Dirk Groeneveld, Jen Dumas, Crystal Nam, Sophie Lebrecht, Caitlin Wittlif, Carissa Schoenick, Oscar Michel, Ranjay Krishna, Luca Weihs, Noah A. Smith, Hannaneh Hajishirzi, Ross Girshick, Ali Farhadi, Aniruddha Kembhavi.

Molmo, developed by the AllenAI team, is an open-weight multimodal model that processes text and images within a unified framework. It is trained on high-quality datasets such as PixMo for tasks such as captioning, question answering, and visual pointing. According to the paper, the best 72B model compares favorably against proprietary systems on both academic benchmarks and human evaluation.

The abstract from the paper is the following:

*Today's most advanced multimodal models remain proprietary. The strongest open-weight models rely heavily on synthetic data from proprietary VLMs to achieve good performance, effectively distilling these closed models into open ones. As a result, the community is still missing foundational knowledge about how to build performant VLMs from scratch. We present Molmo, a new family of VLMs that are state-of-the-art in their class of openness. Our key innovation is a novel, highly detailed image caption dataset collected entirely from human annotators using speech-based descriptions. To enable a wide array of user interactions, we also introduce a diverse dataset mixture for fine-tuning that includes in-the-wild Q&A and innovative 2D pointing data. The success of our approach relies on careful choices for the model architecture details, a well-tuned training pipeline, and, most critically, the quality of our newly collected datasets, all of which will be released. The best-in-class 72B model within the Molmo family not only outperforms others in the class of open weight and data models but also compares favorably against proprietary systems like GPT-4o, Claude 3.5, and Gemini 1.5 on both academic benchmarks and human evaluation.*

<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/molmo_arch.png"
alt="drawing" width="600"/>

<small> Molmo incorporates images by encoding various patches of the input image. Taken from the <a href="https://arxiv.org/abs/2409.17146">original paper.</a> </small>


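Tips:

- For batched generation, we recommend setting `processor.tokenizer.padding_side = "left"`. Molmo is a decoder-only model that continues each sequence from its last position, so right padding would place pad tokens between the prompt and the generated text and degrade the results.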
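
To illustrate why the padding side matters for batched generation, here is a minimal pure-Python sketch. It does not use the Molmo processor: the `pad_batch` helper, the pad token ID `0`, and the prompt token IDs are all hypothetical and chosen only for illustration.

```python
PAD_ID = 0  # hypothetical pad token ID, for illustration only


def pad_batch(sequences, pad_id=PAD_ID, side="left"):
    """Pad token-ID lists to a common length and build attention masks."""
    if side not in ("left", "right"):
        raise ValueError(f"side must be 'left' or 'right', got {side!r}")
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        padding = [pad_id] * (max_len - len(seq))
        mask_pad = [0] * len(padding)
        mask_real = [1] * len(seq)
        if side == "left":
            input_ids.append(padding + list(seq))
            attention_mask.append(mask_pad + mask_real)
        else:
            input_ids.append(list(seq) + padding)
            attention_mask.append(mask_real + mask_pad)
    return input_ids, attention_mask


# Two prompts of different lengths (made-up token IDs).
prompts = [[11, 12, 13, 14], [21, 22]]

left_ids, left_mask = pad_batch(prompts, side="left")
right_ids, right_mask = pad_batch(prompts, side="right")

# With left padding, every row ends on a real prompt token, so the next
# generated token directly continues the prompt.
print(left_ids)   # [[11, 12, 13, 14], [0, 0, 21, 22]]
# With right padding, the shorter row ends on padding instead.
print(right_ids)  # [[11, 12, 13, 14], [21, 22, 0, 0]]
```

In practice the processor's tokenizer handles padding and attention masks itself; the sketch only shows what the `padding_side` setting changes in the resulting batch.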


This model was contributed by [Molbap](https://huggingface.co/Molbap).


## Usage example

### Single image inference

Here's how to load the model and perform inference in half-precision (`torch.float16`):

```python
import torch
from transformers import AutoProcessor, MolmoForConditionalGeneration

# Load the model in half precision; `device_map="auto"` places the weights on the available device(s).
model = MolmoForConditionalGeneration.from_pretrained(
    "allenai/Molmo-7B-D-hf", torch_dtype=torch.float16, device_map="auto"
)
processor = AutoProcessor.from_pretrained("allenai/Molmo-7B-D-hf")

# A single user turn containing one image and one text prompt.
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://picsum.photos/id/237/536/354"},
            {"type": "text", "text": "What is shown in this image?"},
        ],
    },
]

# Build the model inputs from the chat-formatted conversation.
inputs = processor.apply_chat_template(
    conversation,
    tokenize=True,
    return_dict=True,
    add_generation_prompt=True,
).to(model.device)

output = model.generate(**inputs, max_new_tokens=100)

print(processor.decode(output[0], skip_special_tokens=True))
```
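
The example above decodes `output[0]` in full. Assuming Molmo behaves like other decoder-only models in the library, where `generate` returns the prompt tokens followed by the newly generated ones, the decoded string would repeat the prompt. The sketch below shows the idea of trimming the prompt using plain Python lists; the `strip_prompt` helper and the token IDs are hypothetical and not part of the Molmo API.

```python
def strip_prompt(generated_ids, prompt_length):
    """Drop the first `prompt_length` token IDs from each generated row."""
    return [row[prompt_length:] for row in generated_ids]


# Made-up token IDs: a 3-token prompt followed by 2 newly generated tokens.
prompt_ids = [[5, 6, 7]]
generated = [[5, 6, 7, 42, 43]]

new_tokens = strip_prompt(generated, len(prompt_ids[0]))
print(new_tokens)  # [[42, 43]]
```

With the tensors from the usage example, the equivalent slice would presumably be `output[:, inputs["input_ids"].shape[1]:]` applied before decoding. Treat this as an assumption to verify against the model's actual output rather than documented behavior.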


## MolmoConfig

[[autodoc]] MolmoConfig

## MolmoTextConfig

[[autodoc]] MolmoTextConfig

## MolmoVisionConfig

[[autodoc]] MolmoVisionConfig

## MolmoPoolingConfig

[[autodoc]] MolmoPoolingConfig

## MolmoImageProcessor

[[autodoc]] MolmoImageProcessor

## MolmoImageProcessorFast

[[autodoc]] MolmoImageProcessorFast

## MolmoProcessor

[[autodoc]] MolmoProcessor

## MolmoAdapterModel

[[autodoc]] MolmoAdapterModel
- forward

## MolmoModel

[[autodoc]] MolmoModel
- forward

## MolmoTextModel

[[autodoc]] MolmoTextModel
- forward

## MolmoVisionModel

[[autodoc]] MolmoVisionModel
- forward

## MolmoForCausalLM

[[autodoc]] MolmoForCausalLM
- forward

## MolmoForConditionalGeneration

[[autodoc]] MolmoForConditionalGeneration
- forward
1 change: 1 addition & 0 deletions src/transformers/models/__init__.py
@@ -230,6 +230,7 @@
from .mobilevitv2 import *
from .modernbert import *
from .modernbert_decoder import *
from .molmo import *
from .moonshine import *
from .moshi import *
from .mpnet import *
2 changes: 2 additions & 0 deletions src/transformers/models/auto/configuration_auto.py
@@ -273,6 +273,7 @@
("mobilevitv2", "MobileViTV2Config"),
("modernbert", "ModernBertConfig"),
("modernbert-decoder", "ModernBertDecoderConfig"),
("molmo", "MolmoConfig"),
("moonshine", "MoonshineConfig"),
("moshi", "MoshiConfig"),
("mpnet", "MPNetConfig"),
@@ -726,6 +727,7 @@
("mobilevitv2", "MobileViTV2"),
("modernbert", "ModernBERT"),
("modernbert-decoder", "ModernBertDecoder"),
("molmo", "Molmo"),
("moonshine", "Moonshine"),
("moshi", "Moshi"),
("mpnet", "MPNet"),
5 changes: 3 additions & 2 deletions src/transformers/models/auto/image_processing_auto.py
@@ -138,8 +138,9 @@
("mm-grounding-dino", ("GroundingDinoImageProcessor", "GroundingDinoImageProcessorFast")),
("mobilenet_v1", ("MobileNetV1ImageProcessor", "MobileNetV1ImageProcessorFast")),
("mobilenet_v2", ("MobileNetV2ImageProcessor", "MobileNetV2ImageProcessorFast")),
- ("mobilevit", ("MobileViTImageProcessor", "MobileViTImageProcessorFast")),
- ("mobilevitv2", ("MobileViTImageProcessor", "MobileViTImageProcessorFast")),
+ ("mobilevit", ("MobileViTImageProcessor", None)),
+ ("mobilevitv2", ("MobileViTImageProcessor", None)),
+ ("molmo", ("MolmoImageProcessor", "MolmoImageProcessorFast")),
("nat", ("ViTImageProcessor", "ViTImageProcessorFast")),
("nougat", ("NougatImageProcessor", "NougatImageProcessorFast")),
("oneformer", ("OneFormerImageProcessor", "OneFormerImageProcessorFast")),
3 changes: 3 additions & 0 deletions src/transformers/models/auto/modeling_auto.py
@@ -273,6 +273,7 @@ class _BaseModelWithGenerate(PreTrainedModel, GenerationMixin):
("mobilevitv2", "MobileViTV2Model"),
("modernbert", "ModernBertModel"),
("modernbert-decoder", "ModernBertDecoderModel"),
("molmo", "MolmoModel"),
("moonshine", "MoonshineModel"),
("moshi", "MoshiModel"),
("mpnet", "MPNetModel"),
@@ -713,6 +714,7 @@ class _BaseModelWithGenerate(PreTrainedModel, GenerationMixin):
("mixtral", "MixtralForCausalLM"),
("mllama", "MllamaForCausalLM"),
("modernbert-decoder", "ModernBertDecoderForCausalLM"),
("molmo", "MolmoForCausalLM"),
("moshi", "MoshiForCausalLM"),
("mpt", "MptForCausalLM"),
("musicgen", "MusicgenForCausalLM"),
@@ -1046,6 +1048,7 @@ class _BaseModelWithGenerate(PreTrainedModel, GenerationMixin):
("llava_onevision", "LlavaOnevisionForConditionalGeneration"),
("mistral3", "Mistral3ForConditionalGeneration"),
("mllama", "MllamaForConditionalGeneration"),
("molmo", "MolmoForConditionalGeneration"),
("ovis2", "Ovis2ForConditionalGeneration"),
("paligemma", "PaliGemmaForConditionalGeneration"),
("perception_lm", "PerceptionLMForConditionalGeneration"),
1 change: 1 addition & 0 deletions src/transformers/models/auto/processing_auto.py
@@ -107,6 +107,7 @@
("mistral3", "PixtralProcessor"),
("mllama", "MllamaProcessor"),
("mm-grounding-dino", "GroundingDinoProcessor"),
("molmo", "MolmoProcessor"),
("moonshine", "Wav2Vec2Processor"),
("oneformer", "OneFormerProcessor"),
("ovis2", "Ovis2Processor"),
30 changes: 30 additions & 0 deletions src/transformers/models/molmo/__init__.py
@@ -0,0 +1,30 @@
# Copyright 2025 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import TYPE_CHECKING

from ...utils import _LazyModule
from ...utils.import_utils import define_import_structure


if TYPE_CHECKING:
    from .configuration_molmo import *
    from .image_processing_molmo import *
    from .image_processing_molmo_fast import *
    from .modeling_molmo import *
    from .processing_molmo import *
else:
    import sys

    _file = globals()["__file__"]
    sys.modules[__name__] = _LazyModule(__name__, _file, define_import_structure(_file), module_spec=__spec__)