Add Aria #34157
Merged
141 commits
Commit authors: aymeric-roucher, Cyrilvallez

- `f37181e` First
- `16ab157` Try to make it work
- `48828e8` Working init
- `b663c25` First working pipeline!
- `8df558c` Simplify code
- `60ad089` Fix tests
- `74642ec` Small fix
- `2c88807` Add GenerationMixin import
- `5279e43` Update doc
- `96a1fbf` Import sorting
- `d5ab4d1` Simplify by removing TokenDispatcher class
- `bf6ab44` Add small arg changes
- `f40d1cb` Simplify modular
- `ff5a37f` Simplify code a lot
- `dc29a7d` Fix tests
- `c711335` Simplify activation function
- `fc00526` Correct attention classes
- `c52d1de` Simplify processing
- `ceddfc2` Fixes
- `9dd624f` Clean size conversion
- `69578be` Style
- `994bb0a` Fix vision attention in AriaEncoderLayer
- `0188d4c` Fix tests
- `3fa73e0` Merge branch 'main' into add-aria
- `886237a` Fix tokenizer test
- `7faf143` Change sdpa
- `20babb7` Formatting
- `183db61` Fix torch.empty and cuda tests
- `f87dd8c` Try new weights init
- `3b49743` Try empty init parameters
- `6e56821` Fix initialized_range
- `4ecb46f` Should fix some tests
- `a06e425` Add num_logits_to_keep
- `09a5092` Add back sdpa fix
- `5630658` Not sure what I'm doing at that point
- `a560b26` Fix tests
- `d471b87` Test initialization tests
- `6a8c805` Test different pad token
- `ae19ca6` Streamline modular_aria format
- `0c8aa0a` Remove AriaVisionModel by just using Idefics3
- `41a4733` Final weights
- `bdd7ac0` Update weight conversion script
- `d2bf502` Remove AriaVisionModel entirely
- `c42db55` Update tests with Idefics3VisionConfig
- `cb75cc2` Make style
- `56b0a5e` Remove attention classes
- `1c9fabb` Fix phantom model in configuration_auto
- `82352b8` Amendment
- `3e91861` Modifications following Pablo's comments
- `8008228` Simplify following pablos comments
- `113d4ad` Offload image processing
- `c82fcee` Working image processing
- `c658e22` Refactor function keep_ratio_resize_and_pixel_mask
- `0467498` Simplify image preprocessing
- `55a963a` Apply modular conversion
- `7e70407` Answer comments
- `cdb9a7d` Integrate 2
- `cac130c` Protect imports
- `dab0b62` Adapt AriaProcessor args to common format
- `a5625cf` Small fix
- `45d11f9` Remove _extract_kwargs
- `8d2d75c` Harmonize modular and other files
- `55758ef` Rename variables
- `22b97bd` Rename AriaForCaualLM to AriaTextForCausalLM
- `cac3ca8` Try fixing FA2
- `3650cdf` improve sequential gemm import
- `2363b99` Formatting
- `1f13198` Renaming
- `fb51aa6` Try fixing unprotected imports
- `9a327cb` Harmonize modular with files
- `586e53b` Answer comments
- `d782f4b` Remove legacy image input merging
- `acdae0b` More simplifications following comments
- `0c56a9d` Remove TopKRouter
- `a6f75d3` Remove resize_token_embeddings
- `38f1d3a` Add data_format to image processing
- `f158836` Add vision feature layer in config
- `9451d4b` Update
- `4fe6478` Format docstrings
- `d533357` Fix docstrings
- `db97796` Merge branch 'main' into add-aria
- `cf4bd56` Working version post merge
- `0476083` Fix pretrained models
- `09390c1` Harmonize files
- `a569c6c` Hopefully fix imports
- `5276f3f` Remove dependency from processor to image processor
- `aa93d6b` Update dummy objects
- `991ddab` Clean processor
- `b31fea8` Pass generation with input embeds
- `f8be039` Style
- `d56c158` Harmonize modular
- `ce84dcf` Try fixing weight init
- `5cc3a99` Remove image token from processing
- `dab4d0f` Try fix imports
- `1e7b83e` Try fix imports 2
- `43b5f0a` Working modular
- `bdd6c4f` and style
- `3df30fd` Repair image processing
- `f9d8d69` Merge remote-tracking branch 'origin/add-aria' into add-aria
- `e08ecf0` Style
- `248aa9d` Working inference
- `1ea3d17` Fix batch token counting
- `9b13ef1` Improve docstrings
- `6d98a0e` Add image processing tests
- `265ca08` Add image processing and processing tests
- `a4d8a1f` Directly copy llava next functions
- `f30fb5b` Merge branch 'main' into add-aria
- `a4ce9e9` Remove chat template
- `63e2276` Fix docstrings
- `15f21e2` Update conversion script
- `e73febc` Update src/transformers/models/aria/convert_aria_weights_to_hf.py
- `e03a05d` Update src/transformers/models/aria/configuration_aria.py
- `56942fe` Update src/transformers/models/aria/modular_aria.py
- `acfeb4b` Update src/transformers/models/aria/modular_aria.py
- `4e6688b` Answer comments
- `a006f6a` Simplify more elements
- `d45186e` Improve projector_patch_to_query_dict max value handling
- `1d924e0` Slight simplification of input type and device modification in gemm e…
- `cf42acc` Fix import errors
- `ca30b6e` Update fa2 support
- `67e5dbb` Fix test
- `87981b0` Add cpu back
- `142e061` Improve init
- `09fe137` Fix doc checks
- `acc9968` Soft dependencies handling
- `ec55502` Fix init import order
- `0af60a4` Merge branch 'main' into add-aria
- `f529bf8` Fix experts gemm selection
- `76d116b` Add idefics3 docs
- `ae7f5d0` Fix some docstring checks
- `959702b` Fix docstrings
- `461d14d` Try fix for unused config.intermediate_size
- `a109506` Try removing unusued config args - v2
- `be0e5a9` Remove moe_intermediate_size
- `8a45000` Add sdpa support
- `09dd7d4` Try fix docstrings
- `9c3dd8a` Update the conversion script 3
- `8fd065a` Final comment answer
- `fe62f6c` Merge branch 'main' into add-aria flaky 2
- `76ee868` Fix CUDA errors 3
- `956cea2` Remove duplicate init 2
@@ -0,0 +1,106 @@

<!--Copyright 2024 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->

# Aria

## Overview

The Aria model was proposed in [Aria: An Open Multimodal Native Mixture-of-Experts Model](https://huggingface.co/papers/2410.05993) by Li et al. from the Rhymes.AI team.

Aria is an open multimodal-native model with best-in-class performance across a wide range of multimodal, language, and coding tasks. It uses a Mixture-of-Experts architecture with 3.9B and 3.5B activated parameters per visual token and text token, respectively.

The abstract from the paper is the following:

*Information comes in diverse modalities. Multimodal native AI models are essential to integrate real-world information and deliver comprehensive understanding. While proprietary multimodal native models exist, their lack of openness imposes obstacles for adoptions, let alone adaptations. To fill this gap, we introduce Aria, an open multimodal native model with best-in-class performance across a wide range of multimodal, language, and coding tasks. Aria is a mixture-of-expert model with 3.9B and 3.5B activated parameters per visual token and text token, respectively. It outperforms Pixtral-12B and Llama3.2-11B, and is competitive against the best proprietary models on various multimodal tasks. We pre-train Aria from scratch following a 4-stage pipeline, which progressively equips the model with strong capabilities in language understanding, multimodal understanding, long context window, and instruction following. We open-source the model weights along with a codebase that facilitates easy adoptions and adaptations of Aria in real-world applications.*

This model was contributed by [m-ric](https://huggingface.co/m-ric).
The original code can be found [here](https://github.com/rhymes-ai/Aria).
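The "activated parameters" figures come from the Mixture-of-Experts design: a router selects a few experts for each token, so only those experts' weights take part in that token's forward pass. The sketch below illustrates generic top-k routing in plain Python. It is not Aria's implementation; the four toy experts, the dot-product router, `k=2`, and the choice to renormalize the softmax over only the selected experts are all illustrative assumptions.

```python
import math


def softmax(xs):
    # Subtract the max for numerical stability before exponentiating
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and renormalize their scores with a softmax."""
    top = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:k]
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))


def make_expert(scale, calls, idx):
    """Toy expert (assumption): scales its input and records that it was called."""
    def expert(x):
        calls[idx] += 1
        return [scale * v for v in x]
    return expert


def moe_layer(x, experts, router_weights, k):
    # Router: one logit per expert, here a dot product between a router row and the token vector
    logits = [sum(w * v for w, v in zip(row, x)) for row in router_weights]
    out = [0.0] * len(x)
    # Only the k selected experts are evaluated; their outputs are mixed by the routing weights
    for idx, weight in top_k_route(logits, k):
        y = experts[idx](x)
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out


num_experts = 4
calls = [0] * num_experts
experts = [make_expert(float(i + 1), calls, i) for i in range(num_experts)]
router_weights = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, 0.0]]
token = [2.0, 1.0]

out = moe_layer(token, experts, router_weights, k=2)
print(calls)  # [1, 0, 1, 0]: experts 0 and 2 ran, experts 1 and 3 were skipped
```

Only two of the four experts run for this token, which is the sense in which an MoE model's activated parameter count can be much smaller than its total parameter count.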
## Usage tips

Here's how to use the model for vision tasks:

```python
import requests
import torch
from PIL import Image

from transformers import AriaProcessor, AriaForConditionalGeneration

model_id_or_path = "rhymes-ai/Aria"

# Load the weights in bfloat16 (half the memory of float32); device_map="auto" places them on the available devices
model = AriaForConditionalGeneration.from_pretrained(
    model_id_or_path, device_map="auto", torch_dtype=torch.bfloat16
)

processor = AriaProcessor.from_pretrained(model_id_or_path)

image = Image.open(requests.get("http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw)

# The {"type": "image"} entry marks where the image goes in the prompt
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"text": "what is the image?", "type": "text"},
        ],
    }
]

text = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=text, images=image, return_tensors="pt")
# Match the pixel values to the model's dtype, then move all inputs to the model's device
inputs["pixel_values"] = inputs["pixel_values"].to(model.dtype)
inputs = inputs.to(model.device)

output = model.generate(
    **inputs,
    max_new_tokens=15,
    stop_strings=["<|im_end|>"],
    tokenizer=processor.tokenizer,  # generate() needs a tokenizer to match stop_strings
    do_sample=True,
    temperature=0.9,
)
# Drop the prompt tokens and decode only the newly generated ones
output_ids = output[0][inputs["input_ids"].shape[1]:]
response = processor.decode(output_ids, skip_special_tokens=True)
print(response)
```
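For a decoder-style model like this one, `generate` returns the prompt tokens followed by the newly generated ones, which is why the example slices each output row at `inputs["input_ids"].shape[1]`. The plain-Python sketch below mimics that step with made-up token ids, and adds a hypothetical `cut_at_stop_string` helper for the case where a stop string survives decoding. Neither helper is part of the transformers API; they only illustrate the list and string logic.

```python
def strip_prompt(output_row, prompt_length):
    """Keep only the tokens generated after the prompt (mirrors output[0][prompt_length:])."""
    return output_row[prompt_length:]


def cut_at_stop_string(text, stop_strings):
    """Hypothetical helper: truncate text at the earliest occurrence of any stop string."""
    cut = len(text)
    for stop in stop_strings:
        pos = text.find(stop)
        if pos != -1:
            cut = min(cut, pos)
    return text[:cut]


# Made-up token ids: a 4-token prompt followed by 3 generated tokens
prompt_ids = [101, 7, 8, 9]
output_row = prompt_ids + [42, 43, 2]

new_ids = strip_prompt(output_row, len(prompt_ids))
print(new_ids)  # [42, 43, 2]

reply = cut_at_stop_string("An example reply.<|im_end|>", ["<|im_end|>"])
print(reply)  # An example reply.
```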

## AriaImageProcessor

[[autodoc]] AriaImageProcessor

## AriaProcessor

[[autodoc]] AriaProcessor

## AriaTextConfig

[[autodoc]] AriaTextConfig

## AriaConfig

[[autodoc]] AriaConfig

## AriaTextModel

[[autodoc]] AriaTextModel

## AriaTextForCausalLM

[[autodoc]] AriaTextForCausalLM

## AriaForConditionalGeneration

[[autodoc]] AriaForConditionalGeneration
    - forward
| Diff line change |
|---|
| `@@ -16,6 +16,7 @@` |
| `albert,` |
| `align,` |
| `altclip,` |
| `aria,` |
| `audio_spectrogram_transformer,` |
| `auto,` |
| `autoformer,` |
| Diff line change |
|---|
| `@@ -0,0 +1,30 @@` |
| `# Copyright 2024 The HuggingFace Team. All rights reserved.` |
| `#` |
| `# Licensed under the Apache License, Version 2.0 (the "License");` |
| `# you may not use this file except in compliance with the License.` |
| `# You may obtain a copy of the License at` |
| `#` |
| `# http://www.apache.org/licenses/LICENSE-2.0` |
| `#` |
| `# Unless required by applicable law or agreed to in writing, software` |
| `# distributed under the License is distributed on an "AS IS" BASIS,` |
| `# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.` |
| `# See the License for the specific language governing permissions and` |
| `# limitations under the License.` |
| `from typing import TYPE_CHECKING` |
| |
| `from ...utils import _LazyModule` |
| `from ...utils.import_utils import define_import_structure` |
| |
| |
| `if TYPE_CHECKING:` |
| `    from .configuration_aria import *` |
| `    from .image_processing_aria import *` |
| `    from .modeling_aria import *` |
| `    from .processing_aria import *` |
| |
| `else:` |
| `    import sys` |
| |
| `    _file = globals()["__file__"]` |
| `    sys.modules[__name__] = _LazyModule(__name__, _file, define_import_structure(_file), module_spec=__spec__)` |
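The module above replaces itself in `sys.modules` with a `_LazyModule`, so submodules such as `modeling_aria` are only imported when one of their names is first accessed, while the `TYPE_CHECKING` branch keeps the eager imports visible to static type checkers. The sketch below shows the underlying idea with a hypothetical `LazyModule` class that maps attribute names to stdlib modules. It is a simplified stand-in, not the transformers implementation, which receives its import structure from `define_import_structure(_file)` and handles more cases.

```python
import importlib
import sys
import types


class LazyModule(types.ModuleType):
    """Hypothetical minimal lazy module: names are resolved from their source module on first access."""

    def __init__(self, name, name_to_module):
        super().__init__(name)
        self._name_to_module = name_to_module  # e.g. {"sqrt": "math"}

    def __getattr__(self, name):
        # Python only calls __getattr__ when normal lookup fails, i.e. the name is not loaded yet
        mapping = self.__dict__.get("_name_to_module", {})
        if name not in mapping:
            raise AttributeError(f"module {self.__name__!r} has no attribute {name!r}")
        value = getattr(importlib.import_module(mapping[name]), name)
        setattr(self, name, value)  # cache it so later lookups skip __getattr__
        return value


# Mirrors `sys.modules[__name__] = _LazyModule(...)`: the registered object is what `import` returns
sys.modules["demo_lazy"] = LazyModule("demo_lazy", {"sqrt": "math", "dumps": "json"})

import demo_lazy

print("sqrt" in vars(demo_lazy))  # False: nothing has been resolved yet
print(demo_lazy.sqrt(16.0))       # 4.0, math.sqrt is imported and cached on first access
print("sqrt" in vars(demo_lazy))  # True
```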