[Bugfix] Fix reloading of pixtral/llava configs #36077
Conversation
Signed-off-by: Kyle Sayers <[email protected]>
Hey, thanks for finding the root cause. However, I don't think we can solve this by just setting the flag. The flag is meant for somewhat different purposes and the naming is misleading.
I think we should find out why the saved/loaded config inferred the incorrect LM-config type and fix that, so that we can infer the correct LM config. A small test would also be nice to show that composite configs can be saved/loaded with different sub-components even when attributes share default values.
Hi @zucchini-nlp!

This would likely involve having to modify the serialization diff logic, which calls the init with no arguments and therefore picks up the default sub-config (see transformers/src/transformers/models/llava/configuration_llava.py, lines 128 to 132 at 014047e).

This is an inevitable problem because a no-argument init cannot know which backbone the saved config actually used. Your comment suggests modifying the diff computation to something like:

```python
init_args = {key: getattr(value, "model_type", None) for key, value in self.sub_configs.items()}
class_config_dict = self.__class__(**init_args).to_dict() if not self.is_composition else {}
```

The only problem with this implementation is that it forces all configs to follow the particular scheme of using the `model_type` attribute to select their sub-configs.

@zucchini-nlp Could you provide a little more context as to how this flag is used? Grepping the transformers codebase reveals that this attribute is only consumed by `to_diff_dict`. There may be some upstream use that I am unaware of.
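For readers following the thread, a minimal sketch of the mechanism under discussion; this is a simplified stand-in, not the actual transformers `to_diff_dict` implementation, and the default values are taken from the numbers quoted later in this PR:

```python
# Simplified stand-in (NOT the real transformers code) for the diff-against-defaults
# serialization: `is_composition` decides whether attributes equal to the class
# defaults are dropped before writing the config to disk.

DEFAULTS = {"hidden_size": 4096, "num_attention_heads": 32, "head_dim": 128}


def to_diff_dict(config_dict, is_composition=False):
    # With is_composition=True the class defaults are ignored and everything is kept.
    class_defaults = {} if is_composition else DEFAULTS
    return {k: v for k, v in config_dict.items() if class_defaults.get(k) != v}


pixtral_text = {"hidden_size": 5120, "num_attention_heads": 32, "head_dim": 128}
print(to_diff_dict(pixtral_text, is_composition=False))  # {'hidden_size': 5120} -- head_dim dropped
print(to_diff_dict(pixtral_text, is_composition=True))   # all three attributes kept
```

With `is_composition=True` nothing is compared against defaults, which is exactly why the flag forces every sub-config attribute to be written out.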
Thanks, now I see why the model types are not matching. The flag was introduced at first for models like Musicgen, whose sub-configs had no default values set and had to be indicated by users explicitly (in #25237). See transformers/src/transformers/models/musicgen/configuration_musicgen.py, lines 200 to 203 at 014047e.

For Llava we didn't set it, because we have default values. But now, since the same config class is used with various LM backbones, the defaults don't make much sense. I agree with the proposed fix, and imo we'll need the flag to be added in all configs that accept any LM backbone. Can you propagate it to all VLMs like Llava and add a small test?
WIP comment: The cause of this issue is more subtle than I initially thought. There already exist mechanisms for catching diffs of non-default subconfigs that I initially missed, but they seem to not work well with computed config values.

Mistral has default values `hidden_size=4096` and `num_attention_heads=32`, which compute to `head_dim=128`. Pixtral has `hidden_size=5120` and an explicit `head_dim=128`. Because the default head dim values match, the diff does not register and the `head_dim` is not written. But upon reloading, `head_dim` is recomputed using the new `hidden_size`, which results in the 160 value. This may be fixable by simply adding `num_attention_heads` to the saved config.
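To make the arithmetic concrete, a tiny sketch using the values quoted above (assumed: 32 attention heads in both configs):

```python
# Values from the discussion above: Mistral defaults vs. the Pixtral 12B text config.
default_hidden_size, num_attention_heads = 4096, 32
pixtral_hidden_size, pixtral_head_dim = 5120, 128

# The default computed head_dim matches the saved one, so the diff drops it...
assert default_hidden_size // num_attention_heads == pixtral_head_dim  # 128 == 128

# ...and on reload it is recomputed from the saved hidden_size instead.
print(pixtral_hidden_size // num_attention_heads)  # 160, not 128
```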
@kylesayrs yeah, that's what I thought at first, but the Pixtral model checkpoint actually has `head_dim != hidden_size // num_attention_heads`.
Signed-off-by: Kyle Sayers <[email protected]>
@zucchini-nlp I've updated with a WIP of what a pixtral text config might look like. Let me know what you think. If this approach isn't viable, we can fall back to the `is_composition` flag.
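For context, a rough sketch of what such a text config could look like; the class name, `model_type` string, and defaults here are hypothetical and are not the PR's actual WIP code:

```python
from transformers import PretrainedConfig


class PixtralTextConfig(PretrainedConfig):
    # Hypothetical model_type; a real implementation may register this differently.
    model_type = "pixtral_text"

    def __init__(
        self,
        hidden_size=5120,
        num_attention_heads=32,
        head_dim=128,  # stored explicitly, never recomputed as hidden_size // num_attention_heads
        **kwargs,
    ):
        self.hidden_size = hidden_size
        self.num_attention_heads = num_attention_heads
        self.head_dim = head_dim
        super().__init__(**kwargs)
```

Storing `head_dim` as a first-class default removes the `hidden_size // num_attention_heads` fallback that causes the mismatch.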
Signed-off-by: Kyle Sayers <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>
This looks good to me as a workaround for the Pixtral case. But I think the is_composition flag will be a more generalizable solution, given that we cannot create sub-configs for every vision LM that clashes with the default llama.
Making LlavaConfig work with any backbone seems better to me.
Signed-off-by: Kyle Sayers <[email protected]>
This reverts commit a53d5f9.
@zucchini-nlp I think the `is_composition` approach is the better fit for this PR, so I've gone back to it. I still think that using a dedicated `PixtralTextConfig` is the most "correct" solution in the long run.
This reverts commit 3ab1c99.
@zucchini-nlp I've updated the PR with a clearer explanation of the issue and links to potential follow-ups.
LGTM, thanks!
Indeed the flag here is not the best way, and we need to consider refactoring how nested configs are saved/loaded in subsequent PRs. But this will work as a solution for Pixtral
| """ | ||
| vision_config = { | ||
| "model_type": "pixtral", | ||
| "head_dim": 64, |
do we need this to be 128 to trigger the same error as currently?
It's actually the head_dim on the text_config that triggers the issue
oh right, my bad, didn't notice this was vision
cc @ArthurZucker if you want to give it a look
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Will merge it, since it's not core and nice to have in the next release.
* add is_composition flag to LlavaConfig
* WIP: pixtral text config
* fix style
* add test
* use is_composition for pixtral
* Revert "use is_composition for pixtral" (this reverts commit a53d5f9)
* Revert "Revert "use is_composition for pixtral"" (this reverts commit 3ab1c99)

Signed-off-by: Kyle Sayers <[email protected]>
…ngface#36077 (huggingface#36230) fix Co-authored-by: ydshieh <[email protected]>
Purpose
* Fix reloading of `head_dim` on mistral-community/pixtral-12b

Related issues
Background
Many issues arise when attempting to save and load Pixtral configs, namely that the pixtral model is saved with `text_config.head_dim=128`, but upon loading receives the value `text_config.head_dim=160`. Through further inspection, it was found that this was due to an issue where the `head_dim` is dropped during saving and recomputed from other attributes during loading; a reproduction sketch follows the comparison below.

* `MistralConfig` defaults: `hidden_size=4096`, `num_attention_heads=32`, computed `head_dim = 4096 // 32 = 128`
* Pixtral 12B: `hidden_size=5120`, `num_attention_heads=32`, explicit `head_dim=128` (while `5120 // 32 = 160`)
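A minimal reproduction sketch of the mismatch (assuming Hub access to `mistral-community/pixtral-12b` and a transformers release without this fix):

```python
from transformers import AutoConfig

# Load the Pixtral 12B config; the checkpoint stores text_config.head_dim=128.
config = AutoConfig.from_pretrained("mistral-community/pixtral-12b")
print(config.text_config.head_dim)  # 128

# Save and reload: the diff against MistralConfig defaults drops head_dim,
# which is then recomputed as hidden_size // num_attention_heads = 5120 // 32.
config.save_pretrained("/tmp/pixtral-config")
reloaded = AutoConfig.from_pretrained("/tmp/pixtral-config")
print(reloaded.text_config.head_dim)  # 160 without this fix, 128 with it
```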
When transformers configs are saved, a diff between the saving config and the default config is computed in order to reduce verbosity. When the Pixtral 12B config saves the `head_dim` attribute, it checks against the `MistralConfig` and finds that both values are the same (128), and therefore does not save the `head_dim` to disk.

However, when loading the same config using the `MistralConfig` class, the class recomputes `head_dim` using the saved `hidden_size=5120` (which differs from the default 4096), and returns the value `head_dim=160`. This incorrect calculation causes the model to be loaded incorrectly.

Note that for the pixtral model, `head_dim != hidden_size // num_attention_heads`, so this issue cannot be remedied by populating the `num_attention_heads` attribute of the pixtral config.

While I consider creating a dedicated `PixtralTextConfig` to be the most "correct" solution, which reflects the differences in assumptions made by the pixtral and mistral models, implementing a config class in transformers which does not have a unique model associated with it requires additional work and testing. This solution is backwards compatible with existing pixtral configs and does not require intensive changes to configs.
Changes

* Set `is_composition=True` for `LlavaConfig`, which forces all subconfig attributes to be saved to disk
* The saved `head_dim` is loaded and overrides the non-applicable calculation in `MistralConfig` (see the round-trip sketch below)
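A usage-level sketch of the intended effect; the sub-config values below are illustrative (mirroring the test reviewed above) and assume a transformers version that includes this change:

```python
from transformers import LlavaConfig

# Non-default sub-config attributes should survive a save/load round trip
# once LlavaConfig sets is_composition=True.
config = LlavaConfig(
    text_config={"model_type": "mistral", "hidden_size": 5120, "head_dim": 128},
    vision_config={"model_type": "pixtral", "head_dim": 64},
)
config.save_pretrained("/tmp/llava-composite")

reloaded = LlavaConfig.from_pretrained("/tmp/llava-composite")
assert reloaded.text_config.head_dim == 128
assert reloaded.vision_config.head_dim == 64
```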
Potential Follow-ups

* Add a dedicated `PixtralTextConfig` for the case where `head_dim != hidden_size // num_attention_heads`
* Propagate the `is_composition` flag to all configs with subconfigs

Testing

* Added a test that a Llava config with non-default sub-config values (e.g. `head_dim`) can be saved and reloaded without changes