
Conversation

@younesbelkada (Contributor)

What does this PR do?

Fixes: #26451

Currently, performing bf16 fine-tuning with FA-2 leads to the hidden states being silently cast to float16.
Since it is challenging to retrieve the original dtype of the model once it has been quantized, I propose storing that dtype in a private attribute so it can be retrieved conveniently, without any hack to recover the correct dtype for quantized models.
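For illustration only, a minimal sketch of the idea with a hypothetical helper; the private attribute name mirrors the one used in the diff discussed below:

import torch

def maybe_cast_back(hidden_states: torch.Tensor, config) -> torch.Tensor:
    """Cast hidden states that were silently upcasted to float32 back to the dtype recorded at load time."""
    if hidden_states.dtype == torch.float32:
        # `_flash_attn_2_attention_dtype` would be stored once in `from_pretrained`
        # (e.g. from the `torch_dtype` argument), before any quantization happens,
        # so it stays retrievable even for quantized models.
        return hidden_states.to(config._flash_attn_2_attention_dtype)
    return hidden_states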

@ArthurZucker (Collaborator) left a comment

A small nit!

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@ArthurZucker (Collaborator) left a comment

Okay! LGTM, but the issue is that the dtype could change when we do `model.to(device)`, meaning this only fixes inference right after init if `torch_dtype` is specified.

@ArthurZucker (Collaborator) left a comment

The changes to the modeling utils (specifically `to`) are a bit too specific to Flash Attention. Given that `torch_dtype` can be accessed inside `XXXFlashAttention` via `self.config`, it makes more sense to have this in the attention module (if possible?).

This is fine if we want it, but there might be a solution that only changes the attention module.
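Roughly, that suggestion could look like the following inside the attention module; the helper name and the float16 fallback are illustrative, not part of the PR:

import torch

def pick_attention_dtype(query_states: torch.Tensor, config) -> torch.dtype:
    # If the hidden states were silently upcasted to fp32 (e.g. by fp32 layer norms),
    # cast back to the dtype the model was loaded in, read from the attention's config.
    if query_states.dtype != torch.float32:
        return query_states.dtype
    target_dtype = getattr(config, "torch_dtype", None)
    return target_dtype if target_dtype is not None else torch.float16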

else:
    target_dtype = kwargs["dtype"]

if target_dtype is not None and target_dtype == torch.float32:
Collaborator

nice

@younesbelkada (Contributor, Author) commented Oct 3, 2023

I agree we should keep all FA2-related hacks and changes inside the FA2 modules themselves. However, that might introduce multiple patches and other hacks for quantized modules. I think this approach is fine for now since it unblocks some users for the next release, but I agree we should go for a better one; I left it as a TODO!

" float16."
f"The input hidden states seems to be silently casted in float32, this might be related to"
f" the fact you have upcasted embedding or layer norm layers in float32. We will cast back the input in"
f" {attention_dtype}. Make sure to pass the desired dtype when calling `from_pretrained` with `torch_dtype=your_dtype`"

It seems to me that this line is 124 characters long and should be split in two. The previous lines could be left without the 'f' prefix, since there are no values to be formatted in them.
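For illustration, one way to apply that suggestion; the surrounding logging setup and the `attention_dtype` value are placeholders, not the actual code in the PR:

import logging

import torch

logger = logging.getLogger(__name__)
attention_dtype = torch.float16  # placeholder for the dtype the input is cast back to

logger.warning(
    "The input hidden states seems to be silently casted in float32, this might be related to"
    " the fact you have upcasted embedding or layer norm layers in float32. We will cast back"
    f" the input in {attention_dtype}. Make sure to pass the desired dtype when calling"
    " `from_pretrained` with `torch_dtype=your_dtype`"
)

Only the line that interpolates a value keeps the f prefix, and each string line now stays under roughly 100 characters.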

@younesbelkada (Contributor, Author)

cc @hiyouga, are you able to fine-tune in bf16 with this branch?

@LysandreJik (Member) left a comment

Ok, cool to relax the cast to fp16.

Wouldn't it be cleaner to put the changes related to `to` in a FlashAttention-specific mixin, so as not to require changes to modeling_utils.py? You wouldn't need to add an additional private property `_flash_attn_2_attention_dtype`, and you wouldn't need to edit the general `to` method either (just the FlashAttention-specific `to` method).

You may want to update the `_apply` method instead of `to`, however; I think with `to` you're not seeing calls like `model.float()`, which will convert your entire model to float32.
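A rough sketch of what such a mixin could look like, here overriding `to` (an `_apply`-based variant would be similar); the class and model names below are made up:

import torch
import torch.nn as nn


class FlashAttention2CastGuard:
    """Hypothetical mixin: keeps the float32-cast guard out of the generic `to` in modeling_utils.py."""

    def to(self, *args, **kwargs):
        # Same argument parsing torch uses internally in nn.Module.to, to see whether a dtype was requested.
        _, dtype, _, _ = torch._C._nn._parse_to(*args, **kwargs)
        if dtype == torch.float32:
            raise ValueError(
                "You cannot cast a model that has been loaded with Flash Attention 2 in `float32`"
            )
        return super().to(*args, **kwargs)


class TinyModel(FlashAttention2CastGuard, nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(4, 4, dtype=torch.float16)


model = TinyModel()
model.to(torch.bfloat16)    # allowed
# model.to(torch.float32)   # would raise a ValueError
# model.proj.to(torch.float32)  # ...but calls on submodules still bypass the guard, as noted below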

And it's probably an edge case, but with a class-level `to` override like this you're not keeping the following in check:

import torch
from transformers import MistralForCausalLM
model = MistralForCausalLM.from_pretrained("hf-internal-testing/tiny-random-MistralModel", use_flash_attention_2=True, torch_dtype=torch.float16)
model.model.layers.to(torch.float32)

Comment on lines +2192 to +2194
raise ValueError(
    "You cannot cast a model that has been loaded with Flash Attention 2 in `float32`"
)
Member

Maybe mention how to work around this?

Comment on lines +2178 to +2179
# TODO: @younesbelkada find a better way to do this directly in `xxxFlashAttention` modules
# currently it is not possible to retrieve the original dtype for quantized models.
Member

Should this be investigated now?

@hiyouga (Contributor) commented Oct 11, 2023

I also thought that we could retrieve the data type from self.config.torch_dtype of LlamaFlashAttention2.

PS: Although it may fail in an edge case like the one Lysandre described, we usually have a consistent data type during training.

# in fp32. (LlamaRMSNorm handles it correctly)
input_dtype = query_states.dtype
if input_dtype == torch.float32:
    attention_dtype = self.config._flash_attn_2_attention_dtype
Contributor

Not all attention modules have a `config` attribute. But I guess that's OK; we can just pass it forward.

@younesbelkada (Contributor, Author)

Closing this PR in favor of #26846



Linked issue (may be closed by this PR): The hidden states in LlamaFlashAttention2 are cast in fp16 unexpectedly

7 participants