System Info
- `transformers` version: 4.33.3
- Platform: Linux-5.15.120+-x86_64-with-glibc2.35
- Python version: 3.10.12
- Huggingface_hub version: 0.17.3
- Safetensors version: 0.3.3
- Accelerate version: not installed
- Accelerate config: not found
- PyTorch version (GPU?): 2.0.1+cu118 (False)
- Tensorflow version (GPU?): 2.13.0 (False)
- Flax version (CPU?/GPU?/TPU?): 0.7.4 (cpu)
- Jax version: 0.4.10
- JaxLib version: 0.4.10
- Using GPU in script?: No
- Using distributed or parallel set-up in script?: No
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
The discussed file is `transformers/src/transformers/models/t5/modeling_flax_t5.py`, at line 408.
Expected behavior
Hi,
During autoregressive decoding, keys and values are computed one token at a time, and a cache is used to recover the keys and values from previous calls.
In the `_concatenate_to_cache` method of the attention module, an attention mask is computed so that the new query only attends to the previously cached key positions and not to the remaining zero elements. This is what the comments inside this function describe.
However, the newly computed mask is never used afterwards, because it is assigned to a variable named `attention_attention_mask` rather than `attention_mask`, which is the name used on every other line.
From my understanding, this is likely a typo, and I am not sure how it changes the behavior of the model, if at all.
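For illustration, here is a minimal, self-contained sketch of the cache-masking logic described above. The shapes and values are made up for the example; only `combine_masks`, `pad_mask`, and `attention_mask` correspond to names that actually appear in the file, and the snippet is not a verbatim copy of the source.

```python
# Sketch of the cache-masking step in `_concatenate_to_cache` (illustrative
# shapes and values; not the exact source code).
import jax.numpy as jnp
from flax.linen import combine_masks

max_length = 8                 # size of the pre-allocated key/value cache
cur_index = 3                  # number of tokens already written to the cache
num_updated_cache_vectors = 1  # one new token is decoded per call
batch_dims = (1,)              # leading batch dimension(s)

# Mask passed into `_concatenate_to_cache`: shape (batch, 1, query_len, key_len).
attention_mask = jnp.ones(
    batch_dims + (1, num_updated_cache_vectors, max_length), dtype=bool
)

# Only the cache slots that have already been filled (plus the token just
# added) should be attended to; the remaining slots are still zero padding.
pad_mask = jnp.broadcast_to(
    jnp.arange(max_length) < cur_index + num_updated_cache_vectors,
    batch_dims + (1, num_updated_cache_vectors, max_length),
)

# Reported issue: in modeling_flax_t5.py the combined mask is assigned to a
# variable called `attention_attention_mask`, so it is never used afterwards.
# The presumably intended assignment is:
attention_mask = combine_masks(pad_mask, attention_mask)

print(attention_mask)  # positions >= cur_index + num_updated_cache_vectors are masked out
```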