
Flax T5 model - code typo during AutoRegressive decoding? #26564

@giganttheo

System Info

  • transformers version: 4.33.3
  • Platform: Linux-5.15.120+-x86_64-with-glibc2.35
  • Python version: 3.10.12
  • Huggingface_hub version: 0.17.3
  • Safetensors version: 0.3.3
  • Accelerate version: not installed
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.0.1+cu118 (False)
  • Tensorflow version (GPU?): 2.13.0 (False)
  • Flax version (CPU?/GPU?/TPU?): 0.7.4 (cpu)
  • Jax version: 0.4.10
  • JaxLib version: 0.4.10
  • Using GPU in script?: No
  • Using distributed or parallel set-up in script?: No

Who can help?

@sanchit-gandhi

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

The file in question is transformers/src/transformers/models/t5/modeling_flax_t5.py, at line 408:

https://github.com/huggingface/transformers/blob/2aef9a96011133f6b399b598fd69cfeca936eb37/src/transformers/models/t5/modeling_flax_t5.py#L408C1-L409C1

Expected behavior

Hi,

During autoregressive decoding, keys and values are computed one token at a time, and a cache is used to recover the keys and values from previous calls.

In the _concatenate_to_cache method of the Attention module, an attention mask is computed so that the new query attends only to the previous key positions and not to the remaining zero elements. This is what the comments in that function describe.
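
To make the intent of that mask concrete, here is a minimal sketch in plain Python lists (not the JAX code from the library). The function name visible_key_mask and its parameters are hypothetical, chosen only for illustration; it assumes a cache with max_length slots, of which cur_index are already filled, and a single new query token.

```python
def visible_key_mask(max_length, cur_index):
    """Boolean mask over cache slots for one new query token.

    Hypothetical helper: slots 0..cur_index (the previously cached keys
    plus the key just written at cur_index) are visible; the remaining
    slots still hold zeros and are masked out.
    """
    num_visible = cur_index + 1  # one new token is written per call
    return [pos < num_visible for pos in range(max_length)]


# Two tokens already cached, third token being decoded, cache of size 6.
mask = visible_key_mask(max_length=6, cur_index=2)
print(mask)  # [True, True, True, False, False, False]
```

Without such a mask, the query would also score the still-empty cache slots.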

However, the new attention mask is never used afterwards, because it is assigned to attention_attention_mask rather than to attention_mask, which is the name used on every other line.
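
The pattern I mean can be sketched as follows. This is a simplified stand-in in plain Python, not the library code: and_masks, concatenate_misnamed and concatenate_expected are hypothetical names, and masks are flat lists of booleans. It only shows that binding the combined mask to a different name leaves the returned mask unchanged; it does not show whether the real model's output is affected.

```python
def and_masks(a, b):
    """Element-wise AND of two boolean masks (illustrative stand-in)."""
    return [x and y for x, y in zip(a, b)]


def concatenate_misnamed(pad_mask, attention_mask):
    # The combined mask is bound to a name that is never read again...
    attention_attention_mask = and_masks(pad_mask, attention_mask)
    # ...so the caller gets the incoming mask back unchanged.
    return attention_mask


def concatenate_expected(pad_mask, attention_mask):
    # What the comments suggest was intended: rebind attention_mask itself.
    attention_mask = and_masks(pad_mask, attention_mask)
    return attention_mask


pad_mask = [True, True, False, False]   # only the first two slots are filled
incoming = [True, True, True, True]     # mask passed in by the caller

print(concatenate_misnamed(pad_mask, incoming))  # [True, True, True, True]
print(concatenate_expected(pad_mask, incoming))  # [True, True, False, False]
```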

As far as I can tell, this is a typo, but I am not sure how it changes the behavior of the model, if at all.
