@sshleifer sshleifer commented Mar 22, 2020

Background

The BART decoder requires two masks: one to ignore padding tokens, and another (causal_mask) to prevent attending to future tokens during training.
Previously, _prepare_bart_decoder_inputs combined these two masks into a single float_mask of shape (bsz, 1, tgt_len, tgt_len), filled with -inf at positions that should be ignored. This mask was then added to the attention activations.

Now, we return the two masks separately:
decoder_padding_mask: shape (bsz, tgt_len), bool
causal_mask: shape (tgt_len, tgt_len), float
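
A minimal PyTorch sketch of the two-mask scheme described above. This is not the code from this PR: the helper names `make_decoder_masks` and `apply_masks`, the toy token ids, and the convention that True in the padding mask marks a pad position are all assumptions made for illustration.

```python
import torch


def make_decoder_masks(decoder_input_ids, pad_token_id):
    """Hypothetical helper: build the two small masks separately."""
    tgt_len = decoder_input_ids.size(1)
    # bool, shape (bsz, tgt_len): True where the token is padding (assumed convention)
    decoder_padding_mask = decoder_input_ids.eq(pad_token_id)
    # float, shape (tgt_len, tgt_len): -inf strictly above the diagonal, 0 elsewhere
    causal_mask = torch.triu(
        torch.full((tgt_len, tgt_len), float("-inf")), diagonal=1
    )
    return decoder_padding_mask, causal_mask


def apply_masks(attn_weights, decoder_padding_mask, causal_mask):
    """Hypothetical helper: attn_weights has shape (bsz, num_heads, tgt_len, tgt_len)."""
    # The (tgt_len, tgt_len) float mask broadcasts over batch and heads.
    attn_weights = attn_weights + causal_mask
    # The (bsz, tgt_len) bool mask is expanded over heads and query positions.
    key_is_pad = decoder_padding_mask[:, None, None, :]
    attn_weights = attn_weights.masked_fill(key_is_pad, float("-inf"))
    return torch.softmax(attn_weights, dim=-1)


# Toy batch: pad_token_id=1, right-padded, so no query row is fully masked.
pad_token_id = 1
decoder_input_ids = torch.tensor([[0, 5, 6, 1], [0, 7, 1, 1]])
decoder_padding_mask, causal_mask = make_decoder_masks(decoder_input_ids, pad_token_id)

scores = torch.zeros(2, 3, 4, 4)  # (bsz, num_heads, tgt_len, tgt_len)
probs = apply_masks(scores, decoder_padding_mask, causal_mask)
```

Neither mask is ever materialized at the full (bsz, 1, tgt_len, tgt_len) size; broadcasting handles the expansion at the point of use.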

Impact

Saves 800 MB of memory for bsz=6, tgt_len=1024, with negligible speed impact.
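
For a sense of scale, the storage of the mask tensors themselves can be worked out directly. The sketch below assumes 4-byte floats and 1-byte bools (neither is stated in this PR) and counts only the masks. The 800 MB figure above is much larger than this difference, so it presumably also reflects other tensors; the PR does not break it down.

```python
# Assumed element sizes; the PR does not state the dtypes' widths.
FLOAT_BYTES = 4  # float32
BOOL_BYTES = 1   # bool

bsz, tgt_len = 6, 1024

# Old scheme: one float mask of shape (bsz, 1, tgt_len, tgt_len)
combined_bytes = bsz * 1 * tgt_len * tgt_len * FLOAT_BYTES

# New scheme: float causal mask (tgt_len, tgt_len) plus bool padding mask (bsz, tgt_len)
causal_bytes = tgt_len * tgt_len * FLOAT_BYTES
padding_bytes = bsz * tgt_len * BOOL_BYTES
separate_bytes = causal_bytes + padding_bytes

print(f"combined mask: {combined_bytes / 2**20:.2f} MiB")
print(f"separate masks: {separate_bytes / 2**20:.2f} MiB")
```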

Notes

  • The distinct data types (bool and float) are used to minimize code change.

@sshleifer sshleifer changed the title from "Decouple mask clean" to "[WIP/Bart/Memory] Two separate, smaller decoder attention masks" Mar 22, 2020
codecov-io commented Mar 22, 2020

Codecov Report

Merging #3371 into master will decrease coverage by 0.02%.
The diff coverage is 93.75%.

@@            Coverage Diff             @@
##           master    #3371      +/-   ##
==========================================
- Coverage   77.55%   77.52%   -0.03%     
==========================================
  Files         100      100              
  Lines       16970    16957      -13     
==========================================
- Hits        13161    13146      -15     
- Misses       3809     3811       +2     

| Impacted Files | Coverage Δ |
|---|---|
| src/transformers/modeling_bart.py | 97.59% <93.75%> (-0.50%) ⬇️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@sshleifer sshleifer changed the title from "[WIP/Bart/Memory] Two separate, smaller decoder attention masks" to "[Bart/Memory] Two separate, smaller decoder attention masks" Mar 22, 2020
@sshleifer sshleifer marked this pull request as ready for review March 22, 2020 16:10
@sshleifer sshleifer requested review from julien-c and thomwolf March 22, 2020 16:28
@sshleifer sshleifer merged commit 3ee431d into huggingface:master Mar 27, 2020
@sshleifer sshleifer deleted the decouple-mask-clean branch March 27, 2020 01:34