[Bart/Memory] Two separate, smaller decoder attention masks #3371

sshleifer · 2020-03-22T02:47:04Z

Background

The Bart decoder requires two masks: one to ignore padding tokens, and another (causal_mask) to prevent attending to future tokens during training.
Previously, _prepare_bart_decoder_inputs combined these two masks into a single float_mask of shape (bsz, 1, tgt_len, tgt_len), filled with -inf at positions that should be ignored. This mask was then added to the attention activations.

Now, we return the two masks separately:
decoder_padding_mask: shape (bsz, tgt_len), bool
causal_mask: shape (tgt_len, tgt_len), float
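
To illustrate the idea, here is a minimal sketch in PyTorch of how two separate masks can reproduce the effect of the old combined mask. The function names (make_separate_masks, make_combined_mask, apply_separately) and the pad_token_id value are hypothetical and chosen for illustration; this is not the code in modeling_bart.py.

```python
import torch


def make_separate_masks(decoder_input_ids, pad_token_id):
    """Hypothetical helper: build the two decoder masks separately."""
    tgt_len = decoder_input_ids.size(1)
    # True where the token is padding; shape (bsz, tgt_len), dtype bool.
    decoder_padding_mask = decoder_input_ids.eq(pad_token_id)
    # -inf strictly above the diagonal, 0 elsewhere; shape (tgt_len, tgt_len), float.
    causal_mask = torch.triu(
        torch.full((tgt_len, tgt_len), float("-inf")), diagonal=1
    )
    return decoder_padding_mask, causal_mask


def make_combined_mask(decoder_input_ids, pad_token_id):
    """Hypothetical helper: the old-style single mask, shape (bsz, 1, tgt_len, tgt_len)."""
    padding_mask, causal_mask = make_separate_masks(decoder_input_ids, pad_token_id)
    bsz, tgt_len = decoder_input_ids.shape
    combined = causal_mask[None, None, :, :].expand(bsz, 1, tgt_len, tgt_len)
    # Mask out padded key positions for every query position.
    return combined.masked_fill(padding_mask[:, None, None, :], float("-inf"))


def apply_separately(attn_weights, decoder_padding_mask, causal_mask):
    """Apply both masks to attn_weights of shape (bsz, num_heads, tgt_len, tgt_len)."""
    # The causal mask broadcasts over the batch and head dimensions.
    attn_weights = attn_weights + causal_mask
    # The padding mask broadcasts over the head and query dimensions.
    return attn_weights.masked_fill(
        decoder_padding_mask[:, None, None, :], float("-inf")
    )
```

The combined mask holds bsz * tgt_len * tgt_len elements, while the two separate masks together hold bsz * tgt_len + tgt_len * tgt_len elements, since the causal mask no longer carries a batch dimension.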

Impact

Saves 800 MB for bs=6, tgt_len=1024, with negligible speed impact.

Notes

The distinct data types (bool and float) are used to minimize code change.


codecov-io · 2020-03-22T02:55:47Z

Codecov Report

Merging #3371 into master will decrease coverage by 0.02%.
The diff coverage is 93.75%.

@@            Coverage Diff             @@
##           master    #3371      +/-   ##
==========================================
- Coverage   77.55%   77.52%   -0.03%     
==========================================
  Files         100      100              
  Lines       16970    16957      -13     
==========================================
- Hits        13161    13146      -15     
- Misses       3809     3811       +2

| Impacted Files | Coverage Δ | |
|---|---|---|
| src/transformers/modeling_bart.py | `97.59% <93.75%> (-0.50%)` | ⬇️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update cf72479...8db65c1.

sshleifer added 15 commits March 18, 2020 00:24

started

3382043

more tests passing

a664030

logging

0d78e05

log inside attn

9ea4f8c

add logging

a39f93d

back to mixin

976486d

git surgery: success

8f13d90

fixed pass through

d118f1e

Merge branch 'decouple-mask' of github.com:sshleifer/transformers_for…

930be4a

…k into decouple-mask

redo git magic

400b779

better msg

0999abd

Merge branch 'master' into decouple-mask

ac5376e

log mem

04c8828

tests passing

c96c234

Decouple decoder attention masks

e3c14ed

sshleifer changed the title ~~Decouple mask clean~~ [WIP/Bart/Memory] Two separate, smaller decoder attention masks Mar 22, 2020

cleanup

8db65c1

Always make causal mask

8b9e407

sshleifer changed the title ~~[WIP/Bart/Memory] Two separate, smaller decoder attention masks~~ [Bart/Memory] Two separate, smaller decoder attention masks Mar 22, 2020

remove dead code

64938d8

sshleifer marked this pull request as ready for review March 22, 2020 16:10

sshleifer requested review from julien-c and thomwolf March 22, 2020 16:28

sshleifer added 2 commits March 22, 2020 12:40

send causal mask to correct device

58d99d1

Fix kwargs

9bc2b79

thomwolf approved these changes Mar 26, 2020


sshleifer added 3 commits March 26, 2020 18:44

Merge branch 'master' into decouple-mask-clean

73391ac

Fix kwarg merge conflict

2829f62

style

f935ba1

sshleifer merged commit 3ee431d into huggingface:master Mar 27, 2020

sshleifer deleted the decouple-mask-clean branch March 27, 2020 01:34
