Conversation

@sshleifer (Contributor)

Previously, SelfAttention always returned attn_weights, and BartDecoder and BartEncoder then decided whether to return them to the user.
The attn_weights tensor is fairly large, with shape (bs, num_heads, tgt_len, src_len).
This meant that the memory allocated for attn_weights could not be freed until after the forward pass of BartDecoder.

Now: SelfAttention returns (output, None) if config.output_attentions=False, so the weights tensor can be freed as soon as SelfAttention returns.

Impact: peak GPU memory consumption drops by roughly 600MB for batch_size=6, tgt_len=src_len=1024, num_heads=16.

Speed impact: negligible
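For scale, a single attn_weights tensor of the shape quoted above is already substantial. A quick back-of-the-envelope calculation, assuming float32 (the dtype is an assumption, not stated in this PR):

```python
# Size of one attn_weights tensor of shape (bs, num_heads, tgt_len, src_len)
# for the configuration quoted above, assuming 4-byte float32 elements.
bs, num_heads, tgt_len, src_len = 6, 16, 1024, 1024
num_bytes = bs * num_heads * tgt_len * src_len * 4
mib_per_layer = num_bytes / 2**20
print(mib_per_layer)  # MiB held per attention layer
```

At ~384 MiB per layer, keeping even a couple of these tensors alive across the decoder forward pass accounts for the memory saving reported here.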
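The pattern described above can be sketched as follows. This is an illustrative NumPy sketch, not the actual transformers SelfAttention implementation; the function signature and the need_weights parameter name are assumptions for illustration:

```python
import numpy as np

def self_attention(query, key, value, need_weights=False):
    """Sketch of single-head attention that only keeps the weights
    tensor alive when the caller actually asked for it."""
    d = query.shape[-1]
    # scores: (tgt_len, src_len) -- this is the large intermediate
    scores = query @ key.T / np.sqrt(d)
    # softmax over the source dimension
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    output = weights @ value
    # Return None instead of the (tgt_len, src_len) weights tensor when
    # the caller does not need it, so its memory can be reclaimed as soon
    # as this function returns rather than after the full decoder pass.
    return (output, weights) if need_weights else (output, None)

q = np.ones((3, 8))
k = v = np.ones((5, 8))
out, attn = self_attention(q, k, v, need_weights=False)
assert attn is None  # nothing retains the weights tensor
```

The caller-side contract stays simple: code that sets config.output_attentions gets the weights back; everyone else receives None and pays nothing for it.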

@sshleifer sshleifer marked this pull request as ready for review March 22, 2020 16:11
@sshleifer sshleifer requested review from julien-c and thomwolf March 22, 2020 16:26
@sshleifer sshleifer changed the title from "[Bart/Memory] SelfAttention only returns weights if needed" to "[Bart/Memory] SelfAttention only returns weights if config.output_attentions" Mar 22, 2020
@thomwolf (Member) left a comment

Nice!

@sshleifer sshleifer merged commit 63f4d8c into huggingface:master Mar 26, 2020
@sshleifer sshleifer deleted the need-weights-clean branch March 26, 2020 22:42
