@sshleifer (Contributor) commented Mar 21, 2020

Proposing to call model.encoder before expanding input_ids to effective_batch_size * num_beams.

For Bart, this saves 1.5 GB of GPU memory at batch_size=6. Savings are probably similar for T5 (untested).

This requires knowing which index of encoder_outputs corresponds to the batch dimension (the dimension we need to expand), which differs between Bart and T5. That difference is encoded in the self.encoder_outputs_batch_idx variable.

This PR is WIP because encoder_outputs_batch_idx could be avoided by transposing Bart's encoder_outputs, which I haven't tried.
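For illustration, the core idea can be sketched roughly as follows: run the encoder once at the original batch size, then repeat each example's encoder states num_beams times along the batch dimension. This is a minimal sketch, not the PR's actual code; expand_encoder_outputs is a hypothetical helper, and batch_dim plays the role of encoder_outputs_batch_idx.

```python
import torch

def expand_encoder_outputs(encoder_hidden, num_beams, batch_dim=0):
    """Repeat each example's encoder states num_beams times along the
    batch dimension (hypothetical helper illustrating the PR's idea)."""
    batch_size = encoder_hidden.shape[batch_dim]
    # index [0, 0, 0, 1, 1, 1] (for bs=2, num_beams=3) selects each
    # batch entry num_beams times
    expanded_idx = torch.arange(batch_size).repeat_interleave(num_beams)
    return encoder_hidden.index_select(batch_dim, expanded_idx)

# encode once at the original batch size, then expand for beam search
hidden = torch.randn(2, 7, 16)                       # (bs, seq_len, d_model)
expanded = expand_encoder_outputs(hidden, num_beams=3)
print(expanded.shape)                                # torch.Size([6, 7, 16])
```

The saving comes from never running the encoder on the already-expanded (bs * num_beams) input, which is what the previous code path effectively paid for.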

@sshleifer sshleifer changed the title [Generation/WIP] Call encoder earlier [Generation/WIP] Call encoder before expanding input_ids Mar 21, 2020
@sshleifer sshleifer changed the title [Generation/WIP] Call encoder before expanding input_ids [Seq2Seq Generation] Call encoder before expanding input_ids Mar 22, 2020
@sshleifer sshleifer marked this pull request as ready for review March 22, 2020 16:11
@sshleifer sshleifer requested review from julien-c and thomwolf March 22, 2020 16:27
@patrickvonplaten (Contributor) commented Mar 22, 2020

I like the change a lot!
One question I asked myself: with this change, encoder_outputs that are the same point to the same memory address. Could that lead to problems? Probably not, because the encoder_outputs are never changed, right?

I'd just propose some renaming.

config_class = BartConfig
base_model_prefix = "model"
pretrained_model_archive_map = BART_PRETRAINED_MODEL_ARCHIVE_MAP
encoder_outputs_batch_idx = 1 # outputs shaped (bs, ...)
I would change the name to encoder_outputs_batch_dim_idx.

pretrained_model_archive_map = T5_PRETRAINED_MODEL_ARCHIVE_MAP
load_tf_weights = load_tf_weights_in_t5
base_model_prefix = "transformer"
encoder_outputs_batch_idx = 0 # outputs shaped (bs, ...)
I would change the name to encoder_outputs_batch_dim_idx here as well.

device=next(self.parameters()).device,
)
cur_len = 1
batch_idx = self.encoder_outputs_batch_idx
Also here: batch_dim_idx.

assert (
    batch_size == encoder_outputs[0].shape[batch_idx]
), f"expected encoder_outputs[0] to have batch dimension {batch_size}, got {encoder_outputs[0].shape[batch_idx]}"
expanded_index = (
Maybe also rename to expanded_idx, since we always use idx in this function?

@thomwolf (Member) left a comment

LGTM.
I agree with the name changes proposed by @patrickvonplaten.
