Conversation

@Rocketknight1
Member

With apologies for the delay, this PR should hopefully resolve the issues in #24637. @abb128 can you please try installing from this PR and verify if it resolves your issues? You can install from this PR with:

pip install --upgrade git+https://github.com/huggingface/transformers.git@tf_opt_fixes

Fixes #24637

@HuggingFaceDocBuilderDev commented Aug 1, 2023

The documentation is not available anymore as the PR was closed or merged.

@Rocketknight1
Member Author

No response from @abb128, but we should probably merge anyway. Pinging @amyeroberts for core maintainer review!

@amyeroberts
Contributor

Thanks for fixing this!

Just a question about the checks and the values the inputs can have in _prepare_decoder_attention_mask:

_, seq_length = input_shape
tf.debugging.assert_equal(
    seq_length + past_key_values_length,
    shape_list(attention_mask)[1],
Contributor

Is this check robust? From the diff it looks like attention_mask can be None

Member Author

Yes! The TFOPTDecoder layer checks for a None attention mask and replaces it with tf.ones before _prepare_decoder_attention_mask is called, so the mask can no longer be None at that point. The earlier code had an if attention_mask is not None branch that was, as a result, always taken.
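
For readers following along, here is a minimal, self-contained sketch of the pattern being described; the helper name, signature, and dtype are illustrative assumptions, not the actual modeling_tf_opt.py source:

import tensorflow as tf
from typing import Optional, Tuple


def fill_default_attention_mask(
    attention_mask: Optional[tf.Tensor],
    input_shape: Tuple[int, int],
    past_key_values_length: int,
) -> tf.Tensor:
    # Hypothetical stand-in for the up-front check in TFOPTDecoder: when no
    # mask is provided, attend to every position, covering both the new
    # tokens and any cached past tokens.
    if attention_mask is None:
        batch_size, seq_length = input_shape
        attention_mask = tf.ones((batch_size, seq_length + past_key_values_length))
    return attention_mask


# Two new tokens, three cached tokens, and no user-provided mask:
mask = fill_default_attention_mask(None, (1, 2), past_key_values_length=3)
print(mask.shape)  # (1, 5), so the assert_equal check quoted above passes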

@Rocketknight1
Member Author

@amyeroberts Sorry for the delay, I lost track of this one!

@amyeroberts
Contributor

Thanks for fixing!

@Rocketknight1 merged commit 842e99f into main on Sep 6, 2023
@Rocketknight1 deleted the tf_opt_fixes branch on September 6, 2023 at 12:37
parambharat pushed a commit to parambharat/transformers that referenced this pull request on Sep 26, 2023:
* stash commit

* More OPT updates

* Update src/transformers/models/opt/modeling_tf_opt.py

Co-authored-by: amyeroberts <[email protected]>

---------

Co-authored-by: amyeroberts <[email protected]>

Development

Successfully merging this pull request may close these issues:

TFOPTForCausalLM Attention mask size mismatch exception (#24637)
