Add t5 summarization example #3411
Conversation
    max_length=max_length,
    min_length=min_length,
    early_stopping=True,
    bos_token_id=tokenizer.pad_token_id,
@craffel -
I wanted to add an example for T5 summarization on the CNN/DM dataset.
These are the default params I use for T5 summarization. I saw in the paper that you use num_beams=4 as well, and from the code:
https://github.com/google-research/text-to-text-transfer-transformer/blob/master/t5/models/gin/beam_search.gin
and
https://github.com/google-research/text-to-text-transfer-transformer/blob/master/t5/models/gin/sequence_lengths/cnn_dailymail_v002.gin, I assumed that you pad/cut to 512 tokens. transformers uses the "simple" length_penalty instead of the "google" length_penalty, so instead of alpha=0.6 I set the length penalty here to 2.0, which is the default parameter for Bart - do you think that is good? Regarding min_length and max_length, I am not sure whether these params are good - they are copied from the Bart defaults.
Hey Patrick, thanks for making this. Some notes:
- The released checkpoints are after multitask pre-training. You will get decent summarization results with them, but in order to get our SoTA-at-the-time summarization results you'll need to finetune the model further before generating from it. You can probably ignore that for the sake of an example, but it may be worth mentioning in the README.
- We didn't play with the num beams or length penalty at all; we just set them to values that had been used in the past on machine translation and never tweaked them. So your values are probably fine (especially since you are already going to lose some performance without fine-tuning).
- We don't have min length/max length functionality in our code; we just let the model choose whatever length. If BART uses this trick, it probably helps, and it seems somewhat model-agnostic so we can just use their values.
- The model should work equivalently whether you pad to a max length or not. You should also be able to use a sequence length longer than 512 and it should work fine too.
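The pad/cut-to-512 behavior discussed above can be illustrated with a tiny hypothetical helper. Real code would use the tokenizer's own `max_length`/padding options rather than this; the sketch only shows the shape handling:

```python
def pad_or_truncate(token_ids, max_length=512, pad_id=0):
    # Hypothetical helper: cut a token-id list to max_length, then
    # right-pad with pad_id so every sequence in a batch has equal length.
    ids = token_ids[:max_length]
    return ids + [pad_id] * (max_length - len(ids))

# A short and a long sequence, both brought to length 8:
batch = [pad_or_truncate(seq, max_length=8) for seq in ([1, 2, 3], list(range(1, 12)))]
print(batch)
```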
Thanks a lot!
examples/summarization/t5/README.md
Outdated
Install `files2rouge` following the instructions [here](https://github.com/pltrdy/files2rouge).
I also needed to run `sudo apt-get install libxml-parser-perl`.
Any reason not to use a Python ROUGE scorer, e.g.
https://github.com/google-research/google-research/tree/master/rouge
That way you can just score within the evaluate_cnn.py file instead of having people run this separate code.
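What the suggested scorer computes can be sketched with a minimal pure-Python ROUGE-1 F1. This does no stemming or proper tokenization - the google-research `rouge` package handles those correctly; this is only an illustration of the metric:

```python
from collections import Counter

def rouge1_f(reference, prediction):
    # Minimal ROUGE-1: unigram-overlap F1 on whitespace tokens.
    ref, pred = reference.split(), prediction.split()
    overlap = sum((Counter(ref) & Counter(pred)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

# 5 of 6 unigrams overlap in each direction, so P = R = F = 5/6.
print(rouge1_f("the cat sat on the mat", "the cat was on the mat"))
```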
Added the google-research rouge scorer :-). It works well, but it's not the nicest Python API for scoring a list of strings :D.
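A thin wrapper of the kind the comment wishes for - averaging any per-pair score over parallel lists of strings - might look like this (a hypothetical convenience helper, not part of the rouge package):

```python
def score_corpus(references, predictions, score_fn):
    # Hypothetical helper: apply a per-pair scoring function to parallel
    # lists of strings and return the corpus-level average.
    if len(references) != len(predictions):
        raise ValueError("references and predictions must be the same length")
    pair_scores = [score_fn(r, p) for r, p in zip(references, predictions)]
    return sum(pair_scores) / len(pair_scores)

# Example with a trivial stand-in score (exact match); a real scorer
# such as rouge_score's RougeScorer.score would be plugged in instead.
exact = lambda ref, pred: float(ref == pred)
print(score_corpus(["a", "b"], ["a", "c"], exact))
```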
Ok for me - I'll let you take care of @craffel's comments.
Pending my comments.
Very much down to share the summarization code in another PR!
Code quality test fails because of the unpinned isort library (see #3449).
Ok for me!
examples/summarization/t5/README.md
Outdated
***This script evaluates the [T5 Model](https://arxiv.org/pdf/1910.10683.pdf) ``t5-large`` on the CNN/Daily Mail test dataset. Please note that the results in the paper were attained using a ``t5-large`` model fine-tuned on summarization, so the results here will be slightly worse.***
Adds TF 2.0 example for T5 summarization.
Adds a dataset download file via `tensorflow_datasets` and a rouge scorer. The example is currently being tested with T5-large on GPU to see how the rouge scorer performs in comparison to the `examples/summarization/bart` rouge scorer.