
Conversation

Contributor

@patrickvonplaten patrickvonplaten commented Mar 24, 2020

Adds a TF 2.0 example for T5 summarization.

Adds a dataset download file via tensorflow_datasets and a ROUGE scorer.

The example is currently being tested with T5-large on GPU to see how the ROUGE scorer performs in comparison to the examples/summarization/bart ROUGE scorer.

max_length=max_length,
min_length=min_length,
early_stopping=True,
bos_token_id=tokenizer.pad_token_id,
Contributor Author

@patrickvonplaten patrickvonplaten Mar 24, 2020


@craffel -
I wanted to add an example for T5 summarization for the CNN/DM dataset.

These are the default params I use for T5 summarization. I saw in the paper that you use num_beams=4 as well, and from the code:
https://github.com/google-research/text-to-text-transfer-transformer/blob/master/t5/models/gin/beam_search.gin
and
https://github.com/google-research/text-to-text-transfer-transformer/blob/master/t5/models/gin/sequence_lengths/cnn_dailymail_v002.gin, I assumed that you pad/cut to 512 tokens. transformers uses the "simple" length_penalty instead of the "google" length_penalty, so instead of alpha=0.6 I set the length penalty here to 2.0 (the default parameter for Bart) - do you think that is good? And regarding min_length and max_length, I am not sure whether these params are good either - they are copied from the Bart defaults.
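For concreteness, a rough sketch (not from this PR) of how the two length normalizations compare, assuming transformers' "simple" penalty divides the summed log-probability by `length ** length_penalty` while the "google" (GNMT-style) penalty used in the T5 codebase divides by `((5 + length) / 6) ** alpha`; the function names below are made up for illustration.

```python
# Sketch of the two beam-score length normalizations discussed above.
# Both penalties grow with hypothesis length, so both favor longer
# hypotheses relative to unnormalized beam search, but at different rates.

def simple_penalty(length: int, length_penalty: float = 2.0) -> float:
    """Normalization factor in the style of transformers' beam search."""
    return length ** length_penalty

def google_penalty(length: int, alpha: float = 0.6) -> float:
    """GNMT-style normalization factor (Wu et al., 2016)."""
    return ((5 + length) / 6) ** alpha

def normalized_score(sum_log_prob: float, length: int, penalty) -> float:
    """Divide an accumulated log-probability by a length penalty."""
    return sum_log_prob / penalty(length)

for n in (20, 60, 142):
    print(n, round(simple_penalty(n), 1), round(google_penalty(n), 3))
```

Because the two factors grow so differently, a given alpha for the "google" penalty has no exact equivalent length_penalty in the "simple" scheme, which is why the mapping above is a judgment call rather than a conversion.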

@craffel

Hey Patrick, thanks for making this. Some notes:

  • The released checkpoints are after multitask pre-training. You will get decent summarization results with them, but in order to get our SoTA-at-the-time summarization results you'll need to finetune the model further before generating from it. You can probably ignore that for the sake of an example, but it may be worth mentioning in the README.
  • We didn't play with the num beams or length penalty at all; we just set them to values that had been used in the past for machine translation and never tweaked them. So, your values are probably fine (especially since you are already going to be losing some performance without fine-tuning).
  • We don't have min length/max length functionality in our code; we just let the model choose whatever length. If BART uses this trick, it probably helps, and it seems somewhat model-agnostic so we can just use their values.
  • The model should work equivalently whether you pad to a max length or not. You should also be able to use a sequence length longer than 512 and it should work fine too.
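As a side note on how a min_length constraint can be enforced at all: one common mechanism (the idea behind transformers' generation code) is to mask the end-of-sequence logit until the hypothesis is long enough, so EOS can never be chosen early. A minimal stand-alone sketch, with hypothetical names; this is illustrative, not the library's actual implementation.

```python
import math

def mask_eos_below_min_length(logits, cur_len, min_length, eos_token_id):
    """Return a copy of `logits` where the EOS entry is set to -inf
    while the current hypothesis is shorter than `min_length`."""
    logits = list(logits)
    if cur_len < min_length:
        logits[eos_token_id] = -math.inf
    return logits

# Toy vocabulary of 4 tokens with EOS id 3: at step 2 of a min-length-5
# generation, the EOS logit is masked out.
logits = [0.1, 2.3, -0.5, 1.7]
masked = mask_eos_below_min_length(logits, cur_len=2, min_length=5, eos_token_id=3)
print(masked)
```

Once `cur_len` reaches `min_length`, the logits pass through unchanged and the model is free to end the sequence, which matches the "just let the model choose" behavior when min_length is 0.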

Contributor Author

Thanks a lot!

Install `files2rouge` following the instructions [here](https://github.com/pltrdy/files2rouge).
I also needed to run `sudo apt-get install libxml-parser-perl`.

@craffel

Any reason not to use a Python ROUGE scorer, e.g.
https://github.com/google-research/google-research/tree/master/rouge
That way you can just score within the evaluate_cnn.py file instead of having people run this separate code.

Contributor Author

Added the google-research ROUGE scorer :-). It works well, but it's not the nicest Python API for scoring a list of strings :D.
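For readers following along, the scorer works one (reference, prediction) pair at a time, so scoring a dataset means looping and averaging yourself - roughly as in the sketch below. `rouge1_f` here is a deliberately simplified unigram-overlap stand-in for the real scorer, not the actual google-research implementation, and all names are illustrative.

```python
from collections import Counter

def rouge1_f(reference: str, prediction: str) -> float:
    """Simplified ROUGE-1 F1: unigram-overlap precision/recall harmonic mean."""
    ref, pred = Counter(reference.split()), Counter(prediction.split())
    overlap = sum((ref & pred).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

def score_corpus(references, predictions):
    """Average a per-pair scorer over parallel lists of strings."""
    scores = [rouge1_f(r, p) for r, p in zip(references, predictions)]
    return sum(scores) / len(scores)

refs = ["the cat sat on the mat", "a quick brown fox"]
preds = ["the cat sat on a mat", "a quick brown dog"]
print(round(score_corpus(refs, preds), 3))
```

Wrapping the per-pair call this way is essentially what an evaluate_cnn.py-style script needs so that scoring happens inline rather than via a separate tool.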

Member

@thomwolf thomwolf left a comment

Ok for me, I'll let you take care of @craffel's comments.

Contributor

@sshleifer sshleifer left a comment

pending my comments

@patrickvonplaten
Contributor Author

> pending my comments

Very much down to share the summarization code in another PR!

@patrickvonplaten patrickvonplaten force-pushed the add_t5_summarization_example branch from 2cda9e8 to 5d1c4a6 on March 26, 2020 at 13:40
@patrickvonplaten
Contributor Author

The code quality test fails because of the unpinned isort library (see #3449).

Member

@thomwolf thomwolf left a comment

Ok for me!

@@ -0,0 +1,25 @@
***This script evaluates the [T5 Model](https://arxiv.org/pdf/1910.10683.pdf) ``t5-large`` on the CNN/Daily Mail test dataset. Please note that the results in the paper were attained using a ``t5-large`` model fine-tuned on summarization, so that results will be slightly worse here***

  1. Suggest changing "the T5 Model t5-large" to "the multitask pre-trained checkpoint for t5-large".
  2. If you want to be more specific than "slightly", it looks like they are about 0.5 ROUGE points (on all ROUGE variants) lower than they would be with fine-tuning.

@thomwolf thomwolf merged commit e703e92 into huggingface:master Mar 26, 2020