
Commit fa9af24

Add T5 to docs (#3461)
* add t5 docs basis
* improve docs
* add t5 docs
* improve t5 docstring
* add t5 tokenizer docstring
* finish docstring
* make style
* add pretrained models
* correct typo
* make examples work
* finalize docs
1 parent ff80b73 commit fa9af24


7 files changed: +285 -129 lines changed

docs/source/index.rst

Lines changed: 1 addition & 0 deletions
@@ -103,3 +103,4 @@ The library currently contains PyTorch and Tensorflow implementations, pre-train
 model_doc/xlmroberta
 model_doc/flaubert
 model_doc/bart
+model_doc/t5

docs/source/model_doc/t5.rst

Lines changed: 69 additions & 0 deletions
@@ -0,0 +1,69 @@
+T5
+----------------------------------------------------
+**DISCLAIMER:** This model is still a work in progress. If you see something strange,
+file a `GitHub Issue <https://github.com/huggingface/transformers/issues/new?assignees=&labels=&template=bug-report.md&title>`_ .
+
+Overview
+~~~~~~~~~~~~~~~~~~~~
+The T5 model was presented in `Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer <https://arxiv.org/pdf/1910.10683.pdf>`_ by Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li and Peter J. Liu.
+Here is the abstract:
+
+*Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice.
+In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts every language problem into a text-to-text format.
+Our systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks.
+By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more.
+To facilitate future work on transfer learning for NLP, we release our dataset, pre-trained models, and code.*
+
+The authors' code can be found `here <https://github.com/google-research/text-to-text-transfer-transformer>`_ .
+
+Tips
+~~~~~~~~~~~~~~~~~~~~
+- T5 is an encoder-decoder model pre-trained on a multi-task mixture of unsupervised
+  and supervised tasks, each of which is cast as a sequence-to-sequence task.
+  T5 therefore works well on a variety of tasks out-of-the-box by prepending a different prefix to the input corresponding to each task, e.g. for translation: *translate English to German: ...*, for summarization: *summarize: ...*.
+  For more information about which prefix to use, it is easiest to look into Appendix D of the `paper <https://arxiv.org/pdf/1910.10683.pdf>`_ .
+- For sequence-to-sequence generation, it is recommended to use ``T5ForConditionalGeneration.generate()``. The method takes care of feeding the encoded input via cross-attention layers to the decoder and auto-regressively generating the decoder output.
+- T5 uses relative scalar embeddings. Encoder input padding can be done on the left and on the right.
+
+
+T5Config
+~~~~~~~~~~~~~~~~~~~~~
+
+.. autoclass:: transformers.T5Config
+    :members:
+
+
+T5Tokenizer
+~~~~~~~~~~~~~~~~~~~~~
+
+.. autoclass:: transformers.T5Tokenizer
+    :members: build_inputs_with_special_tokens, get_special_tokens_mask,
+        create_token_type_ids_from_sequences, save_vocabulary
+
+
+T5Model
+~~~~~~~~~~~~~~~~~~~~
+
+.. autoclass:: transformers.T5Model
+    :members:
+
+
+T5ForConditionalGeneration
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. autoclass:: transformers.T5ForConditionalGeneration
+    :members:
+
+
+TFT5Model
+~~~~~~~~~~~~~~~~~~~~
+
+.. autoclass:: transformers.TFT5Model
+    :members:
+
+
+TFT5ForConditionalGeneration
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. autoclass:: transformers.TFT5ForConditionalGeneration
+    :members:
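
The Tips in the new ``docs/source/model_doc/t5.rst`` recommend combining a task prefix with ``T5ForConditionalGeneration.generate()``. A minimal usage sketch of that pattern (the ``t5-small`` checkpoint name is used for illustration and is not part of this commit):

    from transformers import T5ForConditionalGeneration, T5Tokenizer

    # Load tokenizer and model weights; "t5-small" is assumed to be an available checkpoint.
    tokenizer = T5Tokenizer.from_pretrained("t5-small")
    model = T5ForConditionalGeneration.from_pretrained("t5-small")

    # The task is selected purely by the text prefix (see Appendix D of the T5 paper).
    input_ids = tokenizer.encode(
        "translate English to German: The house is wonderful.", return_tensors="pt"
    )

    # generate() feeds the encoded input to the decoder via cross-attention
    # and decodes auto-regressively.
    outputs = model.generate(input_ids)
    print(tokenizer.decode(outputs[0]))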

docs/source/pretrained_models.rst

Lines changed: 0 additions & 4 deletions
@@ -275,7 +275,6 @@ For a list that includes community-uploaded models, refer to `https://huggingfac
 | | | | FlauBERT large architecture |
 | | | (see `details <https://github.com/getalp/Flaubert>`__) |
 +-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
-+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
 | Bart | ``bart-large`` | | 12-layer, 1024-hidden, 16-heads, 406M parameters |
 | | | (see `details <https://github.com/pytorch/fairseq/tree/master/examples/bart>`_) |
 | +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
@@ -285,6 +284,3 @@ For a list that includes community-uploaded models, refer to `https://huggingfac
 | | ``bart-large-cnn`` | | 12-layer, 1024-hidden, 16-heads, 406M parameters (same as base) |
 | | | | bart-large base architecture finetuned on cnn summarization task |
 +-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
-
-
-.. <https://huggingface.co/transformers/examples.html>`__

src/transformers/modeling_bart.py

Lines changed: 5 additions & 1 deletion
@@ -72,6 +72,10 @@
     Mask to avoid performing attention on padding token indices in input_ids.
     Mask values selected in ``[0, 1]``:
     ``1`` for tokens that are NOT MASKED, ``0`` for MASKED tokens.
+encoder_outputs (:obj:`tuple(torch.FloatTensor)`, `optional`, defaults to :obj:`None`):
+    Tuple consists of (`last_hidden_state`, `optional`: `hidden_states`, `optional`: `attentions`).
+    `last_hidden_state` of shape :obj:`(batch_size, sequence_length, hidden_size)` is a sequence of hidden-states at the output of the last layer of the encoder.
+    Used in the cross-attention of the decoder.
 decoder_input_ids (:obj:`torch.LongTensor` of shape :obj:`(batch_size, target_sequence_length)`, `optional`, defaults to :obj:`None`):
     Provide for translation and summarization training. By default, the model will create this tensor by shifting the input_ids right, following the paper.
 decoder_attention_mask (:obj:`torch.BoolTensor` of shape :obj:`(batch_size, tgt_seq_len)`, `optional`, defaults to :obj:`None`):
@@ -972,7 +976,7 @@ def forward(
     Returns:
         :obj:`tuple(torch.FloatTensor)` comprising various elements depending on the configuration (:class:`~transformers.BartConfig`) and inputs:
         loss (:obj:`torch.FloatTensor` of shape :obj:`(1,)`, `optional`, returned when :obj:`label` is provided):
-            Classification loss (cross entropy)
+            Classification loss (cross entropy)
         logits (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, config.num_labels)`):
             Classification (or regression if config.num_labels==1) scores (before SoftMax).
         hidden_states (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``config.output_hidden_states=True``):
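
The docstring added above documents the optional ``encoder_outputs`` argument of the BART forward pass. A rough sketch of passing a precomputed encoder tuple back in so the encoder is not re-run (the ``bart-large`` checkpoint name and the direct use of ``model.encoder`` are assumptions for illustration; the tuple layout follows the docstring):

    from transformers import BartModel, BartTokenizer

    tokenizer = BartTokenizer.from_pretrained("bart-large")
    model = BartModel.from_pretrained("bart-large")

    input_ids = tokenizer.encode("Hugging Face is based in New York City.", return_tensors="pt")

    # Default behaviour: encoder_outputs is None, so the encoder runs inside forward().
    outputs = model(input_ids=input_ids)

    # Assumed reuse pattern: run the encoder once and pass its output tuple
    # (last_hidden_state, optional hidden_states, optional attentions) back in,
    # so that subsequent calls only recompute the decoder.
    encoder_outputs = model.encoder(input_ids=input_ids)  # assumption: the encoder is exposed as model.encoder
    outputs = model(input_ids=input_ids, encoder_outputs=encoder_outputs)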
