T5
----------------------------------------------------

**DISCLAIMER:** This model is still a work in progress. If you see something strange, file a
`GitHub issue <https://github.com/huggingface/transformers/issues/new?assignees=&labels=&template=bug-report.md&title>`_.

Overview
~~~~~~~~~~~~~~~~~~~~

The T5 model was presented in `Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer <https://arxiv.org/pdf/1910.10683.pdf>`_
by Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li and Peter J. Liu.

Here is the abstract:

*Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice.
In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts every language problem into a text-to-text format.
Our systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks.
By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more.
To facilitate future work on transfer learning for NLP, we release our dataset, pre-trained models, and code.*

The authors' code can be found `here <https://github.com/google-research/text-to-text-transfer-transformer>`_.

Tips
~~~~~~~~~~~~~~~~~~~~

- T5 is an encoder-decoder model pre-trained on a multi-task mixture of unsupervised
  and supervised tasks, each of which is cast as a sequence-to-sequence task.
  T5 therefore works well on a variety of tasks out-of-the-box by prepending a different prefix to the input corresponding to each task, e.g. for translation: *translate English to German: ...*, for summarization: *summarize: ...*.
  For more information about which prefix to use, it is easiest to look into Appendix D of the `paper <https://arxiv.org/pdf/1910.10683.pdf>`_.
- For sequence-to-sequence generation, it is recommended to use ``T5ForConditionalGeneration.generate()``. This method takes care of feeding the encoded input via cross-attention layers to the decoder and auto-regressively generating the decoder output; see the sketch after this list.
- T5 uses relative scalar embeddings. Encoder input padding can be done on the left and on the right.
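
The snippet below is a minimal sketch of the prefix-plus-``generate()`` workflow described in the tips above. The ``t5-small`` checkpoint and the example sentence are illustrative assumptions rather than part of the original text; any T5 checkpoint and task prefix can be substituted.

.. code-block:: python

    # Minimal sketch: task-prefix inference with generate().
    # Assumptions: PyTorch is installed and the "t5-small" checkpoint can be downloaded.
    from transformers import T5ForConditionalGeneration, T5Tokenizer

    tokenizer = T5Tokenizer.from_pretrained("t5-small")
    model = T5ForConditionalGeneration.from_pretrained("t5-small")

    # Prepend the task prefix so the model knows which task to perform.
    input_ids = tokenizer.encode(
        "translate English to German: The house is wonderful.", return_tensors="pt"
    )

    # generate() feeds the encoded input to the decoder and decodes auto-regressively.
    output_ids = model.generate(input_ids)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))

Only the prefix changes from task to task, e.g. *summarize:* for summarization; the rest of the call stays the same.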

T5Config
~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.T5Config
    :members:


T5Tokenizer
~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.T5Tokenizer
    :members: build_inputs_with_special_tokens, get_special_tokens_mask,
        create_token_type_ids_from_sequences, save_vocabulary


T5Model
~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.T5Model
    :members:


T5ForConditionalGeneration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.T5ForConditionalGeneration
    :members:


TFT5Model
~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFT5Model
    :members:


TFT5ForConditionalGeneration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFT5ForConditionalGeneration
    :members: