`examples/language-modeling/README.md` (25 additions, 7 deletions)
@@ -22,8 +22,7 @@ ALBERT, BERT, DistilBERT, RoBERTa, XLNet... GPT and GPT-2 are trained or fine-tu
loss. XLNet uses permutation language modeling (PLM), you can find more information about the differences between those
objectives in our [model summary](https://huggingface.co/transformers/model_summary.html).

- These scripts leverage the 🤗 Datasets library and the Trainer API. You can easily customize them to your needs if you
- need extra processing on your datasets.
+ There are two sets of scripts provided. The first set leverages the Trainer API. The second set, with `no_trainer` in the suffix, uses a custom training loop and leverages the 🤗 Accelerate library. Both sets use the 🤗 Datasets library. You can easily customize them to your needs if you need extra processing on your datasets.

**Note:** The old script `run_language_modeling.py` is still available [here](https://github.com/huggingface/transformers/blob/master/examples/legacy/run_language_modeling.py).
@@ -60,6 +59,15 @@ python run_clm.py \
    --output_dir /tmp/test-clm
```

+ This uses the built-in HuggingFace `Trainer` for training. If you want to use a custom training loop, you can utilize or adapt the `run_clm_no_trainer.py` script. Take a look at the script for a list of supported arguments. An example is shown below:
+
+ ```bash
+ python run_clm_no_trainer.py \
+     --dataset_name wikitext \
+     --dataset_config_name wikitext-2-raw-v1 \
+     --model_name_or_path gpt2 \
+     --output_dir /tmp/test-clm
+ ```
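Not part of this diff, but as a usage sketch: since the `no_trainer` scripts are built on 🤗 Accelerate, the same example can also be run through the Accelerate CLI (`accelerate config` once to describe your hardware, then `accelerate launch`), which handles multi-GPU and TPU setups:

```bash
# Answer the interactive prompts once to describe your hardware (single GPU, multi-GPU, TPU, ...).
accelerate config

# Launch the same example through Accelerate instead of plain `python`.
accelerate launch run_clm_no_trainer.py \
    --dataset_name wikitext \
    --dataset_config_name wikitext-2-raw-v1 \
    --model_name_or_path gpt2 \
    --output_dir /tmp/test-clm
```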
### RoBERTa/BERT/DistilBERT and masked language modeling
@@ -95,23 +103,33 @@ python run_mlm.py \
If your dataset is organized with one sample per line, you can use the `--line_by_line` flag (otherwise the script
concatenates all texts and then splits them in blocks of the same length).

+ This uses the built-in HuggingFace `Trainer` for training. If you want to use a custom training loop, you can utilize or adapt the `run_mlm_no_trainer.py` script. Take a look at the script for a list of supported arguments. An example is shown below:
+
+ ```bash
+ python run_mlm_no_trainer.py \
+     --dataset_name wikitext \
+     --dataset_config_name wikitext-2-raw-v1 \
+     --model_name_or_path roberta-base \
+     --output_dir /tmp/test-mlm
+ ```
**Note:** On TPU, you should use the flag `--pad_to_max_length` in conjunction with the `--line_by_line` flag to make
sure all your batches have the same length.
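As an illustrative sketch only (not part of the diff): the `--line_by_line` and `--pad_to_max_length` flags described above slot into the usual `run_mlm.py` invocation like this, reusing the dataset and model names from the examples above and assuming each line of the dataset is one sample:

```bash
python run_mlm.py \
    --model_name_or_path roberta-base \
    --dataset_name wikitext \
    --dataset_config_name wikitext-2-raw-v1 \
    --line_by_line \
    --pad_to_max_length \
    --do_train \
    --do_eval \
    --output_dir /tmp/test-mlm
```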
### Whole word masking
This part was moved to `examples/research_projects/mlm_wwm`.
### XLNet and permutation language modeling
XLNet uses a different training objective, which is permutation language modeling. It is an autoregressive method
to learn bidirectional contexts by maximizing the expected likelihood over all permutations of the input
sequence factorization order.

We use the `--plm_probability` flag to define the ratio of length of a span of masked tokens to surrounding
context length for permutation language modeling.

The `--max_span_length` flag may also be used to limit the length of a span of masked tokens used for permutation language modeling.
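A minimal sketch of how these two flags could be passed, assuming the XLNet permutation-language-modeling script that accompanies these examples (`run_plm.py`, not shown in this diff) and reusing the dataset flags from the examples above; the values shown are illustrative only:

```bash
python run_plm.py \
    --model_name_or_path xlnet-base-cased \
    --dataset_name wikitext \
    --dataset_config_name wikitext-2-raw-v1 \
    --plm_probability 0.16667 \
    --max_span_length 5 \
    --do_train \
    --do_eval \
    --output_dir /tmp/test-plm
```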