
Commit 6ab7d1a

Add Readme for language modeling scripts with accelerate (#11073)
1 parent 2199608 commit 6ab7d1a

File tree: 2 files changed (+26, -8 lines)

examples/language-modeling/README.md
examples/language-modeling/run_mlm_no_trainer.py

examples/language-modeling/README.md

Lines changed: 25 additions & 7 deletions
@@ -22,8 +22,7 @@ ALBERT, BERT, DistilBERT, RoBERTa, XLNet... GPT and GPT-2 are trained or fine-tu
 loss. XLNet uses permutation language modeling (PLM), you can find more information about the differences between those
 objectives in our [model summary](https://huggingface.co/transformers/model_summary.html).
 
-These scripts leverage the 🤗 Datasets library and the Trainer API. You can easily customize them to your needs if you
-need extra processing on your datasets.
+Two sets of scripts are provided. The first set leverages the Trainer API. The second set, with `no_trainer` in the suffix, uses a custom training loop and leverages the 🤗 Accelerate library. Both sets use the 🤗 Datasets library. You can easily customize them to your needs if you need extra processing on your datasets.
 
 **Note:** The old script `run_language_modeling.py` is still available [here](https://github.com/huggingface/transformers/blob/master/examples/legacy/run_language_modeling.py).
 
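As context for the paragraph added in the hunk above: the `no_trainer` scripts replace the `Trainer` abstraction with an explicit loop driven by 🤗 Accelerate. The block below is a minimal, hypothetical sketch of that pattern; the toy corpus, GPT-2 checkpoint, and hyperparameters are placeholders, not code taken from `run_clm_no_trainer.py`.

```python
# Minimal sketch of an Accelerate-style training loop (illustration only; the toy corpus,
# model choice and hyperparameters are placeholders, not code from run_clm_no_trainer.py).
import torch
from torch.utils.data import DataLoader
from accelerate import Accelerator
from transformers import AutoModelForCausalLM, AutoTokenizer, DataCollatorForLanguageModeling

accelerator = Accelerator()                        # handles device placement and distributed setup
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token          # GPT-2 ships without a padding token
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# Toy corpus standing in for a tokenized 🤗 Dataset.
texts = ["Hello world, this is a tiny corpus.", "Language modeling predicts the next token."]
examples = [tokenizer(t, truncation=True, max_length=32) for t in texts]
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)   # mlm=False -> causal-LM labels
dataloader = DataLoader(examples, batch_size=2, collate_fn=collator)

# Accelerate wraps the objects so the same loop runs on CPU, one GPU or several.
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

model.train()
for epoch in range(1):
    for batch in dataloader:
        loss = model(**batch).loss                 # labels are shifted inside the model for CLM
        accelerator.backward(loss)                 # replaces loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```

In practice the `no_trainer` scripts can be run with plain `python`, as in the examples added below, or driven through the Accelerate CLI (`accelerate config` once to describe the hardware, then `accelerate launch`) for distributed setups.
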
@@ -60,6 +59,15 @@ python run_clm.py \
     --output_dir /tmp/test-clm
 ```
 
+This uses the built-in Hugging Face `Trainer` for training. If you want to use a custom training loop, you can use or adapt the `run_clm_no_trainer.py` script. Take a look at the script for the list of supported arguments. An example is shown below:
+
+```bash
+python run_clm_no_trainer.py \
+    --dataset_name wikitext \
+    --dataset_config_name wikitext-2-raw-v1 \
+    --model_name_or_path gpt2 \
+    --output_dir /tmp/test-clm
+```
 
 ### RoBERTa/BERT/DistilBERT and masked language modeling
 
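For contrast with the custom loop sketched earlier, here is a rough, simplified stand-in for the `Trainer`-based route that the first set of scripts relies on. It is an illustration under the same toy-data assumptions, not the actual logic of `run_clm.py`, which additionally handles dataset loading, grouping texts into blocks, evaluation, and checkpointing.

```python
# Simplified sketch of the Trainer-based route (not the actual code of run_clm.py).
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Any object with __len__/__getitem__ yielding tokenized examples works as a dataset here.
train_dataset = [
    tokenizer(t, truncation=True, max_length=32)
    for t in ["Tiny corpus line one.", "Tiny corpus line two."]
]

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="/tmp/test-clm", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=train_dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),  # causal-LM labels
)
trainer.train()
```

The difference the README paragraph draws is exactly this: the first set hands the loop to `Trainer`, the second set spells it out and lets Accelerate handle device placement.
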
@@ -95,23 +103,33 @@ python run_mlm.py \
 If your dataset is organized with one sample per line, you can use the `--line_by_line` flag (otherwise the script
 concatenates all texts and then splits them in blocks of the same length).
 
+This uses the built-in Hugging Face `Trainer` for training. If you want to use a custom training loop, you can use or adapt the `run_mlm_no_trainer.py` script. Take a look at the script for the list of supported arguments. An example is shown below:
+
+```bash
+python run_mlm_no_trainer.py \
+    --dataset_name wikitext \
+    --dataset_config_name wikitext-2-raw-v1 \
+    --model_name_or_path roberta-base \
+    --output_dir /tmp/test-mlm
+```
+
 **Note:** On TPU, you should use the flag `--pad_to_max_length` in conjunction with the `--line_by_line` flag to make
 sure all your batches have the same length.
 
 ### Whole word masking
 
-This part was moved to `examples/research_projects/mlm_wwm`.
+This part was moved to `examples/research_projects/mlm_wwm`.
 
 ### XLNet and permutation language modeling
 
-XLNet uses a different training objective, which is permutation language modeling. It is an autoregressive method
-to learn bidirectional contexts by maximizing the expected likelihood over all permutations of the input
+XLNet uses a different training objective, which is permutation language modeling. It is an autoregressive method
+to learn bidirectional contexts by maximizing the expected likelihood over all permutations of the input
 sequence factorization order.
 
-We use the `--plm_probability` flag to define the ratio of length of a span of masked tokens to surrounding
+We use the `--plm_probability` flag to define the ratio of the length of a span of masked tokens to the surrounding
 context length for permutation language modeling.
 
-The `--max_span_length` flag may also be used to limit the length of a span of masked tokens used
+The `--max_span_length` flag may also be used to limit the length of a span of masked tokens used
 for permutation language modeling.
 
 Here is how to fine-tune XLNet on wikitext-2:
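
Two asides on the hunk above. First, on the new `run_mlm_no_trainer.py` example: masked-language-modeling batches are built by randomly masking tokens and training the model to recover them. Below is a small, hypothetical sketch of that step using `DataCollatorForLanguageModeling`; the sentences and the 0.15 masking ratio are illustrative, not copied from the scripts.

```python
# Sketch of masked-LM batch construction (mirrors, but is not copied from, what the
# run_mlm scripts do internally; sentences and the 0.15 ratio are illustrative).
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=True,              # mask random tokens and train the model to recover them
    mlm_probability=0.15,  # fraction of tokens selected for masking
)

examples = [tokenizer(t) for t in ["One sample per line.", "Another short sample."]]
batch = collator(examples)
# Masked positions keep their original id in `labels`; everything else is -100 and ignored by the loss.
print(batch["input_ids"].shape, int((batch["labels"] != -100).sum()), "positions to predict")
```

With `--line_by_line`, each line of the dataset becomes one such example; without it, the scripts first concatenate all texts and cut them into fixed-length blocks, as the note in the diff describes.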

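Second, on the `--plm_probability` and `--max_span_length` flags discussed in the XLNet part of the hunk: a hedged sketch of the permutation-language-modeling collator they correspond to. The values shown are illustrative; see `run_plm.py` for how the flags are actually wired in.

```python
# Sketch of permutation-LM collation for XLNet-style training (illustrative values;
# see run_plm.py for how --plm_probability and --max_span_length are actually used).
from transformers import AutoTokenizer, DataCollatorForPermutationLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("xlnet-base-cased")
collator = DataCollatorForPermutationLanguageModeling(
    tokenizer=tokenizer,
    plm_probability=1 / 6,   # ratio of a masked span's length to its surrounding context length
    max_span_length=5,       # upper bound on the length of a masked span
)

# The collator expects even sequence lengths, so pad/truncate to a fixed even size.
examples = [tokenizer("XLNet permutes the factorization order of the sequence.",
                      padding="max_length", truncation=True, max_length=16)]
batch = collator(examples)
print(sorted(batch.keys()))  # ['input_ids', 'labels', 'perm_mask', 'target_mapping']
```
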
examples/language-modeling/run_mlm_no_trainer.py

Lines changed: 1 addition & 1 deletion
@@ -56,7 +56,7 @@
 
 
 def parse_args():
-    parser = argparse.ArgumentParser(description="Finetune a transformers model on a text classification task")
+    parser = argparse.ArgumentParser(description="Finetune a transformers model on a Masked Language Modeling task")
     parser.add_argument(
         "--dataset_name",
         type=str,
