
Commit ade77cc

docs: replace torch.distributed.run by torchrun
`transformers` now officially supports PyTorch >= 1.10. The `torchrun` entry point is available from 1.10 onwards.

Signed-off-by: Peter Pan <[email protected]>
1 parent b074461 commit ade77cc
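
For reference, a minimal before/after sketch of the launcher change applied throughout these docs (the script name `train.py` and the GPU count are placeholders, not taken from the commit):

```bash
# old launcher, deprecated in recent PyTorch releases
python -m torch.distributed.launch --nproc_per_node=2 train.py

# new launcher, shipped with PyTorch from 1.10 onwards
torchrun --nproc_per_node=2 train.py
```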

File tree

25 files changed: +46 -46 lines changed


ISSUES.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -152,7 +152,7 @@ You are not required to read the following guidelines before opening an issue. H
 
 ```bash
 cd examples/seq2seq
-python -m torch.distributed.launch --nproc_per_node=2 ./finetune_trainer.py \
+torchrun --nproc_per_node=2 ./finetune_trainer.py \
 --model_name_or_path sshleifer/distill-mbart-en-ro-12-4 --data_dir wmt_en_ro \
 --output_dir output_dir --overwrite_output_dir \
 --do_train --n_train 500 --num_train_epochs 1 \
````

docs/source/de/run_scripts.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -130,7 +130,7 @@ Der [Trainer](https://huggingface.co/docs/transformers/main_classes/trainer) unt
 - Legen Sie die Anzahl der zu verwendenden GPUs mit dem Argument `nproc_per_node` fest.
 
 ```bash
-python -m torch.distributed.launch \
+torchrun \
 --nproc_per_node 8 pytorch/summarization/run_summarization.py \
 --fp16 \
 --model_name_or_path t5-small \
````

docs/source/en/main_classes/deepspeed.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -287,7 +287,7 @@ The information in this section isn't not specific to the DeepSpeed integration
 
 For the duration of this section let's assume that you have 2 nodes with 8 gpus each. And you can reach the first node with `ssh hostname1` and second node with `ssh hostname2`, and both must be able to reach each other via ssh locally without a password. Of course, you will need to rename these host (node) names to the actual host names you are working with.
 
-#### The torch.distributed.run launcher
+#### The torch.distributed.run(torchrun) launcher
 
 
 For example, to use `torch.distributed.run`, you could do:
````
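
A minimal multi-node sketch under the 2-node, 8-GPU assumption from the hunk above (the script name `your_program.py`, the rendezvous id, and the port are placeholders, not part of the commit):

```bash
# run this on each of the two nodes; both point at the same rendezvous endpoint on hostname1
torchrun --nnodes 2 --nproc_per_node 8 \
    --rdzv_id 42 --rdzv_backend c10d --rdzv_endpoint hostname1:29400 \
    your_program.py --do_train
```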

docs/source/en/main_classes/trainer.md

Lines changed: 4 additions & 4 deletions
````diff
@@ -206,7 +206,7 @@ Let's discuss how you can tell your program which GPUs are to be used and in wha
 When using [`DistributedDataParallel`](https://pytorch.org/docs/stable/generated/torch.nn.parallel.DistributedDataParallel.html) to use only a subset of your GPUs, you simply specify the number of GPUs to use. For example, if you have 4 GPUs, but you wish to use the first 2 you can do:
 
 ```bash
-python -m torch.distributed.launch --nproc_per_node=2 trainer-program.py ...
+torchrun --nproc_per_node=2 trainer-program.py ...
 ```
 
 if you have either [`accelerate`](https://github.com/huggingface/accelerate) or [`deepspeed`](https://github.com/microsoft/DeepSpeed) installed you can also accomplish the same by using one of:
@@ -233,15 +233,15 @@ If you have multiple GPUs and you'd like to use only 1 or a few of those GPUs, s
 For example, let's say you have 4 GPUs: 0, 1, 2 and 3. To run only on the physical GPUs 0 and 2, you can do:
 
 ```bash
-CUDA_VISIBLE_DEVICES=0,2 python -m torch.distributed.launch trainer-program.py ...
+CUDA_VISIBLE_DEVICES=0,2 torchrun trainer-program.py ...
 ```
 
 So now pytorch will see only 2 GPUs, where your physical GPUs 0 and 2 are mapped to `cuda:0` and `cuda:1` correspondingly.
 
 You can even change their order:
 
 ```bash
-CUDA_VISIBLE_DEVICES=2,0 python -m torch.distributed.launch trainer-program.py ...
+CUDA_VISIBLE_DEVICES=2,0 torchrun trainer-program.py ...
 ```
 
 Here your physical GPUs 0 and 2 are mapped to `cuda:1` and `cuda:0` correspondingly.
@@ -263,7 +263,7 @@ As with any environment variable you can, of course, export those instead of add
 
 ```bash
 export CUDA_VISIBLE_DEVICES=0,2
-python -m torch.distributed.launch trainer-program.py ...
+torchrun trainer-program.py ...
 ```
 
 but this approach can be confusing since you may forget you set up the environment variable earlier and not understand why the wrong GPUs are used. Therefore, it's a common practice to set the environment variable just for a specific run on the same command line as it's shown in most examples of this section.
````
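
A quick way to confirm the remapping described in these hunks, as a hypothetical one-liner that is not part of the commit: with only physical GPUs 2 and 0 exposed, in that order, PyTorch should report two devices and `cuda:0` should be backed by physical GPU 2.

```bash
# expects device_count() == 2, with cuda:0 backed by physical GPU 2
CUDA_VISIBLE_DEVICES=2,0 python -c "import torch; print(torch.cuda.device_count(), torch.cuda.get_device_name(0))"
```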

docs/source/en/perf_hardware.md

Lines changed: 2 additions & 2 deletions
````diff
@@ -134,7 +134,7 @@ Here is the full benchmark code and outputs:
 ```bash
 # DDP w/ NVLink
 
-rm -r /tmp/test-clm; CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch \
+rm -r /tmp/test-clm; CUDA_VISIBLE_DEVICES=0,1 torchrun \
 --nproc_per_node 2 examples/pytorch/language-modeling/run_clm.py --model_name_or_path gpt2 \
 --dataset_name wikitext --dataset_config_name wikitext-2-raw-v1 --do_train \
 --output_dir /tmp/test-clm --per_device_train_batch_size 4 --max_steps 200
@@ -143,7 +143,7 @@ rm -r /tmp/test-clm; CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch
 
 # DDP w/o NVLink
 
-rm -r /tmp/test-clm; CUDA_VISIBLE_DEVICES=0,1 NCCL_P2P_DISABLE=1 python -m torch.distributed.launch \
+rm -r /tmp/test-clm; CUDA_VISIBLE_DEVICES=0,1 NCCL_P2P_DISABLE=1 torchrun \
 --nproc_per_node 2 examples/pytorch/language-modeling/run_clm.py --model_name_or_path gpt2 \
 --dataset_name wikitext --dataset_config_name wikitext-2-raw-v1 --do_train
 --output_dir /tmp/test-clm --per_device_train_batch_size 4 --max_steps 200
````
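
One note that is not part of the diff: `NCCL_P2P_DISABLE=1` in the second benchmark disables NCCL peer-to-peer (NVLink) transfers, which is what makes that run the no-NVLink baseline. The interconnect between GPU pairs can be checked beforehand with:

```bash
# an NV# entry between two GPUs in the matrix indicates an NVLink connection
nvidia-smi topo -m
```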

docs/source/en/perf_train_gpu_many.md

Lines changed: 2 additions & 2 deletions
````diff
@@ -153,7 +153,7 @@ python examples/pytorch/language-modeling/run_clm.py \
 
 ```
 rm -r /tmp/test-clm; CUDA_VISIBLE_DEVICES=0,1 \
-python -m torch.distributed.launch --nproc_per_node 2 examples/pytorch/language-modeling/run_clm.py \
+torchrun --nproc_per_node 2 examples/pytorch/language-modeling/run_clm.py \
 --model_name_or_path gpt2 --dataset_name wikitext --dataset_config_name wikitext-2-raw-v1 \
 --do_train --output_dir /tmp/test-clm --per_device_train_batch_size 4 --max_steps 200
@@ -164,7 +164,7 @@ python -m torch.distributed.launch --nproc_per_node 2 examples/pytorch/language-
 
 ```
 rm -r /tmp/test-clm; NCCL_P2P_DISABLE=1 CUDA_VISIBLE_DEVICES=0,1 \
-python -m torch.distributed.launch --nproc_per_node 2 examples/pytorch/language-modeling/run_clm.py \
+torchrun --nproc_per_node 2 examples/pytorch/language-modeling/run_clm.py \
 --model_name_or_path gpt2 --dataset_name wikitext --dataset_config_name wikitext-2-raw-v1 \
 --do_train --output_dir /tmp/test-clm --per_device_train_batch_size 4 --max_steps 200
````

docs/source/en/run_scripts.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -130,7 +130,7 @@ The [Trainer](https://huggingface.co/docs/transformers/main_classes/trainer) sup
 - Set the number of GPUs to use with the `nproc_per_node` argument.
 
 ```bash
-python -m torch.distributed.launch \
+torchrun \
 --nproc_per_node 8 pytorch/summarization/run_summarization.py \
 --fp16 \
 --model_name_or_path t5-small \
````

docs/source/es/run_scripts.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -130,7 +130,7 @@ python examples/tensorflow/summarization/run_summarization.py \
 - Establece la cantidad de GPU que se usará con el argumento `nproc_per_node`.
 
 ```bash
-python -m torch.distributed.launch \
+torchrun \
 --nproc_per_node 8 pytorch/summarization/run_summarization.py \
 --fp16 \
 --model_name_or_path t5-small \
````

docs/source/it/perf_hardware.md

Lines changed: 2 additions & 2 deletions
````diff
@@ -134,7 +134,7 @@ Ecco il codice benchmark completo e gli output:
 ```bash
 # DDP w/ NVLink
 
-rm -r /tmp/test-clm; CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch \
+rm -r /tmp/test-clm; CUDA_VISIBLE_DEVICES=0,1 torchrun \
 --nproc_per_node 2 examples/pytorch/language-modeling/run_clm.py --model_name_or_path gpt2 \
 --dataset_name wikitext --dataset_config_name wikitext-2-raw-v1 --do_train \
 --output_dir /tmp/test-clm --per_device_train_batch_size 4 --max_steps 200
@@ -143,7 +143,7 @@ rm -r /tmp/test-clm; CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch
 
 # DDP w/o NVLink
 
-rm -r /tmp/test-clm; CUDA_VISIBLE_DEVICES=0,1 NCCL_P2P_DISABLE=1 python -m torch.distributed.launch \
+rm -r /tmp/test-clm; CUDA_VISIBLE_DEVICES=0,1 NCCL_P2P_DISABLE=1 torchrun \
 --nproc_per_node 2 examples/pytorch/language-modeling/run_clm.py --model_name_or_path gpt2 \
 --dataset_name wikitext --dataset_config_name wikitext-2-raw-v1 --do_train
 --output_dir /tmp/test-clm --per_device_train_batch_size 4 --max_steps 200
````

docs/source/it/run_scripts.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -130,7 +130,7 @@ Il [Trainer](https://huggingface.co/docs/transformers/main_classes/trainer) supp
 - Imposta un numero di GPU da usare con l'argomento `nproc_per_node`.
 
 ```bash
-python -m torch.distributed.launch \
+torchrun \
 --nproc_per_node 8 pytorch/summarization/run_summarization.py \
 --fp16 \
 --model_name_or_path t5-small \
````
