
Commit 9cf2746

pacman100, sgugger, and saattrupdan authored
mac m1 mps integration (#18598)
* mac m1 `mps` integration
* Update docs/source/en/main_classes/trainer.mdx
* addressing comments
* Apply suggestions from code review
* resolve comment

Co-authored-by: Sylvain Gugger <[email protected]>
Co-authored-by: Dan Saattrup Nielsen <[email protected]>
1 parent d6eeb87 commit 9cf2746

File tree

2 files changed: +103 −10 lines changed


docs/source/en/main_classes/trainer.mdx

Lines changed: 60 additions & 0 deletions
@@ -591,6 +591,66 @@ More details in this [issues](https://github.com/pytorch/pytorch/issues/75676).
More details mentioned in this [issue](https://github.com/pytorch/pytorch/issues/76501)
(`The original model parameters' .grads are not set, meaning that they cannot be optimized separately (which is why we cannot support multiple parameter groups)`).

### Using Trainer for accelerated PyTorch Training on Mac

With the PyTorch v1.12 release, developers and researchers can take advantage of Apple silicon GPUs for significantly faster model training.
This unlocks the ability to perform machine learning workflows like prototyping and fine-tuning locally, right on your Mac.
Apple's Metal Performance Shaders (MPS) backend for PyTorch enables this, and it is used via the new `"mps"` device.
It maps computational graphs and primitives onto the MPS Graph framework and the tuned kernels provided by MPS.
For more information, please refer to the official documents [Introducing Accelerated PyTorch Training on Mac](https://pytorch.org/blog/introducing-accelerated-pytorch-training-on-mac/)
and [MPS BACKEND](https://pytorch.org/docs/stable/notes/mps.html).
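
For instance, the following minimal sketch (an editor's illustration, not part of this commit) verifies that the `mps` device is usable before committing to a training run:

```python
import torch

if torch.backends.mps.is_available():
    device = torch.device("mps")
    # Run a tiny matrix multiplication on the Apple silicon GPU.
    x = torch.randn(8, 8, device=device)
    print((x @ x.T).device)  # mps:0
else:
    print("MPS backend not available on this machine/build.")
```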

<Tip warning={false}>

We strongly recommend installing PyTorch >= 1.13 (the nightly version at the time of writing) on your macOS machine.
It has major fixes related to model correctness and performance improvements for transformer-based models.
Please refer to https://github.com/pytorch/pytorch/issues/82707 for more details.

</Tip>

**Benefits of Training and Inference using Apple Silicon Chips**

1. Enables users to train larger networks or batch sizes locally.
2. Reduces data retrieval latency and gives the GPU direct access to the full memory store thanks to the unified memory architecture, thereby improving end-to-end performance.
3. Reduces costs associated with cloud-based development or the need for additional local GPUs.

**Pre-requisites**: To install torch with `mps` support,
please follow this nice Medium article [GPU-Acceleration Comes to PyTorch on M1 Macs](https://medium.com/towards-data-science/gpu-acceleration-comes-to-pytorch-on-m1-macs-195c399efcc1).

**Usage**:
Simply pass the `--use_mps_device` argument.
For example, you can run the official GLUE text classification task (from the root folder) on an Apple silicon GPU with the command below:

```bash
export TASK_NAME=mrpc

python examples/pytorch/text-classification/run_glue.py \
  --model_name_or_path bert-base-cased \
  --task_name $TASK_NAME \
  --do_train \
  --do_eval \
  --max_seq_length 128 \
  --per_device_train_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 3 \
  --output_dir /tmp/$TASK_NAME/ \
  --use_mps_device \
  --overwrite_output_dir
```

**A few caveats to be aware of**

1. Some PyTorch operations have not been implemented in `mps` and will throw an error.
One way to get around that is to set the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1`,
which will fall back to the CPU for these operations; it still throws a UserWarning, however (see the sketch after this list).
2. Distributed setups `gloo` and `nccl` do not work with the `mps` device.
This means that currently only a single GPU of the `mps` device type can be used.
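
As a sketch of the first caveat (an editor's example, not from the commit), the variable can be exported before launching a script, or set at the top of the script before `torch` is imported:

```python
import os

# Set before importing torch so the MPS backend picks it up at initialization.
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

import torch

x = torch.randn(4, 4, device="mps")
# Any op that lacks an MPS kernel now runs on the CPU and emits a
# UserWarning instead of raising an error.
```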

Finally, please remember that 🤗 `Trainer` only integrates the MPS backend, so if you
have any problems or questions with regard to MPS backend usage, please
file an issue with [PyTorch GitHub](https://github.com/pytorch/pytorch/issues).
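
Since `--use_mps_device` maps to the `use_mps_device` field on `TrainingArguments` (added in this commit), the device can also be selected programmatically. A minimal sketch, assuming `model` and `train_dataset` are defined elsewhere:

```python
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="/tmp/mrpc",
    per_device_train_batch_size=32,
    learning_rate=2e-5,
    num_train_epochs=3,
    use_mps_device=True,  # train on the Apple silicon `mps` device
)

# `model` and `train_dataset` are placeholders for your own objects.
trainer = Trainer(model=model, args=training_args, train_dataset=train_dataset)
trainer.train()
```
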
Sections that were moved:

[ <a href="./deepspeed#deepspeed-trainer-integration">DeepSpeed</a><a id="deepspeed"></a>

src/transformers/training_args.py

Lines changed: 43 additions & 10 deletions
@@ -22,6 +22,8 @@
 from pathlib import Path
 from typing import Any, Dict, List, Optional, Union
 
+from packaging import version
+
 from .debug_utils import DebugOption
 from .trainer_utils import (
     EvaluationStrategy,
@@ -478,6 +480,8 @@ class TrainingArguments:
             are also available. See the [Ray documentation](
             https://docs.ray.io/en/latest/tune/api_docs/analysis.html#ray.tune.ExperimentAnalysis.get_best_trial) for
             more options.
+        use_mps_device (`bool`, *optional*, defaults to `False`):
+            Whether to use Apple Silicon chip based `mps` device.
     """
 
     output_dir: str = field(
@@ -630,6 +634,9 @@ class TrainingArguments:
         },
     )
     no_cuda: bool = field(default=False, metadata={"help": "Do not use CUDA even when it is available"})
+    use_mps_device: bool = field(
+        default=False, metadata={"help": "Whether to use Apple Silicon chip based `mps` device."}
+    )
     seed: int = field(default=42, metadata={"help": "Random seed that will be set at the beginning of training."})
     data_seed: Optional[int] = field(default=None, metadata={"help": "Random seed to be used with data samplers."})
     jit_mode_eval: bool = field(
@@ -1368,16 +1375,42 @@ def _setup_devices(self) -> "torch.device":
             device = torch.device("cuda", self.local_rank)
             self._n_gpu = 1
         elif self.local_rank == -1:
-            # if n_gpu is > 1 we'll use nn.DataParallel.
-            # If you only want to use a specific subset of GPUs use `CUDA_VISIBLE_DEVICES=0`
-            # Explicitly set CUDA to the first (index 0) CUDA device, otherwise `set_device` will
-            # trigger an error that a device index is missing. Index 0 takes into account the
-            # GPUs available in the environment, so `CUDA_VISIBLE_DEVICES=1,2` with `cuda:0`
-            # will use the first GPU in that env, i.e. GPU#1
-            device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
-            # Sometimes the line in the postinit has not been run before we end up here, so just checking we're not at
-            # the default value.
-            self._n_gpu = torch.cuda.device_count()
+            if self.use_mps_device:
+                if not torch.backends.mps.is_available():
+                    if not torch.backends.mps.is_built():
+                        raise AssertionError(
+                            "MPS not available because the current PyTorch install was not "
+                            "built with MPS enabled. Please install torch version >=1.12.0 on "
+                            "your Apple silicon Mac running macOS 12.3 or later with a native "
+                            "version (arm64) of Python"
+                        )
+                    else:
+                        raise AssertionError(
+                            "MPS not available because the current MacOS version is not 12.3+ "
+                            "and/or you do not have an MPS-enabled device on this machine."
+                        )
+                else:
+                    if not version.parse(version.parse(torch.__version__).base_version) > version.parse("1.12.0"):
+                        warnings.warn(
+                            "We strongly recommend to install PyTorch >= 1.13 (nightly version at the time of writing)"
+                            " on your MacOS machine. It has major fixes related to model correctness and performance"
+                            " improvements for transformer based models. Please refer to"
+                            " https://github.com/pytorch/pytorch/issues/82707 for more details."
+                        )
+                    device = torch.device("mps")
+                    self._n_gpu = 1
+
+            else:
+                # if n_gpu is > 1 we'll use nn.DataParallel.
+                # If you only want to use a specific subset of GPUs use `CUDA_VISIBLE_DEVICES=0`
+                # Explicitly set CUDA to the first (index 0) CUDA device, otherwise `set_device` will
+                # trigger an error that a device index is missing. Index 0 takes into account the
+                # GPUs available in the environment, so `CUDA_VISIBLE_DEVICES=1,2` with `cuda:0`
+                # will use the first GPU in that env, i.e. GPU#1
+                device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
+                # Sometimes the line in the postinit has not been run before we end up here, so just checking we're not at
+                # the default value.
+                self._n_gpu = torch.cuda.device_count()
         else:
             # Here, we'll use torch.distributed.
             # Initializes the distributed backend which will take care of synchronizing nodes/GPUs
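
A side note on the version gate in the hunk above: a small sketch (an editor's illustration, not part of the commit) of what the `base_version` double-parse does. `base_version` keeps only the release segment, so dev and local suffixes do not affect the comparison:

```python
import torch
from packaging import version

# e.g. "1.13.0.dev20220805" -> "1.13.0", "1.12.0+cu113" -> "1.12.0"
base = version.parse(torch.__version__).base_version
if not version.parse(base) > version.parse("1.12.0"):
    print(f"torch {torch.__version__} predates the MPS fixes referenced above.")
```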
