# Mixed precision training

In ColossalAI, we have incorporated different implementations of mixed precision training:
1. torch.cuda.amp
2. apex.amp
3. tensor-parallel amp

The first two rely on the native AMP implementations in [PyTorch](https://pytorch.org/docs/stable/amp.html)
(version 1.6 and above) and [Nvidia Apex](https://github.com/NVIDIA/apex). However, these two methods are not compatible
with tensor parallelism. Because tensors are split across devices in tensor parallelism, processes must communicate
with one another to check whether `inf` or `nan` occurs anywhere in the model weights. For mixed precision
training with tensor parallelism, we adapted the implementation from [Megatron-LM](https://github.com/NVIDIA/Megatron-LM).

To use mixed precision training, you can simply specify the `fp16` field in the config file. Currently, PyTorch and
Apex AMP are not guaranteed to work with tensor and pipeline parallelism, so only the last one (tensor-parallel AMP) is
recommended if you are using hybrid parallelism.

## PyTorch AMP

PyTorch provides mixed precision training in version 1.6 and above. It offers an easy way to cast data to `fp16` format
while keeping some operations such as reductions in `fp32`. You can configure the gradient scaler in the config file.

```python
from colossalai.engine import AMP_TYPE

fp16 = dict(
    ...
)
```
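For reference, the dictionary typically carries the AMP mode plus the gradient scaler settings. The filled-in sketch below assumes the extra keys mirror the arguments of `torch.cuda.amp.GradScaler` (`init_scale`, `growth_factor`, `backoff_factor`, `growth_interval`); treat the exact key names as an assumption rather than a definitive reference.

```python
from colossalai.engine import AMP_TYPE

fp16 = dict(
    mode=AMP_TYPE.TORCH,
    # Assumed to be forwarded to torch.cuda.amp.GradScaler; the values below are
    # simply GradScaler's defaults.
    init_scale=2.**16,
    growth_factor=2.0,
    backoff_factor=0.5,
    growth_interval=2000,
)
```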

## Apex AMP

For this mode, we rely on the [Apex](https://nvidia.github.io/apex/) implementation for mixed precision training. We support
this plugin because it allows for finer control over the granularity of mixed precision. For example, the `O2` level (optimization level 2)
will keep batch normalization in `fp32`.

The following code block shows a config file for Apex AMP.

```python
from colossalai.engine import AMP_TYPE

fp16 = dict(
    ...
)
```
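A filled-in sketch is shown below; it assumes the keys other than `mode` are passed through to `apex.amp.initialize`, whose real options include `opt_level` and `loss_scale`. Whether this config accepts exactly these keys is an assumption, not something taken from this file.

```python
from colossalai.engine import AMP_TYPE

fp16 = dict(
    mode=AMP_TYPE.APEX,
    # Assumed to be forwarded to apex.amp.initialize: O2 casts model weights to fp16
    # while keeping batch normalization in fp32 and using dynamic loss scaling.
    opt_level='O2',
    loss_scale='dynamic',
)
```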
## Tensor Parallel AMP

We leveraged the Megatron-LM implementation to achieve mixed precision training while maintaining compatibility with complex tensor
and pipeline parallelism.

The following code block shows a config file for this mode.
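The sketch below assumes the mode is selected with an `AMP_TYPE.PARALLEL` member and that the remaining keys control Megatron-style dynamic loss scaling; both the member name and the field names are assumptions rather than settings taken from this file.

```python
from colossalai.engine import AMP_TYPE

fp16 = dict(
    mode=AMP_TYPE.PARALLEL,
    # Assumed Megatron-style dynamic loss scaling settings.
    initial_scale=2 ** 32,
    min_scale=1,
    growth_factor=2,
    backoff_factor=0.5,
    growth_interval=1000,
    hysteresis=2,
)
```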