This repository was archived by the owner on Dec 16, 2022. It is now read-only.

Mixed Precision Training #2149

Description

@matthew-z

It seems that AllenNLP currently has no FP16 support, but I think it would be quite straightforward to add in a clean way with Apex.
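For context, a minimal sketch of what an Apex-based training loop could look like, using `apex.amp`'s `initialize`/`scale_loss` API. The toy model and data here are placeholders, not AllenNLP internals:

```python
import torch
from torch import nn
from apex import amp  # https://github.com/NVIDIA/apex

# A toy model standing in for an AllenNLP Model (placeholder, for illustration only).
model = nn.Linear(16, 2).cuda()
optimizer = torch.optim.Adam(model.parameters())

# opt_level="O1" patches common ops to run in FP16 while keeping FP32 master weights.
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

for _ in range(10):
    inputs = torch.randn(32, 16, device="cuda")
    targets = torch.randint(0, 2, (32,), device="cuda")
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(model(inputs), targets)
    # Loss scaling keeps small FP16 gradients from underflowing to zero.
    with amp.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()
    optimizer.step()
```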

Advantages of FP16:

  1. Larger batch sizes, which is especially useful for large embedding models like ELMo and BERT
  2. Faster training: the latest GPUs (P100, V100, RTX) can achieve a 60-80% speed-up
  3. Many experiments have shown that FP16 does not hurt accuracy for CV models

Challenge:
Some functions in AllenNLP are not numerically stable in FP16, such as masked_softmax.
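To illustrate the kind of instability involved (a sketch, not AllenNLP's actual implementation): masking is commonly done by filling masked positions with a large negative constant like -1e32, which overflows to -inf in FP16, whose range tops out around 65504. Deriving the fill value from the tensor's own dtype avoids this:

```python
import torch

def masked_softmax(vector: torch.Tensor, mask: torch.Tensor, dim: int = -1) -> torch.Tensor:
    # Use the smallest value representable in the tensor's own dtype
    # instead of a hard-coded constant like -1e32, which overflows
    # to -inf in FP16 and can produce NaNs in the softmax.
    fill_value = torch.finfo(vector.dtype).min
    masked_vector = vector.masked_fill(~mask.bool(), fill_value)
    return torch.nn.functional.softmax(masked_vector, dim=dim)
```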
