[Paper Notes · detection] Mixed Precision Training

Preface

Mixed Precision Training is a paper published at ICLR 2018; at the time of writing this post it had been cited 190 times.
Related references:

[paper]
INTRODUCTION TO MIXED PRECISION TRAINING from NVIDIA [pdf]
Paper notes [url]

Contents

Preface
Main Content
1. What is Mixed Precision Training?
2. Considerations for Mixed Precision (how to implement mixed precision)
3. Application in Different Deep Learning Frameworks
4. Performance Guidelines

Summary

Main Content

1. What is Mixed Precision Training?

Mixed Precision Training is a method for training neural networks with a mix of numeric precisions (FP32 and FP16):

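The precision (FP32 or FP16) can be decided separately for each network layer or operation;
Operations where the task needs to preserve accuracy can be computed in high precision (FP32);
Operations constrained by speed and memory can be computed in low precision (FP16).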
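A per-operation precision decision can be pictured as a lookup policy. The sketch below is hypothetical: the op names and the contents of both tables are illustrative assumptions, only loosely modeled on the allow/deny lists that automatic mixed precision tools use, and are not any library's actual lists.

```python
from typing import Sequence

# Hypothetical policy tables (illustrative assumptions, not a real library's lists).
# Compute-bound ops that tolerate FP16 and benefit from Tensor Cores:
FP16_OPS = {"matmul", "conv2d", "linear"}
# Ops that need FP32 range or precision (reductions, exponentials, losses):
FP32_OPS = {"softmax", "sum", "mean", "batch_norm", "exp", "log", "cross_entropy"}


def choose_precision(op_name: str, input_dtypes: Sequence[str]) -> str:
    """Return "fp16" or "fp32" for one operation under the hypothetical policy."""
    if op_name in FP16_OPS:
        return "fp16"
    if op_name in FP32_OPS:
        return "fp32"
    # Unlisted ops (e.g. relu, add) follow their inputs: if any input is FP32,
    # run in FP32 so that no precision is silently dropped.
    return "fp32" if "fp32" in input_dtypes else "fp16"


# Usage: walk a toy sequence of (op, input dtypes)
graph = [
    ("conv2d", ["fp32"]),
    ("relu", ["fp16"]),
    ("linear", ["fp16"]),
    ("add", ["fp16", "fp32"]),
    ("softmax", ["fp16"]),
]
for op, dtypes in graph:
    print(f"{op:8s} -> {choose_precision(op, dtypes)}")
```

The default rule for unlisted ops ("widest input wins") is one simple choice; a real tool may cast inputs differently.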

The benefits of Mixed Precision Training are:

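Faster math (on NVIDIA Tensor Cores, FP16 math throughput is up to 8× that of FP32)
Lower memory-bandwidth pressure (FP16 halves the memory traffic compared with FP32)
Lower memory consumption (an FP16 value occupies half the memory of an FP32 value)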
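The 2× memory and bandwidth saving follows directly from the storage sizes: an IEEE 754 half-precision value takes 2 bytes and a single-precision value 4 bytes. A minimal sketch using Python's standard `struct` module, whose `"e"` and `"f"` format characters are half and single precision; the tensor shape is a hypothetical example.

```python
import struct

FP16_BYTES = struct.calcsize("<e")  # IEEE 754 half precision
FP32_BYTES = struct.calcsize("<f")  # IEEE 754 single precision


def footprint_mib(num_elements: int, bytes_per_element: int) -> float:
    """Bytes needed to store (or move once) num_elements values, in MiB."""
    return num_elements * bytes_per_element / 2**20


# Hypothetical activation tensor: batch 32, 64 channels, 56x56 feature map
n = 32 * 64 * 56 * 56
print(f"FP32: {footprint_mib(n, FP32_BYTES):.2f} MiB")  # FP32: 24.50 MiB
print(f"FP16: {footprint_mib(n, FP16_BYTES):.2f} MiB")  # FP16: 12.25 MiB
```

The same factor of two applies to the bytes read and written per element, which is where the bandwidth saving comes from.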

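Why use Mixed Precision Training?
The root cause is that pure FP16 is not sufficient for every part of network training: FP16 has only 10 mantissa bits and a largest finite value of 65504, so some quantities lose precision or overflow and must be kept in FP32. Two typical cases: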

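Weight update (the optimizer's update steps are tiny relative to the weights, so applying and accumulating them requires high precision)
Reductions (sums over many elements: a large running sum can overflow FP16, and small addends are lost once the accumulator is large)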
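Both failure modes can be reproduced without a GPU by rounding every intermediate result to FP16 or FP32. The sketch below assumes Python's `struct` `"e"`/`"f"` formats as a stand-in for hardware FP16/FP32 arithmetic (each addition is done in double precision and then rounded); the step size and value counts are arbitrary illustrative choices.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round to the nearest IEEE 754 half-precision value (overflow -> +/-inf)."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct rejects finite values outside the FP16 range; FP16 math gives inf
        return math.copysign(math.inf, x)


def to_fp32(x: float) -> float:
    """Round to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# 1) Weight update: a step of 1e-4 (hypothetical lr * grad) on a weight of 1.0
update = 1e-4
w16, w32 = to_fp16(1.0), to_fp32(1.0)
for _ in range(1000):
    w16 = to_fp16(w16 + to_fp16(update))  # weight kept in FP16
    w32 = to_fp32(w32 + to_fp32(update))  # weight kept in FP32
print(w16, w32)  # FP16 weight stays at 1.0; FP32 weight reaches about 1.1

# 2) Reduction: add 0.01 ten thousand times (true sum is 100)
acc16, acc32 = 0.0, 0.0
for _ in range(10_000):
    acc16 = to_fp16(acc16 + to_fp16(0.01))
    acc32 = to_fp32(acc32 + to_fp32(0.01))
print(acc16, acc32)  # FP16 sum stalls far below 100; FP32 sum is close to 100

# 3) Overflow: add 100 a thousand times (true sum 100000 > FP16 max of 65504)
big16 = 0.0
for _ in range(1000):
    big16 = to_fp16(big16 + 100.0)
print(big16)  # inf
```

Near 1.0 the gap between adjacent FP16 values is 2^-10 ≈ 0.00098, so an increment of 1e-4 is rounded away on every step; likewise, once the FP16 running sum is large enough, adding 0.01 no longer changes it.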

Compared with FP32, mixed precision achieved a 3× speedup with the same accuracy and no change in the number of parameters.

2. Considerations for Mixed Precision (how to implement mixed precision)

Implementing mixed precision training involves three main parts:

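Precision of ops (decide for each operation whether it runs in FP16 or FP32)
Master weights (always keep an FP32 copy of the network weights)
Because FP16 is not precise enough for the weight update, the authors keep an FP32 master copy of the weights: the forward and backward passes use an FP16 copy, the small updates computed from the FP16 gradients are applied to the FP32 master weights, and the FP16 copy is refreshed from the master weights at every iteration.
Loss scaling (multiply the loss by a scaling factor before backpropagation so that small gradient values are not flushed to zero in FP16; the gradients are divided by the same factor before the weight update)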
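The three pieces fit together in one training step. The sketch below is a toy, not the paper's code: FP16/FP32 rounding is emulated with Python's `struct` module, the backward pass is replaced by a hypothetical `fp16_backward` that simply rounds `scale * g` to FP16 (a simplification that relies on gradients scaling linearly with the loss), and the "halve on overflow, double after a run of clean steps" rule is one common dynamic loss-scaling scheme whose constants (`init_scale`, `growth_interval`) are illustrative assumptions.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round to the nearest IEEE 754 half-precision value (overflow -> +/-inf)."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


def to_fp32(x: float) -> float:
    """Round to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def fp16_backward(true_grads, scale):
    """Hypothetical stand-in for backprop of (loss * scale) in FP16:
    each gradient becomes scale * g, stored in FP16 (a simplification)."""
    return [to_fp16(scale * g) for g in true_grads]


class MixedPrecisionSGD:
    """Plain SGD with FP32 master weights and dynamic loss scaling (a sketch)."""

    def __init__(self, weights, lr, init_scale=2.0**10, growth_interval=3):
        self.master = [to_fp32(w) for w in weights]  # FP32 master weights
        self.lr = lr
        self.scale = init_scale  # current loss-scaling factor
        self.growth_interval = growth_interval
        self.clean_steps = 0

    def fp16_weights(self):
        """FP16 copy of the weights used for the forward and backward passes."""
        return [to_fp16(w) for w in self.master]

    def step(self, scaled_grads16):
        """Apply one update; return False if it was skipped due to overflow."""
        if any(math.isinf(g) or math.isnan(g) for g in scaled_grads16):
            self.scale /= 2.0  # overflow: skip the update, shrink the scale
            self.clean_steps = 0
            return False
        for i, g16 in enumerate(scaled_grads16):
            g = to_fp32(g16 / self.scale)  # unscale in FP32
            self.master[i] = to_fp32(self.master[i] - self.lr * g)  # update in FP32
        self.clean_steps += 1
        if self.clean_steps == self.growth_interval:
            self.scale *= 2.0  # no overflow for a while: try a larger scale
            self.clean_steps = 0
        return True


# Hypothetical per-weight gradients of the *unscaled* loss
true_grads = [1e-2, 1e-8]
unscaled16 = fp16_backward(true_grads, 1.0)
print(unscaled16)  # the 1e-8 gradient underflows to 0.0 in FP16

opt = MixedPrecisionSGD([1.0, 0.0], lr=0.01)
for _ in range(10):
    opt.step(fp16_backward(true_grads, opt.scale))
print(opt.master, opt.fp16_weights(), opt.scale)

# A large gradient overflows FP16 once scaled: the step is skipped, scale halves
applied = opt.step(fp16_backward([100.0, 1e-8], opt.scale))
print(applied, opt.scale)
```

With the scale applied, the 1e-8 gradient survives in FP16 and still moves the FP32 master weight, and the 1e-4 updates to the first weight accumulate in the master copy (about 0.999 after 10 steps) even though each one is smaller than the FP16 spacing near 1.0.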

3. Application in Different Deep Learning Frameworks

PyTorch

For TensorFlow: https://docs.nvidia.com/deeplearning/frameworks/tensorflow-user-guide/index.html#tfamp
For PyTorch: https://nvidia.github.io/apex/amp.html
For MXNet: https://mxnet.apache.org/api/python/docs/tutorials/performance/backend/amp.html
AMP Examples: https://github.com/NVIDIA/DeepLearningExamples

4. Performance Guidelines


Summary

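This post first introduced the concept of mixed precision training: what mixed precision is, what its benefits are, how it is done, and why it is needed. It then explained that implementing mixed precision relies on three components: precision of ops, master weights, and loss scaling. Finally, it covered how to use mixed precision in different deep learning frameworks and how mixed precision performs in different applications.
Next, I plan to read more training methods of this kind, write a summary review, and finally compile a list of training tricks.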