TensorFlow Multi-GPU - NCCL answers

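【Question title】: Tensorflow Multi-GPU - NCCL
【Posted】: 2020-07-13 08:33:32
【Question description】:

I have been trying to increase the batch size to improve my model's generalization (the model is very sensitive to batch size). The solution is to use multiple GPUs so that more memory is available. I use tensorflow.keras in my script (TensorFlow 2.1 on Windows 10) and followed the instructions for configuring a mirrored strategy for my model. The problem is that my training script runs perfectly well without the mirrored-strategy code, but with the mirrored strategy I get an error about NCCL. This looks exactly like the following issue: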

https://github.com/tensorflow/tensorflow/issues/21470

Unfortunately, the solution discussed in that issue:

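import tensorflow as tf  # TF 1.x only: tf.contrib was removed in TF 2.0

cross_tower_ops = tf.contrib.distribute.AllReduceCrossDeviceOps(
    'hierarchical_copy', num_packs=num_gpus)  # num_gpus = number of GPUs in use
strategy = tf.contrib.distribute.MirroredStrategy(cross_tower_ops=cross_tower_ops)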

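does not work with TF 2.1, because the "contrib" part of TF appears to have been removed. Does anyone know what the replacement fix for NCCL on Windows is, now that the "contrib" part of TF is gone?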
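For context, a minimal sketch of the kind of setup the question describes, assuming TensorFlow 2.x with `tf.keras`. The model, input shape and random data are hypothetical placeholders, and `global_batch_size` and `build_and_fit` are illustrative names, not part of any API. What it shows: `MirroredStrategy` splits each global batch across its replicas, so the batch size passed to `fit` is the per-replica batch multiplied by `strategy.num_replicas_in_sync`. When no `cross_device_ops` is given, TensorFlow picks one automatically (for an all-GPU setup this is typically `NcclAllReduce`), which is plausibly where the NCCL error comes from.

```python
def global_batch_size(per_replica_batch: int, num_replicas: int) -> int:
    """MirroredStrategy splits each global batch evenly across replicas."""
    if per_replica_batch < 1 or num_replicas < 1:
        raise ValueError("batch size and replica count must be positive")
    return per_replica_batch * num_replicas


def build_and_fit(per_replica_batch: int = 64) -> None:
    # Imported lazily so global_batch_size stays usable without TensorFlow.
    import tensorflow as tf

    # Default MirroredStrategy: one replica per visible GPU (CPU if there is none).
    # No cross_device_ops argument, so TensorFlow chooses the all-reduce itself.
    strategy = tf.distribute.MirroredStrategy()
    batch = global_batch_size(per_replica_batch, strategy.num_replicas_in_sync)

    # Variables must be created, and the model compiled, inside the scope.
    with strategy.scope():
        model = tf.keras.Sequential([
            tf.keras.Input(shape=(32,)),  # hypothetical input shape
            tf.keras.layers.Dense(64, activation="relu"),
            tf.keras.layers.Dense(1),
        ])
        model.compile(optimizer="adam", loss="mse")

    # Random placeholder data, purely for illustration.
    x = tf.random.normal((batch * 8, 32))
    y = tf.random.normal((batch * 8, 1))
    # batch_size here is the *global* batch; each replica sees batch / replicas.
    model.fit(x, y, batch_size=batch, epochs=1)
```

Calling `build_and_fit()` on a machine with two GPUs would train with a global batch of 128 (64 per GPU).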

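【Question discussion】:

I'm actually no longer on Windows. I was able to get it working on Linux... but I would still love a method that works on Windows. I started a bounty on this question in the hope that it draws attention to the problem.

Tags: python tensorflow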

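【Solution 1】:

One of the solutions in issue 21470 is to build NCCL for Win x64. MyCaffe provides instructions for this here: https://github.com/MyCaffe/NCCL/blob/master/INSTALL.md

You need VS 2015 or 2017 and the CUDA development kit, and after compiling you must place the resulting .dll in the correct location.

【Discussion】:

Hi, does this work with TensorFlow? I mean, this would build such a library for Windows... Please provide some details if possible.

【Solution 2】: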

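In my experience, some cross_device_ops options do not work and produce errors.

This option is designed for the NVIDIA DGX-1 architecture and may perform poorly on other architectures: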

strategy = tf.distribute.MirroredStrategy(
    cross_device_ops=tf.distribute.HierarchicalCopyAllReduce())

This one should work:

strategy = tf.distribute.MirroredStrategy(
    cross_device_ops=tf.distribute.ReductionToOneDevice())

This one does not work on my configuration:

strategy = tf.distribute.MirroredStrategy(
    cross_device_ops=tf.distribute.NcclAllReduce())

So I would suggest trying the different options.
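Since which option works depends on the machine, the trial and error can be automated. This is a sketch under stated assumptions, not a tested recipe: `pick_first_working`, `tf_strategy_candidates` and `probe_strategy` are hypothetical helper names. Each candidate `MirroredStrategy` is probed with one tiny training step so that gradients actually pass through the all-reduce, on the assumption that NCCL problems may only surface at that point rather than when the strategy is constructed. The `HierarchicalCopyAllReduce(num_packs=...)` candidate mirrors the `tf.contrib` snippet from the question: in TF 2.x, `tf.distribute.HierarchicalCopyAllReduce` is the class for the `'hierarchical_copy'` all-reduce algorithm. TensorFlow is imported lazily so the selection logic itself has no TF dependency.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple

# A candidate is a human-readable name plus a zero-argument factory.
Candidate = Tuple[str, Callable[[], Any]]


def pick_first_working(
    candidates: Sequence[Candidate], probe: Callable[[Any], None]
) -> Tuple[Optional[str], Optional[Any], List[Tuple[str, str]]]:
    """Return (name, object, failures) for the first candidate whose probe succeeds."""
    failures: List[Tuple[str, str]] = []
    for name, factory in candidates:
        try:
            obj = factory()
            probe(obj)  # expected to raise if this option is unusable
            return name, obj, failures
        except Exception as exc:  # broad on purpose: error types are not relied upon
            failures.append((name, f"{type(exc).__name__}: {exc}"))
    return None, None, failures


def tf_strategy_candidates() -> List[Candidate]:
    import tensorflow as tf  # lazy import

    gpus = tf.config.experimental.list_physical_devices("GPU")
    num_packs = max(len(gpus), 1)  # same role as num_packs=num_gpus in the contrib code
    return [
        ("NcclAllReduce", lambda: tf.distribute.MirroredStrategy(
            cross_device_ops=tf.distribute.NcclAllReduce())),
        ("HierarchicalCopyAllReduce", lambda: tf.distribute.MirroredStrategy(
            cross_device_ops=tf.distribute.HierarchicalCopyAllReduce(num_packs=num_packs))),
        ("ReductionToOneDevice", lambda: tf.distribute.MirroredStrategy(
            cross_device_ops=tf.distribute.ReductionToOneDevice())),
    ]


def probe_strategy(strategy: Any) -> None:
    """Run one tiny training step so gradients go through the cross-device reduce."""
    import tensorflow as tf  # lazy import

    with strategy.scope():
        model = tf.keras.Sequential([tf.keras.Input(shape=(4,)), tf.keras.layers.Dense(1)])
        model.compile(optimizer="sgd", loss="mse")
    n = strategy.num_replicas_in_sync
    x = tf.zeros((2 * n, 4))
    y = tf.zeros((2 * n, 1))
    model.fit(x, y, batch_size=2 * n, epochs=1, verbose=0)
```

Typical use would be `name, strategy, failures = pick_first_working(tf_strategy_candidates(), probe_strategy)`, then printing `failures` to see which options broke and building the real model under `strategy.scope()`. The candidates are ordered from the one that fails in this answer's setup to the one reported to work, so on a machine where NCCL is available the NCCL option is still preferred.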

【Discussion】: