Does TensorFlow support multiple threads/streams on one GPU for training?

[Question title]: Tensorflow supports multiple threads/streams on one GPU for training?
[Posted]: 2020-06-06 06:38:39
[Question description]:

Update:

I found the source code of GPUDevice, which hard-codes the maximum number of streams to 1. Does anyone know the reason?

    GPUDevice(const SessionOptions& options, const string& name,
              Bytes memory_limit, const DeviceLocality& locality,
              TfGpuId tf_gpu_id, const string& physical_device_desc,
              Allocator* gpu_allocator, Allocator* cpu_allocator)
        : BaseGPUDevice(options, name, memory_limit, locality, tf_gpu_id,
                        physical_device_desc, gpu_allocator, cpu_allocator,
                        false /* sync every op */, 1 /* max_streams */) {
      if (options.config.has_gpu_options()) {
        force_gpu_compatible_ =
            options.config.gpu_options().force_gpu_compatible();
      }
    }

========================================

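I would like to know whether TensorFlow (1.x) supports multiple threads or multiple streams on a single GPU. If it does not, I am curious about the underlying reason: is this a deliberate choice in TF, does some library (such as CUDA) prevent TF from offering it, or is there another cause?

As in some earlier posts [1,2], I tried running multiple training ops in TF, i.e. sess.run([train_op1, train_op2], feed_dict={...}), and used the TF timeline to profile each iteration. However, the TF timeline always shows the two train ops running sequentially (the timeline is not fully accurate [3], but the wall time of each op also indicates sequential execution). I also looked at some of the TF source code. It looks like each op is computed in device->ComputeAsync() or device->Compute(), and the GPU is blocked while an op is being computed. If I am right, a GPU can run only one op at a time, which may lower GPU utilization.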
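One way to make the "sequential or not" check less dependent on eyeballing the timeline is to compare event intervals in the exported trace directly. The following is only a minimal sketch: it assumes the trace is in Chrome trace-event JSON (a "traceEvents" list whose complete events have "ph": "X", a start "ts" and a duration "dur" in microseconds), and it assumes the node name is stored under args["name"]. The name prefixes "train1/" and "train2/" and the sample events are made up for illustration and are not taken from the question.

```python
import json


def overlap_us(trace_json, prefix_a, prefix_b):
    """Total pairwise overlap (microseconds) between two groups of trace events.

    Events are grouped by the prefix of args["name"]; this field layout is an
    assumption about the exported trace, not something stated in the question.
    """
    events = json.loads(trace_json)["traceEvents"]

    def intervals(prefix):
        spans = []
        for ev in events:
            if ev.get("ph") != "X":  # keep only complete (duration) events
                continue
            node = ev.get("args", {}).get("name", "")
            if node.startswith(prefix):
                spans.append((ev["ts"], ev["ts"] + ev["dur"]))
        return spans

    total = 0
    for start_a, end_a in intervals(prefix_a):
        for start_b, end_b in intervals(prefix_b):
            # Length of the intersection of the two intervals, or 0 if disjoint.
            total += max(0, min(end_a, end_b) - max(start_a, start_b))
    return total


if __name__ == "__main__":
    # Synthetic example data (hypothetical): two ops that run back to back.
    sequential = json.dumps({"traceEvents": [
        {"ph": "X", "ts": 0, "dur": 10, "args": {"name": "train1/MatMul"}},
        {"ph": "X", "ts": 10, "dur": 10, "args": {"name": "train2/MatMul"}},
    ]})
    print(overlap_us(sequential, "train1/", "train2/"))
```

A result of zero for every iteration would be consistent with the two train ops being serialized on the device. A non-zero value would indicate that at least some kernels from the two ops ran concurrently.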

1. Running multiple tensorflow sessions concurrently

2. Run parallel op with different inputs and same placeholder

3. https://github.com/tensorflow/tensorflow/issues/1824#issuecomment-244251867

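[Comments on the question]:

This question may attract downvotes, because SO is a Q&A site and a question should address one specific problem. Here you have several questions, some of which could be read as opinion-based, etc. More help here: stackoverflow.com/help/how-to-ask
@NigelSavage I have updated my question, thanks.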

Tags: python-3.x multithreading tensorflow gpu

[Answer 1]:

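I have had a similar experience. I have two GPUs, each running three threads, with one session per thread, and the run time of each session fluctuates a lot. If I run only one thread per GPU, the session run time is quite stable.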
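The comparison described above (one thread per GPU versus three) can be reproduced with a small timing harness. This is only a sketch under stated assumptions: run_step is a hypothetical callable standing in for a per-thread sess.run(train_op) call, and fake_step merely sleeps, so the numbers it prints say nothing about real GPU behavior.

```python
import statistics
import threading
import time


def time_steps(run_step, num_threads, steps_per_thread):
    """Call run_step() from several threads; return per-thread step durations in seconds."""
    durations = [[] for _ in range(num_threads)]

    def worker(idx):
        for _ in range(steps_per_thread):
            start = time.perf_counter()
            run_step()
            durations[idx].append(time.perf_counter() - start)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return durations


def summarize(durations):
    """Mean and population standard deviation over all recorded step durations."""
    flat = [d for per_thread in durations for d in per_thread]
    return statistics.mean(flat), statistics.pstdev(flat)


def fake_step():
    # Hypothetical stand-in for sess.run(train_op); replace with the real call.
    time.sleep(0.001)


if __name__ == "__main__":
    for n in (1, 3):
        mean, std = summarize(time_steps(fake_step, n, 20))
        print(f"threads={n} mean={mean:.4f}s std={std:.4f}s")
```

With a real training step in place of fake_step, a much larger standard deviation at three threads than at one would match the fluctuation reported here.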

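From this behavior, it appears that multiple threads in TensorFlow do not work well together when they share a GPU, which points to a limitation in TensorFlow's own mechanism.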
