How does TensorFlow use my GPU? Answers

【Question title】: How does TensorFlow use my GPU?
【Posted】: 2018-08-14 00:55:03
【Question description】:

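I'm training a neural network in TensorFlow and monitoring my GPU load at the same time.

From the screenshot I can see that TensorFlow is basically using only the GPU memory. Is that normal? I thought it would use all of my CUDA cores to perform the computations.

Does anyone understand these things in detail?

Thanks in advance!

Here is the code...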

import tensorflow as tf
tf.logging.set_verbosity(tf.logging.INFO)

# ... some file reading here 

def train_input_fn(features, labels, batch_size):
    return tf.estimator.inputs.pandas_input_fn(
        x = features,
        y = labels,
        num_epochs = 1,
        shuffle = True,
        batch_size = batch_size)

def eval_input_fn(features, labels):
    return tf.estimator.inputs.pandas_input_fn(
        x = features,
        y = labels,
        num_epochs = 1,
        shuffle = True)

def pred_input_fn(features):
    return tf.estimator.inputs.pandas_input_fn(
        x = features,
        num_epochs = 1,
        shuffle = False)

model_dir = './DNN_Linear_Combined_Regressor'

file_writer = tf.summary.FileWriter(model_dir)

estimator = tf.estimator.DNNLinearCombinedRegressor(
    model_dir = model_dir,
    linear_feature_columns = wide_columns,
    dnn_feature_columns = deep_columns,
    dnn_optimizer = tf.train.AdamOptimizer(learning_rate=0.001),
    dnn_hidden_units = [64,64,64,8], 
    batch_norm = True,
    dnn_dropout = 0.1
)

train_spec = tf.estimator.TrainSpec(input_fn = train_input_fn(train, y_train, batch_size=5000))
eval_spec = tf.estimator.EvalSpec(input_fn = eval_input_fn(valid, y_valid))

tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)

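【Comments】:

You should post your code so we can see whether you are doing something wrong. TensorFlow should use both GPU memory and GPU compute.
See the edit.
Try changing the batch size (stackoverflow.com/questions/52159053/…)

Tags: python tensorflow gpu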

【Solution 1】:

According to the TensorFlow documentation here, by default TensorFlow will use as much of your GPU memory as it can.
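
If reserving nearly all of the GPU memory up front is not what you want, the TF 1.x session config exposes options for it. The following is a minimal sketch, assuming the same TF 1.x API as the question's code; the 0.4 fraction is an arbitrary illustrative value, and the names sess_config and run_config are only placeholders.

```python
import tensorflow as tf

# TF 1.x session options that control how GPU memory is allocated.
sess_config = tf.ConfigProto()

# Allocate GPU memory on demand instead of reserving almost all of it at startup.
sess_config.gpu_options.allow_growth = True

# Alternative: cap the fraction of each visible GPU's memory this process may use.
# (0.4 is an arbitrary example value, not a recommendation.)
# sess_config.gpu_options.per_process_gpu_memory_fraction = 0.4

# Estimators pick the session options up through RunConfig.
run_config = tf.estimator.RunConfig(session_config=sess_config)
```

Neither option changes how much compute TensorFlow uses; they only affect how memory is claimed.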

You can also check in the terminal which device each tensor operation runs on with the following code:

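# Creates an estimator with log_device_placement set to True.
sess_config = tf.ConfigProto(log_device_placement=True)
run_config = tf.estimator.RunConfig(session_config=sess_config)
# tf.estimator.Estimator also requires a model_fn; your_model_fn stands in for your own.
# Canned estimators such as DNNLinearCombinedRegressor accept config=run_config too.
your_classifier = tf.estimator.Estimator(model_fn=your_model_fn, config=run_config)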

You will see something like this:

Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla K40c, pci bus id: 0000:05:00.0
b: /job:localhost/replica:0/task:0/device:GPU:0
a: /job:localhost/replica:0/task:0/device:GPU:0
MatMul: /job:localhost/replica:0/task:0/device:GPU:0

GPU:0 is your default GPU.
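
For a real model the placement log can run to hundreds of lines, so it may help to tally how many ops landed on each device. This is a rough sketch that assumes the log lines look like the ones shown above (optionally with an "(OpType):" part after the name, as some TensorFlow versions print); count_ops_per_device is a hypothetical helper, not part of TensorFlow.

```python
import re
from collections import Counter

# Matches placement lines such as
#   "MatMul: /job:localhost/replica:0/task:0/device:GPU:0"
# and, as an assumption about other versions, "MatMul: (MatMul): /job:.../device:GPU:0".
PLACEMENT_RE = re.compile(
    r"^(?P<op>[^\s:]+): (?:\([^)]*\):? ?)?\S*/device:(?P<device>[A-Z_]+:\d+)$"
)

def count_ops_per_device(log_text):
    """Return a Counter mapping a device name (e.g. 'GPU:0') to its number of ops."""
    counts = Counter()
    for line in log_text.splitlines():
        match = PLACEMENT_RE.match(line.strip())
        if match:
            counts[match.group("device")] += 1
    return counts

# The sample log from the answer above.
sample_log = """\
Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla K40c, pci bus id: 0000:05:00.0
b: /job:localhost/replica:0/task:0/device:GPU:0
a: /job:localhost/replica:0/task:0/device:GPU:0
MatMul: /job:localhost/replica:0/task:0/device:GPU:0
"""

print(count_ops_per_device(sample_log))
```

If most ops are counted under CPU:0 rather than GPU:0, the work is not reaching the GPU at all, which is a different problem from low GPU utilization.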

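【Comments】:

That's fine, and I have no problem with it using 100% of the GPU RAM by default. My question is why it isn't using my GPU's compute power.
Since you are using tf.estimator, I have edited my code. From the log you can check TensorFlow's GPU usage and debug your program accordingly. If everything is placed as expected, the GPU usage is normal.
Just a small question... where does gpu_options come from, in the first line below the comment?
I did this: sess_config = tf.ConfigProto(log_device_placement=True) and now I can see where each operation goes. Thanks, man!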

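【Solution 2】:

CUDA is a platform that lets you run C/C++ code on the GPU. CUDA cores are part of the GPU, so TensorFlow uses the CUDA cores for its computations.

It is normal for TensorFlow to use the GPU. Compared with the CPU, operations such as matrix addition and multiplication can be executed very efficiently on the GPU (because of parallelism and hardware acceleration).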
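
To see whether the GPU is actually busy computing and not just holding memory, one option is to read the utilization figure that nvidia-smi reports. The sketch below assumes an NVIDIA driver with nvidia-smi on the PATH; parse_gpu_stats and query_gpu_stats are hypothetical helper names, and the sample values are made up for illustration.

```python
import subprocess

# Fields requested from nvidia-smi, in this order.
QUERY_FIELDS = ["utilization.gpu", "memory.used", "memory.total"]

def parse_gpu_stats(csv_text):
    """Parse `--format=csv,noheader,nounits` output into one dict per GPU."""
    stats = []
    for line in csv_text.strip().splitlines():
        values = [value.strip() for value in line.split(",")]
        row = {}
        for field, value in zip(QUERY_FIELDS, values):
            # Some GPUs report a non-numeric placeholder for a field; store None then.
            row[field] = int(value) if value.isdigit() else None
        stats.append(row)
    return stats

def query_gpu_stats():
    """Run nvidia-smi once and return the parsed stats (needs an NVIDIA driver)."""
    output = subprocess.check_output(
        ["nvidia-smi",
         "--query-gpu=" + ",".join(QUERY_FIELDS),
         "--format=csv,noheader,nounits"],
        universal_newlines=True)
    return parse_gpu_stats(output)

# Made-up sample output for a single GPU: 87 % utilization, 2900 of 3072 MiB used.
sample_output = "87, 2900, 3072\n"
print(parse_gpu_stats(sample_output))
```

Calling query_gpu_stats() a few times while training runs gives a rough picture; a small model fed from pandas with a large batch may leave the GPU idle between steps even when every op is placed on it.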

【Comments】:

I'm confused. If you look here, geforce.com/hardware/notebook-gpus/geforce-gtx-770m/…, it says my GPU has 960 CUDA cores ...