Error when trying to use gpu with tensorflow

【Question Title】: Error when trying to use gpu with tensorflow
【Posted】: 2017-09-14 15:15:30
【Question】:

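I am trying to implement a KNN machine learning model, but I cannot run my code on the GPU device. Running it on the CPU is not practical either, because my dataset is a 4D numpy array of shape [1500,2,1000,6] and the run takes a very long time to finish. CUDA and cuDNN are installed.

My code is: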

import numpy as np
import tensorflow as tf

# Placeholders
with tf.device('/gpu:0'):

    x_data_train = tf.placeholder(shape=[1500,2,1000, 6], dtype=tf.float32)
    x_data_test = tf.placeholder(shape=[1500,2,1000, 6], dtype=tf.float32)
    y_target_train = tf.placeholder(shape=[1500,1], dtype=tf.float32)
    y_target_test = tf.placeholder(shape=[1500,1], dtype=tf.float32)

# Declare distance metric
# L1
distance = tf.reduce_sum(tf.abs(tf.subtract(x_data_train, tf.expand_dims(x_data_test,1))), axis=2)

# L2
#distance = tf.sqrt(tf.reduce_sum(tf.square(tf.subtract(x_data_train, tf.expand_dims(x_data_test,1))), reduction_indices=1))

# Predict: Get min distance index (Nearest neighbor)
top_k_xvals, top_k_indices = tf.nn.top_k(tf.negative(distance), k=k)
prediction_indices = tf.gather(y_target_train, top_k_indices)
# Predict the mode category
count_of_predictions = tf.reduce_sum(prediction_indices, axis=1)
prediction = tf.argmax(count_of_predictions, axis=1)

# Calculate how many loops over training data
num_loops = int(np.ceil(len(x_vals_test)/batch_size))

test_output = []
actual_vals = []
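# Bind the session to a name so that sess.run below refers to it
with tf.Session(config=tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)) as sess:
    for i in range(num_loops):
        min_index = i*batch_size
        # Clamp against the test set, since the batch is sliced from x_vals_test
        max_index = min((i+1)*batch_size,len(x_vals_test))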
        x_batch = x_vals_test[min_index:max_index]
        y_batch = y_vals_test[min_index:max_index]
        predictions = sess.run(prediction, feed_dict={x_data_train: x_vals_train, x_data_test: x_batch,
                                             y_target_train: y_vals_train, y_target_test: y_batch})
        test_output.extend(predictions)
        actual_vals.extend(np.argmax(y_batch, axis=1))

    accuracy = sum([1./test_size for i in range(test_size) if test_output[i]==actual_vals[i]])
    print('Accuracy on test set: ' + str(accuracy))

The error is:

Device mapping: no known devices.
Traceback (most recent call last):
  line 111, in <module>
    y_target_train: y_vals_train, y_target_test: y_batch})
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 789, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 997, in _run
    feed_dict_string, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1132, in _do_run
    target_list, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1152, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation 'Placeholder_3': Operation was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/cpu:0 ]. Make sure the device specification refers to a valid device.
     [[Node: Placeholder_3 = Placeholder[dtype=DT_FLOAT, shape=[1500,1], _device="/device:GPU:0"]()]]

Caused by op u'Placeholder_3', defined at:
 line 83, in <module>
    y_target_test = tf.placeholder(shape=[1500,1], dtype=tf.float32)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/array_ops.py", line 1530, in placeholder
    return gen_array_ops._placeholder(dtype=dtype, shape=shape, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_array_ops.py", line 1954, in _placeholder
    name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2506, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1269, in __init__
    self._traceback = _extract_stack()

InvalidArgumentError (see above for traceback): Cannot assign a device for operation 'Placeholder_3': Operation was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/cpu:0 ]. Make sure the device specification refers to a valid device.
     [[Node: Placeholder_3 = Placeholder[dtype=DT_FLOAT, shape=[1500,1], _device="/device:GPU:0"]()]]

[Finished in 2.1s with exit code 1]

[path: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin]

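【Comments】:

Did you install the GPU-enabled build of TensorFlow?
Yes, it is installed.
I solved the problem: I installed CUDA and cuDNN again, even though I was sure I had installed them correctly. Anyway, it works now ;)
Sometimes these problems just don't make any sense. But it's good that you got it working :)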
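
A quick way to answer the first comment's question from inside Python is to ask TensorFlow which devices it can see. The sketch below is only an illustration, not code from the thread: the helper names `list_visible_devices` and `has_gpu` are made up here, and it assumes `tensorflow.python.client.device_lib` is importable in the installed build. If the list contains only a CPU entry, as in the error message above, any op pinned to `/gpu:0` cannot be placed.

```python
def list_visible_devices():
    """Return the device names TensorFlow reports, or None if TensorFlow is not importable."""
    try:
        from tensorflow.python.client import device_lib
    except ImportError:
        return None
    return [d.name for d in device_lib.list_local_devices()]


def has_gpu(device_names):
    """True if any device name looks like a GPU, e.g. '/gpu:0' or '/device:GPU:0'."""
    return any("gpu" in name.lower() for name in device_names)


if __name__ == "__main__":
    names = list_visible_devices()
    if names is None:
        print("TensorFlow is not installed in this environment")
    else:
        print("Visible devices:", names)
        print("GPU visible:", has_gpu(names))
```

If this reports no GPU even though the GPU-enabled package is installed, the CUDA or cuDNN libraries are a likely suspect, which matches what the asker found after reinstalling them.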

Tags: python tensorflow gpu

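【Solution 1】:

This looks like a CUDA installation problem. After you install CUDA, the official documentation describes some tests you can run to make sure everything is working.

【Discussion】: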
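
Before running the official tests, a rough first check is to see whether the usual CUDA tools and libraries can be found at all. The following is a minimal sketch under a few assumptions: it uses Python 3 (`shutil.which` is not available on the Python 2.7 shown in the traceback), the function name `check_cuda_install` is hypothetical, and `ctypes.util.find_library` may miss libraries that sit in a directory the system loader configuration does not list. It does not replace the verification steps in the CUDA documentation; it only tells you whether the pieces are discoverable.

```python
import ctypes.util
import shutil


def check_cuda_install():
    """Report which common CUDA pieces can be located on this machine."""
    report = {}
    # Command-line tools shipped with the NVIDIA driver and the CUDA toolkit.
    for tool in ("nvidia-smi", "nvcc"):
        report[tool] = shutil.which(tool) is not None
    # Shared libraries that must be discoverable by the dynamic loader.
    for lib in ("cuda", "cudart", "cudnn"):
        report["lib" + lib] = ctypes.util.find_library(lib) is not None
    return report


if __name__ == "__main__":
    for name, found in sorted(check_cuda_install().items()):
        print("%-12s %s" % (name, "found" if found else "MISSING"))
```

Any entry reported as missing points at the part of the installation (driver, toolkit, or cuDNN) worth reinstalling or adding to the library search path first.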