Why can't Tensorflow find the GPU kernel of my custom Op?

【Question title】: Why can't Tensorflow find the GPU kernel of my custom Op?
【Posted】: 2018-07-11 03:38:37
【Question description】:

I followed the Adding a New Op example on the Tensorflow website to add a GPU kernel to my custom op. It compiles fine, but when I try to use the GPU kernel, Tensorflow gives me this error:

InvalidArgumentError (see above for traceback): Cannot assign a device to node 'b': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.

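~~Can anyone point out where I went wrong?~~ I am using the exact example code given on the page linked above, with the following fixes/changes (found on various other forums and StackOverflow pages):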

Added #include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor" to kernel_example.cu.cc
Changed "example.h" to "kernel_example.h" in kernel_example.cu.cc and kernel_example.cc
Added the missing // to the #endif // KERNEL_EXAMPLE_H_ line in kernel_example.h
Corrected the partial specialization in kernel_example.h to:

template <typename T> struct ExampleFunctor<Eigen::GpuDevice, T> { ... };
Registered the op in kernel_example.cc:

REGISTER_OP("Example").Attr("T: {float, int32} = DT_FLOAT").Input("input: T").Output("output: T");
Corrected the last lines of the .cu.cc file to template struct ExampleFunctor<GPUDevice, float>; (the struct keyword was missing)

Makefile:

TF_LIB := $(shell python -c 'import tensorflow as tf; print(tf.sysconfig.get_lib())' 2>/dev/null)
TF_INC := $(shell python -c 'import tensorflow as tf; print(tf.sysconfig.get_include())' 2>/dev/null)

CUDA_LIB=/z/sw/packages/cuda/8.0/lib64

all: kernel_example.cu.cc kernel_example.cc
    nvcc -std=c++11 -c -o kernel_example.cu.o kernel_example.cu.cc -I $(TF_INC) -I$(TF_INC)/external/nsync/public -D GOOGLE_CUDA=1 -x cu -Xcompiler -fPIC -D_MWAITXINTRIN_H_INCLUDED --expt-relaxed-constexpr

    g++ -std=c++11 -shared -o kernel_example.so kernel_example.cc kernel_example.cu.o -I $(TF_INC) -I$(TF_INC)/external/nsync/public -fPIC -L$(CUDA_LIB) -lcudart -L$(TF_LIB) -D_GLIBCXX_USE_CXX11_ABI=0 -D GOOGLE_CUDA=1

Edit: As I point out in my answer below, the problem was caused by the g++ invocation missing -D GOOGLE_CUDA=1.

Test code:

import tensorflow as tf
example_lib = tf.load_op_library('kernel_example.so')

with tf.device('/gpu:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], name='a')
    b = example_lib.example(a, name='b')
sess = tf.Session()
print(sess.run(b))

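【Comments】:

You most likely did not compile the CUDA code for your GPU architecture.
@talonmies Can you elaborate? My Makefile is given in the question; is there some other way to compile the code?
If this really is an underlying CUDA API error being reported by Tensorflow, then you may need to add an arch=... option matching your GPU to the nvcc command. But I am not 100% sure that what Tensorflow reports is actually a CUDA error.
OK, so this is not a CUDA runtime error, even though it is worded almost exactly like one.
No, but your comment did put me on the right track to find the mistake I had made. Thanks! :)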

Tags: c++ tensorflow cuda

【Solution 1】:

The Tensorflow documentation is, as usual, poor and incomplete. The g++ command was missing -D GOOGLE_CUDA=1. I will edit the question to reflect this.
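
Some background on why the flag matters: in the guide's example, the GPU kernel registration in kernel_example.cc sits inside an #ifdef GOOGLE_CUDA block. If the host compiler (g++) does not see that macro, the registration is silently compiled out, and only the CPU kernel ends up in the .so. The sketch below is a small, hypothetical sanity check (it is not part of Tensorflow or of the original Makefile) that scans compiler command lines and reports those that do not define the macro. The function names and the shortened commands are assumptions made for illustration.

```python
import shlex


def defined_macros(command):
    """Return the names of macros defined via -D in a compiler command line."""
    tokens = shlex.split(command)
    macros = set()
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "-D" and i + 1 < len(tokens):
            # Separate form: "-D NAME=VALUE"
            macros.add(tokens[i + 1].split("=", 1)[0])
            i += 2
            continue
        if tok.startswith("-D") and len(tok) > 2:
            # Joined form: "-DNAME=VALUE"
            macros.add(tok[2:].split("=", 1)[0])
        i += 1
    return macros


def commands_missing_macro(commands, macro="GOOGLE_CUDA"):
    """Return the commands that do not define `macro`."""
    return [cmd for cmd in commands if macro not in defined_macros(cmd)]


if __name__ == "__main__":
    # Shortened, hypothetical versions of the two Makefile commands,
    # with the g++ line as it was before the fix.
    nvcc_cmd = ("nvcc -std=c++11 -c -o kernel_example.cu.o "
                "kernel_example.cu.cc -D GOOGLE_CUDA=1 -x cu")
    gxx_cmd = ("g++ -std=c++11 -shared -o kernel_example.so kernel_example.cc "
               "kernel_example.cu.o -fPIC -D_GLIBCXX_USE_CXX11_ABI=0")
    for cmd in commands_missing_macro([nvcc_cmd, gxx_cmd]):
        print("GOOGLE_CUDA not defined in:", cmd.split()[0])
```

With the two sample commands, only the g++ line is flagged, which matches the situation described in this answer. The check only inspects flags; it does not verify that the resulting library actually registers a GPU kernel.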

【Comments】: