In a CUDA kernel template function, how to test types? Answers

【Question title】: In CUDA kernel template function, how to test types?
【Posted】: 2021-10-16 21:11:58
【Question】:

I have a CUDA kernel template function like this:

    template <typename scalar_t, typename accscalar_t, typename index_type, int indexing_kind>
    __global__ void lstm_cell_forward(
                TensorInfo<scalar_t, index_type> input,
                TensorInfo<scalar_t, index_type> hidden,
                TensorInfo<scalar_t, index_type> bias1,
                TensorInfo<scalar_t, index_type> bias2,
                TensorInfo<scalar_t, index_type> _cx,
                TensorInfo<scalar_t, index_type> _hy,
                TensorInfo<scalar_t, index_type> _cy,
                TensorInfo<scalar_t, index_type> workspace,
                index_type hsz,
                index_type totalElements) {
    ...
        scalar_t iig = DEVICE_LINEAR_GET(input, offset+0*hsz);
        scalar_t ifg = DEVICE_LINEAR_GET(input, offset+1*hsz);
        scalar_t icg = DEVICE_LINEAR_GET(input, offset+2*hsz);
        scalar_t iog = DEVICE_LINEAR_GET(input, offset+3*hsz);
    ...
    }

I want to add a printf() to this kernel to print the values of iig, ifg, icg, and iog, but only when scalar_t is float. I tried typeid(float) == typeid(iig) for this, but CUDA code does not seem to support "typeinfo.h".

If I simply omit the if statement

    printf("iig = %f, ifg = %f, icg = %f, iog = %f\n",
           iig, ifg, icg, iog);

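and go ahead and compile, the compiler tries to instantiate the template for every combination of template argument types, including c10::Half, a data type defined by PyTorch. That raises an error.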

So my question is: how do I write a comparison on scalar_t that checks whether it is float?

I followed the suggestion in @Robert Crovella's answer, and it produced this error:

liwei.dai@854380cd7bb1:~/tests/cuda-debug-case/cuda/tmp$ !b
bash bi_make_and_run.sh 
test.cu:7:20: error: no template named 'is_same_v' in namespace 'cuda::std'; did you mean 'is_same'?
  if (::cuda::std::is_same_v<T, float>) printf("val is a float: %f\n", val);
      ~~~~~~~~~~~~~^~~~~~~~~
                   is_same
/opt/sw_home/local/cuda/include/cuda/std/std/detail/libcxx/include/type_traits:877:65: note: 'is_same' declared here
template <class _Tp, class _Up> struct _LIBCUDACXX_TEMPLATE_VIS is_same           : public false_type {};
                                                                ^
test.cu:7:39: error: expected unqualified-id
  if (::cuda::std::is_same_v<T, float>) printf("val is a float: %f\n", val);

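However, if I use cuda::std::is_same<T, float>::value instead, it works. I checked the source in the type_traits file: is_same_v is simply defined in terms of is_same::value. But I don't know why is_same_v itself is not found.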
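A likely explanation, offered as an assumption because the post does not show the compile flags: is_same_v is a variable template, so a library can only declare it when the translation unit is compiled as C++14 or later, while the is_same<...>::value form is available in every standard mode. The sketch below uses the host-side std::is_same so that it builds with any C++11 compiler; in a .cu file the same expression can be written with cuda::std::is_same, as described above. The helper name is_float_type is hypothetical and is not part of libcu++ or PyTorch.

```cpp
#include <type_traits>

// Hypothetical helper (not from libcu++ or PyTorch): true only when T is
// exactly float. It relies on the ::value member, which exists in C++11,
// instead of the is_same_v variable template.
template <typename T>
constexpr bool is_float_type() {
  return std::is_same<T, float>::value;
}

// Compile-time checks: these fail the build if the trait misbehaves.
static_assert(is_float_type<float>(), "float should match");
static_assert(!is_float_type<double>(), "double should not match");
```

To call such a helper from device code it would probably also need a __host__ __device__ annotation, which is omitted here to keep the block plain C++.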

【Comments】:

Tags: c++ cuda

【Solution 1】:

So my question is: how do I write a comparison on scalar_t that checks whether it is float?

I believe you can use the type traits functionality of libcu++ for this:

$ cat t1868.cu
#include <cuda/std/type_traits>
#include <cstdio>
template <typename T>
__global__ void k(T val){
  if (::cuda::std::is_same_v<T, float>) printf("val is a float: %f\n", val);
  else printf("val is not a float\n");
}

int main(){

  float f = 1.2345;
  k<<<1,1>>>(f);
  double d = 1.2345;
  k<<<1,1>>>(d);
  cudaDeviceSynchronize();
}

$ nvcc -std=c++14 t1868.cu -o t1868
$ cuda-memcheck ./t1868
========= CUDA-MEMCHECK
val is a float: 1.234500
val is not a float
========= ERROR SUMMARY: 0 errors
$
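One caveat with a plain runtime if: both branches are still compiled for every T, so the printf call would still be instantiated for a type such as c10::Half, which is the situation the question reports as a compile error. A minimal sketch of one way around that, assuming an overload pair selected with std::enable_if, is shown below. The names print_if_float and HOST_DEVICE are hypothetical and not from the original post; the macro is only there so that the same text is intended to build both as plain C++ and under nvcc.

```cpp
#include <stdio.h>
#include <type_traits>

// Hypothetical macro: adds CUDA qualifiers only when compiled by nvcc.
#if defined(__CUDACC__)
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Selected only when T is float. This is the single place where
// printf("%f") is instantiated. Returns true to signal that it printed.
template <typename T>
HOST_DEVICE typename std::enable_if<std::is_same<T, float>::value, bool>::type
print_if_float(const char* name, T val) {
  printf("%s = %f\n", name, val);
  return true;
}

// Selected for every other type. The value is never passed to printf,
// so types that printf cannot accept do not break compilation.
template <typename T>
HOST_DEVICE typename std::enable_if<!std::is_same<T, float>::value, bool>::type
print_if_float(const char*, T) {
  return false;
}
```

Inside a templated kernel the call would look like print_if_float("val", val); it compiles for every T because the formatting code lives only in the float overload. With C++17 available, if constexpr on the same condition is an alternative that discards the unused branch.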

【Comments】:

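I'm using CUDA 8. I've just edited my post to include what the problem is.
That is at least part of the problem. The libcu++ library only started being included in the CUDA toolkit around the CUDA 10.2 timeframe.
Hi, just curious: is there an API in libcu++ for printing out the type of a variable? For example, I wrote auto tmp = someDeviceFunction(); and I want to know what the type of tmp is, float or double and so on.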