cuda gdb: the kernel indicated is not in the code

[Question title]: cuda gdb: the kernel indicated is not in the code
[Posted]: 2012-08-30 19:34:14
[Question description]:

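My original problem was that my functions had a long list of arguments, which exceeded the memory allowed for arguments passed to a CUDA kernel (I don't remember how many bytes, because it has been a while since I dealt with that). The way I bypassed this was to define a new structure whose members are pointers to other structures, which I can later dereference from within the kernel.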

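... this is where the present problem begins: at the point where I try to dereference the pointers (members of the structure I created earlier) from within the kernel, I get CUDA_EXCEPTION_5, Warp Out-of-range Address ... from cuda-gdb. On top of that, the kernel name and arguments (which are reported as "not live at this point") that cuda-gdb shows for the error do not belong to anything I created in my code.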

Now, for more details:

Here are the structures involved:

typedef struct {

    int strx;
    int stry;
    int strz;
    float* el;

} manmat;

typedef struct {

    manmat *x;
    manmat *y;
    manmat *z;

} manmatvec;

This is how I try to group the kernel arguments in main:

int main () {

...
...

    manmat resu0;
    resu0.strx = n+2;       resu0.stry = m+2;       resu0.strz = l+2;
    if (cudaMalloc((void**)&resu0.el,sizeof(float) * (n+2)*(m+2)*(l+2)) != cudaSuccess) cout << endl << " ERROR allocating memory for manmat resu0" << endl ;
    manmat resv0;
    resv0.strx = n+2;       resv0.stry = m+2;       resv0.strz = l+2;
    if (cudaMalloc((void**)&resv0.el,sizeof(float) * (n+2)*(m+2)*(l+2)) != cudaSuccess) cout << endl << " ERROR allocating memory for manmat resv0" << endl ;
    manmat resw0;
    resw0.strx = n+2;       resw0.stry = m+2;       resw0.strz = l+2;
    if (cudaMalloc((void**)&resw0.el,sizeof(float) * (n+2)*(m+2)*(l+2)) != cudaSuccess) cout << endl << " ERROR allocating memory for manmat resw0" << endl ;
    manmatvec residues0 ;

    residues0.x = &resu0;
    residues0.y = &resv0;
    residues0.z = &resw0;

    exec_res_std_2d <<<numBlocks2D, threadsPerBlock2D>>> (residues0, ......) ;

 .....
}

... and this is what happens in the kernel:

__global__ void exec_res_std_2d (manmatvec residues, ......) {

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int k = blockIdx.y * blockDim.y + threadIdx.y;

    manmat *resup;
    manmat *resvp;
    manmat *reswp;

    resup = residues.x;
    resvp = residues.y;
    reswp = residues.z;

    manmat resu, resv, resw ;

    resu.strx = (*resup).strx;     //LINE 1626
    resu.stry = (*resup).stry;
    resu.strz = (*resup).strz;
    resu.el = (*resup).el;

    resv = *resvp;
    resw = *reswp;

    .....
}

Finally, this is the output given by cuda-gdb:

..................
[Launch of CUDA Kernel 1065 (exec_res_std_2d<<<(1,2,1),(32,16,1)>>>) on Device 0]
[Launch of CUDA Kernel 1066 (exec_res_bot_2d<<<(1,2,1),(32,16,1)>>>) on Device 0]

Program received signal CUDA_EXCEPTION_5, Warp Out-of-range Address.
[Switching focus to CUDA kernel 1065, grid 1066, block (0,0,0), thread (0,2,0), device 0, sm 0, warp 2, lane 0]
0x0000000003179020 in fdivide<<<(1,2,1),(32,16,1)>>> (a=warning: Variable is not live at this point. Value is undetermined.
..., pt=warning: Variable is not live at this point. Value is undetermined.
..., cells=warning: Variable is not live at this point. Value is undetermined.
...) at ola.cu:1626
1626    ola.cu: No such file or directory.
    in ola.cu

I have to note that I have not defined any function called fdivide in my code, either __device__ or __global__.....

It may also be important to say that at the beginning of the program's run inside the debugger, even though I compiled my CUDA C files with -arch=sm_20 -g -G -gencode arch=compute_20,code=sm_20, I get:

[New Thread 0x7ffff3b69700 (LWP 12465)]
[Context Create of context 0x1292340 on Device 0]
warning: no loadable sections found in added symbol-file /tmp/cuda-dbg/12456/session1/elf.1292340.1619c10.o.LkkWns
warning: no loadable sections found in added symbol-file /tmp/cuda-dbg/12456/session1/elf.1292340.1940ad0.o.aHtC7W
warning: no loadable sections found in added symbol-file /tmp/cuda-dbg/12456/session1/elf.1292340.2745680.o.bVXEWl
warning: no loadable sections found in added symbol-file /tmp/cuda-dbg/12456/session1/elf.1292340.2c438b0.o.cgUqiP
warning: no loadable sections found in added symbol-file /tmp/cuda-dbg/12456/session1/elf.1292340.2c43980.o.4diaQ4
warning: no loadable sections found in added symbol-file /tmp/cuda-dbg/12456/session1/elf.1292340.2dc9380.o.YYJAr5

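Any answer, hint or suggestion that could help me with this issue is very welcome! Please note that I have only recently started programming in CUDA C, and I am not very experienced with cuda-gdb. Most of the debugging I did in C code was done "manually", by checking the output at various points of the code....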

Also, this code runs on a Tesla M2090 and is compiled to run on the 2.0 architecture.

[Question discussion]:

Tags: exception pointers cuda cuda-gdb

[Solution 1]:

This is going to be a problem:

manmatvec residues0 ;

    residues0.x = &resu0;
    residues0.y = &resv0;
    residues0.z = &resw0;

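The resu0, resv0 and resw0 variables are allocated in host memory - on the host stack. You are putting host addresses into the manmatvec structure and then passing the manmatvec into the kernel. On the receiving end, the CUDA code cannot access the host memory addresses stored in the structure.

If you are going to pass the addresses of the resu0, resv0 and resw0 variables, you need to allocate them in device memory.

I don't know whether this is the entire problem, but I am pretty sure it is a top contributor.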
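A minimal sketch of that suggestion, under stated assumptions: each manmat header is filled in on the host, with el already pointing to device memory, and is then copied into its own small device allocation, so that the manmatvec carries device addresses only. The helper make_device_manmat, the kernel read_strides and the sizes n, m, l are hypothetical names and values chosen for illustration; they are not part of the original code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

typedef struct { int strx; int stry; int strz; float *el; } manmat;
typedef struct { manmat *x; manmat *y; manmat *z; } manmatvec;

// Hypothetical helper: builds a manmat header on the host, copies it to
// device memory, and returns the DEVICE address of that header.
static manmat *make_device_manmat(int nx, int ny, int nz) {
    manmat h;
    h.strx = nx; h.stry = ny; h.strz = nz;
    h.el = NULL;
    size_t bytes = sizeof(float) * (size_t)nx * ny * nz;
    if (cudaMalloc((void **)&h.el, bytes) != cudaSuccess) return NULL;
    cudaMemset(h.el, 0, bytes);

    manmat *d = NULL;
    if (cudaMalloc((void **)&d, sizeof(manmat)) != cudaSuccess ||
        cudaMemcpy(d, &h, sizeof(manmat), cudaMemcpyHostToDevice) != cudaSuccess) {
        cudaFree(d);      // cudaFree(NULL) is a no-op
        cudaFree(h.el);
        return NULL;
    }
    return d;
}

// Illustrative kernel: dereferences the device-resident headers.
__global__ void read_strides(manmatvec v, int *out) {
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        out[0] = v.x->strx;
        out[1] = v.y->stry;
        out[2] = v.z->strz;
        v.x->el[0] = 1.0f;   // el points to device memory, so this is valid
    }
}

int main() {
    int n = 4, m = 5, l = 6;   // illustrative sizes only
    manmatvec residues0;
    residues0.x = make_device_manmat(n + 2, m + 2, l + 2);
    residues0.y = make_device_manmat(n + 2, m + 2, l + 2);
    residues0.z = make_device_manmat(n + 2, m + 2, l + 2);
    if (!residues0.x || !residues0.y || !residues0.z) {
        fprintf(stderr, "allocation failed\n");
        return 1;
    }

    int *d_out = NULL;
    if (cudaMalloc((void **)&d_out, 3 * sizeof(int)) != cudaSuccess) return 1;

    read_strides<<<1, 32>>>(residues0, d_out);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    int h_out[3];
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    printf("%d %d %d\n", h_out[0], h_out[1], h_out[2]);
    // Cleanup is omitted for brevity; freeing el would require copying
    // each header back to the host first to recover the pointer.
    return 0;
}
```

Because the headers live in global memory, other kernels launched later can receive the same manmatvec by value and dereference it in the same way.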

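[Discussion]:

I understand your argument, and you are right that these are allocated on the host side. But the whole point of their existence is that I use them to access and manipulate memory on the device side (the pointer members point to device memory). So I guess my next question is: how can I allocate them from the device side and keep them in global memory, so that other kernels can use them? Thanks again for your reply.
It is not just that they are allocated on the host side. The resu0, resv0 and resw0 variables reside in host memory, and the device cannot access host memory. That is why you get the exception at line 1626: the device is trying to dereference a pointer that contains a host memory address. You need to allocate these variables in device memory, from the host side. This makes setting their values from the host side more difficult.
(1) Compute the device memory size: 3 * (sizeof(manmat) + sizeof(float) * ((n + 2) + (m + 2) + (l + 2))). Note: I do not know why you are doing '*' in this calculation. (2) malloc a block of host memory and cudaMalloc a block of device memory. (3) For each sub-allocation, cast the host pointer and fill in the values. Use the device pointer (not the host pointer) when setting the manmat.el field. (4) cudaMemcpy from host to device. (5) Fill in the manmatvec with device pointers. If your size calculation is (n+m+l+6)*4 and the total is less than 4K, you can pass all of this through the kernel parameters.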
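The five steps in the last comment can be sketched roughly as follows. This is an illustration under assumptions, not the commenter's code: the element count is taken as the product (n+2)*(m+2)*(l+2) used by the cudaMalloc calls in the question, the three headers are placed at the start of the block so that each one stays aligned for its pointer member, and the kernel sum_first and the sizes n, m, l are hypothetical.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

typedef struct { int strx; int stry; int strz; float *el; } manmat;
typedef struct { manmat *x; manmat *y; manmat *z; } manmatvec;

// Illustrative kernel: reads one element through each device-resident header.
__global__ void sum_first(manmatvec v, float *out) {
    if (blockIdx.x == 0 && threadIdx.x == 0)
        *out = v.x->el[0] + v.y->el[0] + v.z->el[0];
}

int main() {
    int n = 4, m = 5, l = 6;   // illustrative sizes only
    size_t count = (size_t)(n + 2) * (m + 2) * (l + 2);

    // (1) Size of one block: three headers followed by three float arrays.
    size_t header_bytes = 3 * sizeof(manmat);
    size_t total = header_bytes + 3 * count * sizeof(float);

    // (2) One host staging block and one device block of the same size.
    char *h_block = (char *)malloc(total);
    char *d_block = NULL;
    if (!h_block || cudaMalloc((void **)&d_block, total) != cudaSuccess) {
        fprintf(stderr, "allocation failed\n");
        return 1;
    }

    // (3) Fill in the host copy; el must hold DEVICE addresses.
    manmat *h_hdr = (manmat *)h_block;
    float *h_data = (float *)(h_block + header_bytes);
    float *d_data = (float *)(d_block + header_bytes);  // computed, never dereferenced on the host
    for (int i = 0; i < 3; ++i) {
        h_hdr[i].strx = n + 2;
        h_hdr[i].stry = m + 2;
        h_hdr[i].strz = l + 2;
        h_hdr[i].el = d_data + i * count;
        for (size_t j = 0; j < count; ++j)
            h_data[i * count + j] = (float)(i + 1);     // sample values
    }

    // (4) A single host-to-device copy moves headers and data together.
    if (cudaMemcpy(d_block, h_block, total, cudaMemcpyHostToDevice) != cudaSuccess) {
        fprintf(stderr, "copy failed\n");
        return 1;
    }

    // (5) The manmatvec holds device addresses of the three headers.
    manmat *d_hdr = (manmat *)d_block;
    manmatvec residues0;
    residues0.x = d_hdr;
    residues0.y = d_hdr + 1;
    residues0.z = d_hdr + 2;

    float *d_out = NULL;
    if (cudaMalloc((void **)&d_out, sizeof(float)) != cudaSuccess) return 1;
    sum_first<<<1, 32>>>(residues0, d_out);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    float h_out = 0.0f;
    cudaMemcpy(&h_out, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    printf("%f\n", h_out);

    cudaFree(d_out);
    cudaFree(d_block);
    free(h_block);
    return 0;
}
```

Packing everything into one block means a single cudaFree releases the headers and the data together, at the cost of computing the offsets by hand.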