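【Posted】: 2013-10-22 10:44:33
【Question】:
I am new to GPU and CUDA programming. I am trying to copy structured data that is dynamically allocated on the device back to the host. I modified a simple example from the GPU programming guide. The code compiles without errors, but the output is wrong: every value is 0. The code is as follows: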
#include <stdlib.h>
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <stdio.h>
struct Point
{
    int2 pt;
};
#define NUMOFBLOCKS 1
#define THREDSPERBLOCK 16
__device__ Point* pnt[NUMOFBLOCKS];
Point dataptr_h[NUMOFBLOCKS][THREDSPERBLOCK];
__global__ void allocmem()
{
    // Only one thread per block allocates from the device heap
    if (threadIdx.x == 0)
        pnt[blockIdx.x] = (Point*)malloc(blockDim.x * sizeof(Point));
    __syncthreads();
}
__global__ void usemem()
{
    Point* ptr = pnt[blockIdx.x];
    if (ptr != NULL)
    {
        ptr[threadIdx.x].pt.x = threadIdx.x;
        ptr[threadIdx.x].pt.y = threadIdx.x;
        printf("Ptr = %d\t", ptr[threadIdx.x].pt.x);
    }
}
__global__ void freemem()
{
    Point* ptr = pnt[blockIdx.x];
    if (ptr != NULL)
        printf("Block %d, Thread %d: final value = %d\n", blockIdx.x, threadIdx.x, ptr[threadIdx.x].pt.x);
    // Only free from one thread
    if (threadIdx.x == 0)
        free(ptr);
}
int main()
{
    Point* d_pt[NUMOFBLOCKS];
    for (int i = 0; i < NUMOFBLOCKS; i++)
        cudaMalloc(&d_pt[i], sizeof(Point) * THREDSPERBLOCK);
    // Allocate memory
    allocmem<<< NUMOFBLOCKS, THREDSPERBLOCK >>>();
    // Use memory
    usemem<<< NUMOFBLOCKS, THREDSPERBLOCK >>>();
    cudaMemcpyFromSymbol(d_pt, pnt, sizeof(d_pt));
    cudaMemcpy(dataptr_h, d_pt, sizeof(dataptr_h), cudaMemcpyDeviceToHost);
    for (int j = 0; j < NUMOFBLOCKS; j++)
        for (int i = 0; i < THREDSPERBLOCK; i++)
        {
            printf("\nPtr_h(%d,%d)->X = %d\t", j, i, dataptr_h[j][i].pt.x);
            printf("Ptr_h(%d,%d)->Y = %d", j, i, dataptr_h[j][i].pt.y);
        }
    freemem<<< NUMOFBLOCKS, THREDSPERBLOCK >>>();
    cudaDeviceSynchronize();
    return 0;
}
The output of the code is:
Ptr_h(0,0)->X = 0 Ptr_h(0,0)->Y = 0
Ptr_h(0,1)->X = 0 Ptr_h(0,1)->Y = 0
Ptr_h(0,2)->X = 0 Ptr_h(0,2)->Y = 0
Ptr_h(0,3)->X = 0 Ptr_h(0,3)->Y = 0
Ptr_h(0,4)->X = 0 Ptr_h(0,4)->Y = 0
Ptr_h(0,5)->X = 0 Ptr_h(0,5)->Y = 0
Ptr_h(0,6)->X = 0 Ptr_h(0,6)->Y = 0
Ptr_h(0,7)->X = 0 Ptr_h(0,7)->Y = 0
Ptr_h(0,8)->X = 0 Ptr_h(0,8)->Y = 0
Ptr_h(0,9)->X = 0 Ptr_h(0,9)->Y = 0
Ptr_h(0,10)->X = 0 Ptr_h(0,10)->Y = 0
Ptr_h(0,11)->X = 0 Ptr_h(0,11)->Y = 0
Ptr_h(0,12)->X = 0 Ptr_h(0,12)->Y = 0
Ptr_h(0,13)->X = 0 Ptr_h(0,13)->Y = 0
Ptr_h(0,14)->X = 0 Ptr_h(0,14)->Y = 0
Ptr_h(0,15)->X = 0 Ptr_h(0,15)->Y = 0
What can I do to fix this?
【Comments】:
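- You should do proper cuda error checking on all CUDA API calls and kernel calls. It will point you to the line of code where you are having trouble. Since your kernel printf statements are not showing up, it is evident that your kernels are not executing properly. Running your code with cuda-memcheck may shed some light on this.
- In fact, when I run your code, some of your kernel printf statements do show up. So if you are not seeing Ptr = 0 Ptr = 1 ..., then you may have another problem (machine configuration). But proper cuda error checking will help you discover whether that is the case as well.
- Thanks for your comment, Robert. But printf works fine inside the kernel, and I can see Ptr = 0 Ptr = 1 ... The only problem is that the data is not passed/copied to the host. I am trying to follow your next answer...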
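To make the error-checking advice from the comments concrete, here is a minimal sketch rather than a definitive fix. `CUDA_CHECK`, `stage`, and `fetchPoints` are hypothetical names that do not appear in the original post. The sketch assumes the zeros come from the host-side `cudaMemcpy` failing: as far as I understand the CUDA programming guide, memory obtained from in-kernel `malloc()` lives on the device heap and cannot be passed to runtime API calls such as `cudaMemcpy`. Under that assumption, a small kernel first copies the data into a buffer obtained from `cudaMalloc`, which the host can then read.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not from the original post): report the failing
// call's file and line, then exit.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error: %s at %s:%d\n",              \
                    cudaGetErrorString(err_), __FILE__, __LINE__);    \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

struct Point { int2 pt; };

#define NUMOFBLOCKS 1
#define THREDSPERBLOCK 16

// One device-heap pointer per block, as in the question.
__device__ Point* pnt[NUMOFBLOCKS];

__global__ void allocmem()
{
    if (threadIdx.x == 0)
        pnt[blockIdx.x] = (Point*)malloc(blockDim.x * sizeof(Point));
}

__global__ void usemem()
{
    Point* ptr = pnt[blockIdx.x];
    if (ptr != NULL) {
        ptr[threadIdx.x].pt.x = threadIdx.x;
        ptr[threadIdx.x].pt.y = threadIdx.x;
    }
}

// Hypothetical staging kernel: copies each block's device-heap data into
// a cudaMalloc'ed buffer that host-side cudaMemcpy can read.
__global__ void stage(Point* out)
{
    Point* ptr = pnt[blockIdx.x];
    if (ptr != NULL)
        out[blockIdx.x * blockDim.x + threadIdx.x] = ptr[threadIdx.x];
}

__global__ void freemem()
{
    if (threadIdx.x == 0)
        free(pnt[blockIdx.x]);
}

// Runs allocate, use, stage, copy, free; fills `host` with
// NUMOFBLOCKS * THREDSPERBLOCK points.
void fetchPoints(Point* host)
{
    const size_t bytes = sizeof(Point) * NUMOFBLOCKS * THREDSPERBLOCK;
    Point* d_stage = NULL;
    CUDA_CHECK(cudaMalloc(&d_stage, bytes));
    CUDA_CHECK(cudaMemset(d_stage, 0, bytes));

    allocmem<<<NUMOFBLOCKS, THREDSPERBLOCK>>>();
    CUDA_CHECK(cudaGetLastError());   // catches launch errors
    usemem<<<NUMOFBLOCKS, THREDSPERBLOCK>>>();
    CUDA_CHECK(cudaGetLastError());
    stage<<<NUMOFBLOCKS, THREDSPERBLOCK>>>(d_stage);
    CUDA_CHECK(cudaGetLastError());

    // Runs after the kernels above, which were launched on the same default stream.
    CUDA_CHECK(cudaMemcpy(host, d_stage, bytes, cudaMemcpyDeviceToHost));

    freemem<<<NUMOFBLOCKS, THREDSPERBLOCK>>>();
    CUDA_CHECK(cudaGetLastError());
    CUDA_CHECK(cudaDeviceSynchronize());  // surfaces errors raised during execution
    CUDA_CHECK(cudaFree(d_stage));
}

int main()
{
    Point host[NUMOFBLOCKS * THREDSPERBLOCK];
    fetchPoints(host);
    for (int j = 0; j < NUMOFBLOCKS; j++)
        for (int i = 0; i < THREDSPERBLOCK; i++)
            printf("Ptr_h(%d,%d): X = %d, Y = %d\n", j, i,
                   host[j * THREDSPERBLOCK + i].pt.x,
                   host[j * THREDSPERBLOCK + i].pt.y);
    return 0;
}
```

Wrapping the two copy calls in the question's `main` with the same macro should show whether they return an error, which is the first thing the comments ask for. The staging buffer costs one extra device-side copy; allocating the buffer with `cudaMalloc` up front and writing into it directly from `usemem` would avoid in-kernel `malloc()` altogether, if per-block dynamic allocation is not actually required.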