Output of CUDA program is empty [duplicate]

【Question title】: Output of CUDA program is empty [duplicate]
【Posted】: 2011-07-03 14:08:52
【Question】:

Possible duplicate:
Output of cuda program is not what was expected

I am running this simple CUDA program:

#include <cuda_runtime.h>
#include <cuda.h>
#include <stdio.h>

__global__ void
  display(char *t[])
{

  int v = blockIdx.x;
  int p = blockIdx.y;
  int offset = v+ p*gridDim.x;
  t[offset] = "(";
  //
}

void 
main()
{
  int c = 5;
  cudaGetDeviceCount(&c);
  cudaDeviceProp prop;
  cudaGetDeviceProperties(&prop,0);
  printf("The device name is : %s\n", prop.name);
  //bool value = prop.integrated;
  char *x[6];
  int i;
  for (i = 0; i<6; i++)
      cudaMalloc((void**)&x[i], 20*sizeof(char));

  // Checking the meaning of grid(3,2)
  dim3 grid(3,2); 
  display<<<grid,1>>>(x);
  char y[30];
  cudaMemcpy(y, x[0], 20*sizeof(char), cudaMemcpyDeviceToHost);
  printf("The values is :%s\n", y);
  cudaFree(x[0]);

  getchar();
}

I don't understand why the array y is still empty at the end of execution. Shouldn't it contain "("?

【Question comments】:

Still facing the problem... please help!
We are trying to help! Calm down, man, and give us some time.

Tags: c++ cuda nvidia

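【Solution 1】:

I have already answered this question here.

But I will leave this advice for anyone else who lands on this question first:

When debugging CUDA code, I strongly recommend adding a forced synchronization and an error check, as I mentioned in your other post, to make sure your hardware setup, your API setup, or the current weather isn't messing things up: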

/*  Force Thread Synchronization  */
cudaError err = cudaThreadSynchronize();

/*  Check for and display Error  */
if ( cudaSuccess != err )
{
    fprintf( stderr, "Cuda error in file '%s' in line %i : %s.\n",
             __FILE__, __LINE__, cudaGetErrorString( err) );
}

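The key problem with the OP's code is that the pointer array x lives in host (CPU) memory, even though the buffers its elements point to live on the GPU, so the kernel cannot dereference it. Again, see my answer here.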
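
As a rough sketch of one way to address this (not necessarily the fix given in the linked answer), the pointer table can be copied into device memory before the launch, so that the kernel receives a device pointer. The CUDA_CHECK macro, the d_x variable, and the constants kBuffers and kBufferSize are illustrative names that do not appear in the original post. The sketch also assumes the intent was to write the character into each buffer: assigning "(" to t[offset], as the original kernel does, would only repoint the table entry and leave the buffer from cudaMalloc unwritten. cudaDeviceSynchronize is used in place of the deprecated cudaThreadSynchronize.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not from the original post): report a CUDA error and exit.
#define CUDA_CHECK(call)                                                   \
    do {                                                                   \
        cudaError_t err_ = (call);                                         \
        if (err_ != cudaSuccess) {                                         \
            fprintf(stderr, "Cuda error in file '%s' in line %i : %s.\n",  \
                    __FILE__, __LINE__, cudaGetErrorString(err_));         \
            exit(EXIT_FAILURE);                                            \
        }                                                                  \
    } while (0)

// One block per buffer: each block writes the string "(" into its own buffer.
__global__ void display(char **t)
{
    int offset = blockIdx.x + blockIdx.y * gridDim.x;
    t[offset][0] = '(';
    t[offset][1] = '\0';
}

int main()
{
    const int kBuffers = 6;      // matches grid(3,2): 3 * 2 = 6 blocks
    const int kBufferSize = 20;  // bytes per buffer, as in the original program

    // Host array holding device pointers, as in the original program.
    char *x[kBuffers];
    for (int i = 0; i < kBuffers; i++)
        CUDA_CHECK(cudaMalloc((void **)&x[i], kBufferSize * sizeof(char)));

    // Device copy of the pointer table, so the kernel can dereference it.
    char **d_x = NULL;
    CUDA_CHECK(cudaMalloc((void **)&d_x, kBuffers * sizeof(char *)));
    CUDA_CHECK(cudaMemcpy(d_x, x, kBuffers * sizeof(char *),
                          cudaMemcpyHostToDevice));

    dim3 grid(3, 2);
    display<<<grid, 1>>>(d_x);
    CUDA_CHECK(cudaGetLastError());        // catches launch errors
    CUDA_CHECK(cudaDeviceSynchronize());   // catches errors raised while the kernel runs

    // The host still knows the device address of buffer 0, so copy from x[0].
    char y[kBufferSize];
    CUDA_CHECK(cudaMemcpy(y, x[0], kBufferSize * sizeof(char),
                          cudaMemcpyDeviceToHost));
    printf("The value is :%s\n", y);

    for (int i = 0; i < kBuffers; i++)
        CUDA_CHECK(cudaFree(x[i]));
    CUDA_CHECK(cudaFree(d_x));
    return 0;
}
```

Compiled with nvcc and run on a machine with a working CUDA device, this should print "The value is :(".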

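【Comments】:

I tried the approach above, but it just reports an unknown error at those points. Could you try running it on your machine and let me know:
deviceQuery works fine. Note: if I set the number of bytes to copy in the cudaMemcpy call to 0, no error occurs. So something is wrong there.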