Bind CUDA Texture to a float Image: Answers

【Question title】: Bind CUDA Texture to a float Image
【Posted】: 2012-05-13 03:34:45
【Question】:

I have a 1-channel float image on the C side, declared as follows:

int width, height;
float* img;

I want to pass this image to a CUDA texture. I have been reading the NVIDIA CUDA C Programming Guide (pages 42-43) and, following its tutorial, wrote the code below:

main.cpp:

int main()
{
     int width, height;
     float* h_Input;
     ReadImage(&h_Input, &width, &height); // My function which reads the image.
     WriteImage(h_Input, width, height); // works perfectly...

     float* h_Output = (float*) malloc(sizeof(float) * width * height);

     CalculateWithCuda(h_Input, h_Output, width,height);
     WriteImage(h_Output, width, height); // writes an empty-gray colored image.... *WHY???* 
}

kernel.cu:

texture<float, cudaTextureType2D, cudaReadModeElementType> texRef; // 2D float texture

__global__ void Kernel(float* output, int width, int height)
{
    int i = blockIdx.y * blockDim.y + threadIdx.y; // row number 
    int j = blockIdx.x * blockDim.x + threadIdx.x; // col number

    if(i < height && j < width)
    {
           float temp = tex2D(texRef, i + 0.5f, j + 0.5f);
           output[i * width + j] = temp ;
    }
} 

void CalculateWithCuda(const float* h_input, float* h_output, int width, int height)
{
    float* d_output;

    // Allocate CUDA array in device memory
    cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc(32, 0, 0, 0,cudaChannelFormatKindFloat);
    cudaArray* cuArray;
    cudaMallocArray(&cuArray, &channelDesc, width, height);
    // Copy to device memory some data located at address h_data in host memory 
    cudaMemcpyToArray(cuArray, 0, 0, h_input, width * height * sizeof(float) , cudaMemcpyHostToDevice);
    // Set texture parameters
    texRef.addressMode[0] = cudaAddressModeWrap;
    texRef.addressMode[1] = cudaAddressModeWrap;
    texRef.filterMode     = cudaFilterModeLinear;
    texRef.normalized     = true;

    // Bind the array to the texture reference
    cudaBindTextureToArray(texRef, cuArray, channelDesc);

    // Allocate GPU buffers for the output image ..
    cudaMalloc(&d_output, sizeof(float) * width * height);

    dim3 threadsPerBlock(16,16);
    dim3 numBlocks((width/threadsPerBlock.x) + 1, (height/threadsPerBlock.y) + 1);

    Kernel<<<numBlocks, threadsPerBlock>>>(d_output, width,height);

    cudaDeviceSynchronize();

    // Copy output vector from GPU buffer to host memory.
    cudaMemcpy(h_output, d_output, sizeof(float) * width * height, cudaMemcpyDeviceToHost);

    // Free GPU  memory ...
}

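As the comments in the code say, this kernel should simply read from the texture and give me back the same image as output. Instead, I get an empty (gray) image. I implemented it the same way as the tutorial, so why is this texture not working?

I would appreciate it if someone could tell me how to solve this problem...

PS: Of course, this is not the whole code. I only copied the necessary parts. If you need other details, I will provide them as well.

Thanks in advance.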

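【Comments】:

For the sake of clarity, the Programming Guide does not include any checks of the return values of calls in its examples. I'm not sure I agree with that decision, since checking return values really should be ingrained in every CUDA programmer. Please take a look at the SDK samples to see how to check the return value of every CUDA call (including kernel invocations), and let us know if that helps you solve the problem.
@RogerDahl: Actually, I already do that... You mean using cudaStatus, right? I don't get any error messages. Everything seems fine. However, the code does not produce the image I am expecting...
The problem is the texRef.normalized = true; line. I removed it and it works fine, though I don't know why...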
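
The return-value checking suggested in the first comment could look roughly like the sketch below. CUDA_CHECK and the Fill kernel are hypothetical names introduced only for illustration; they come from neither the question nor the Programming Guide. cudaGetLastError, cudaDeviceSynchronize and cudaGetErrorString are CUDA runtime calls.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro: abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                                   \
    do {                                                                   \
        cudaError_t err_ = (call);                                         \
        if (err_ != cudaSuccess) {                                         \
            std::fprintf(stderr, "CUDA error at %s:%d: %s\n",              \
                         __FILE__, __LINE__, cudaGetErrorString(err_));    \
            std::exit(EXIT_FAILURE);                                       \
        }                                                                  \
    } while (0)

// Placeholder kernel (an assumption for this sketch): fills a buffer with a value.
__global__ void Fill(float* out, int n, float value)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n)
        out[idx] = value;
}

int main()
{
    const int n = 256;
    float* d_out = nullptr;
    CUDA_CHECK(cudaMalloc(&d_out, n * sizeof(float)));

    Fill<<<(n + 127) / 128, 128>>>(d_out, n, 1.0f);
    CUDA_CHECK(cudaGetLastError());       // errors detected at launch time
    CUDA_CHECK(cudaDeviceSynchronize());  // errors raised while the kernel ran

    CUDA_CHECK(cudaFree(d_out));
    return 0;
}
```

Checks like these would not flag the problem in this question, which matches what the asker reports: sampling with the wrong coordinates is a valid operation, so no call returns an error.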

Tags: cuda textures

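【Solution 1】:

When you use normalized coordinates, the texture is accessed with coordinates from 0 to 1 (1 excluded). You forgot to convert your threadIdx-based integer coordinates to normalized coordinates. With wrap addressing, a coordinate such as i + 0.5f is reduced to its fractional part, 0.5, so every thread samples the center of the texture, which is why the output is a uniform gray.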

unsigned int x = blockIdx.x * blockDim.x + threadIdx.x;
unsigned int y = blockIdx.y * blockDim.y + threadIdx.y; 
float u = x / (float)width;  
float v = y / (float)height;
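
A minimal sketch of how the question's kernel might look with that conversion applied. It assumes texRef is configured and bound as in the question (normalized = true, linear filtering) and a toolkit that still supports texture references, which were removed in CUDA 12. Two details go beyond the answer above and are assumptions about the intended behaviour: the 0.5f offset samples each texel at its center, and the column index is passed as the first (x) argument of tex2D, whereas the question passes the row first.

```cuda
#include <cuda_runtime.h>

// Texture reference as declared in the question. Texture references are a
// legacy API (removed in CUDA 12), so this sketch assumes an older toolkit.
texture<float, cudaTextureType2D, cudaReadModeElementType> texRef;

// Copies the bound texture into `output`, sampling with normalized coordinates.
// Assumes texRef.normalized == true and a bound width x height float array.
__global__ void Kernel(float* output, int width, int height)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;

    if (row < height && col < width)
    {
        // Assumption beyond the answer: +0.5f targets the texel center.
        float u = (col + 0.5f) / (float)width;   // x: column, in [0, 1)
        float v = (row + 0.5f) / (float)height;  // y: row, in [0, 1)

        output[row * width + col] = tex2D(texRef, u, v);
    }
}
```

Without the offset, as in x / (float)width, linear filtering samples on the boundary between two texels and returns a blend of both rather than the original pixel value.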

【Comments】:

Thanks for your answer. I would fix the syntax in it, but SO won't let me. (hint, hint :))