CUDA memory limitations: answers

【Question title】: CUDA memory limitations
【Posted】: 2012-01-04 22:41:11
【Question】:

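If I try to send my CUDA device a structure larger than the available memory, will CUDA give me any warning or error?

I ask because my GPU has 1024 MB (1073414144 bytes) of total global memory, but I don't know how I should handle this or what problems it might eventually cause.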

Here is my code:

#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
#include <cuda_runtime.h>

#define WIDTH 1500
#define HEIGHT 1500
#define VECSIZE (WIDTH * HEIGHT)   /* 2250000 elements */

/* "No edge" marker. INFINITY is a floating-point constant; storing it in an
   int is undefined behaviour, so an int sentinel is used instead. */
#define INF_WEIGHT INT_MAX

// Matrices are stored in row-major order:
// M(row, col) = *(M.elements + row * M.width + col)
typedef struct
{
    int width;
    int height;
    int* elements;
} Matrix;

int main(void)
{
    Matrix M;
    M.width = WIDTH;
    M.height = HEIGHT;
    M.elements = (int *) calloc(VECSIZE, sizeof(int));
    if (M.elements == NULL)
        return EXIT_FAILURE;

    // Matrix generator: the diagonal and every zero entry become "infinite"
    // (no edge); all other entries get weight 1.
    for (int row = 0; row < M.height; row++)
        for (int col = 0; col < M.width; col++)
        {
            int w;
            if (row == col)
                w = INF_WEIGHT;
            else
            {
                w = rand() % 2;   // 0 or 1 ('rand() % 1' is always 0)
                if (w == 0)       // can't have zero weight
                    w = INF_WEIGHT;
            }
            M.elements[row * M.width + col] = w;
        }

    // Declare & send device Matrix to Device.
    Matrix d_M;
    d_M.width = M.width;
    d_M.height = M.height;
    size_t size = (size_t)M.width * M.height * sizeof(int);
    cudaMalloc((void **) &d_M.elements, size);
    cudaMemcpy(d_M.elements, M.elements, size, cudaMemcpyHostToDevice);

    // Device scalars. cudaMalloc overwrites the pointer, so malloc()-ing host
    // memory into it first only leaks that memory.
    int *d_k = NULL;
    cudaMalloc((void **) &d_k, sizeof(int));

    int *d_width = NULL;
    cudaMalloc((void **) &d_width, sizeof(int));
    int width = M.width;
    cudaMemcpy(d_width, &width, sizeof(int), cudaMemcpyHostToDevice);

    int *d_height = NULL;
    cudaMalloc((void **) &d_height, sizeof(int));
    int height = M.height;
    cudaMemcpy(d_height, &height, sizeof(int), cudaMemcpyHostToDevice);

    /* et cetera .. (kernel launches, copying results back) */

    cudaFree(d_height);
    cudaFree(d_width);
    cudaFree(d_k);
    cudaFree(d_M.elements);
    free(M.elements);
    return 0;
}

【Discussion】:

2.25 million elements * 4 bytes is only about 9 MB. You have 1 GB, i.e. 1024 MB, to work with!
Yes, not in this particular example, but I intend to use larger matrices over time.

Tags: memory memory-management cuda

【Solution 1】:

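Although you may not currently be sending enough data to the GPU to use up its memory, when you do, your cudaMalloc call will return the error code cudaErrorMemoryAllocation, which according to the CUDA API docs means that the memory allocation failed. I notice that your sample code does not check the return values of your CUDA calls. You need to check these return codes to make sure your program is working correctly. The CUDA API does not throw exceptions: you have to check the return codes. For information on checking errors and getting meaningful messages about them, see this article.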

【Discussion】:

【Solution 2】:

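If you are using cutil.h, it provides two very useful macros:
CUDA_SAFE_CALL (use it when calling functions such as cudaMalloc, cudaMemcpy, etc.)
and
CUT_CHECK_ERROR (use it after launching a kernel to check for errors during kernel execution).
They handle errors, if any, using the error-checking mechanism described in the article flipchart linked.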

【Discussion】: