CUDA: Global unique thread index in a 3D grid - Answers

【Question title】: CUDA: Global unique thread index in a 3D grid
【Posted】: 2025-12-31 17:00:02
【Question description】:

As the title says: if I have a 3D grid of blocks, what is the formula for the global unique index of a single thread?

Assume the blocks themselves are one-dimensional.

【Question comments】:

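Possible duplicate of "Cuda, executional thread order in a 3d-block"
Please search before asking a new question. An identical question was asked and answered only 2 days ago.
Hey! I went through that thread, and it says there that threadId is the id of a thread within one particular block. That is not what this post is asking. Here, if I launch a 3D grid with 1D blocks, I want to know the global unique thread ID.
The answer to the possible duplicate states that threadID is the thread number within the block. The code in that answer does not use a single blockIdx expression, so its index is local to the thread block, not global.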

Tags: cuda

【Solution 1】:

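// unique block index inside a 3D block grid
// (widen to 64 bits before multiplying; otherwise the products are
// evaluated in 32-bit unsigned arithmetic and can wrap for large grids)
const unsigned long long int blockId = blockIdx.x //1D
        + (unsigned long long int)blockIdx.y * gridDim.x //2D
        + (unsigned long long int)gridDim.x * gridDim.y * blockIdx.z; //3D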

// global unique thread index, block dimension uses only x-coordinate
const unsigned long long int threadId = blockId * blockDim.x + threadIdx.x;
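
To check that this formula really assigns every thread a distinct index, the sketch below replays it on the host for a small launch configuration. It is only an illustration: `Dim3`, `blockIdOf`, `threadIdOf`, and `idsAreUnique` are hypothetical names that stand in for CUDA's built-in `blockIdx`, `gridDim`, `blockDim.x`, and `threadIdx.x`, so the code compiles as plain C++ without a GPU.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for CUDA's dim3/uint3 so this compiles on the host.
struct Dim3 { unsigned x, y, z; };

// Linear block index within a 3D grid (x varies fastest, then y, then z).
std::uint64_t blockIdOf(Dim3 blockIdx, Dim3 gridDim) {
    return blockIdx.x
         + static_cast<std::uint64_t>(blockIdx.y) * gridDim.x
         + static_cast<std::uint64_t>(gridDim.x) * gridDim.y * blockIdx.z;
}

// Global thread index for 1D blocks of blockDimX threads.
std::uint64_t threadIdOf(Dim3 blockIdx, Dim3 gridDim,
                         unsigned blockDimX, unsigned threadIdxX) {
    assert(threadIdxX < blockDimX);
    return blockIdOf(blockIdx, gridDim) * blockDimX + threadIdxX;
}

// Visits every (block, thread) pair and checks that each index in
// [0, total) is produced exactly once.
bool idsAreUnique(Dim3 gridDim, unsigned blockDimX) {
    const std::uint64_t total =
        static_cast<std::uint64_t>(gridDim.x) * gridDim.y * gridDim.z * blockDimX;
    std::vector<bool> seen(total, false);
    for (unsigned z = 0; z < gridDim.z; ++z)
        for (unsigned y = 0; y < gridDim.y; ++y)
            for (unsigned x = 0; x < gridDim.x; ++x)
                for (unsigned t = 0; t < blockDimX; ++t) {
                    const std::uint64_t id = threadIdOf({x, y, z}, gridDim, blockDimX, t);
                    if (id >= total || seen[id]) return false;
                    seen[id] = true;
                }
    return true;
}
```

For example, with a 4 x 3 x 2 grid and 8 threads per block, block (1, 2, 1) has block index 1 + 2*4 + 4*3*1 = 21, so its thread 5 gets global index 21*8 + 5 = 173.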

【Comments】:

Why do you need unsigned long long int? There are only about 256,000 threads, so int or unsigned int should be fine, right?
(Maximum x-, y-, or z-dimension of a grid of thread blocks) to the power of (Maximum dimensionality of grid of thread blocks), times (Maximum number of threads per block), gives you the maximum total number of threads. For CUDA compute capability 2.x, this gives 65535³ * 1024.
const std::size_t would suffice; try static_assert(sizeof(unsigned long long int)==sizeof(std::size_t),""); in your kernel.
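
The arithmetic behind that comment can be spelled out as a small compile-time check. This is a sketch using the limits quoted above (65535 per grid dimension, 1024 threads per block); the constant and function names are made up for illustration, and other devices may have different limits.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Limits as quoted in the comment above (assumed, device-dependent).
constexpr std::uint64_t kMaxGridDim = 65535;          // per grid dimension
constexpr std::uint64_t kMaxThreadsPerBlock = 1024;

// Upper bound on the number of threads in one launch: 65535^3 * 1024.
constexpr std::uint64_t maxTotalThreads() {
    return kMaxGridDim * kMaxGridDim * kMaxGridDim * kMaxThreadsPerBlock;
}

// True if n can be stored in a 32-bit unsigned integer.
constexpr bool fitsIn32Bits(std::uint64_t n) {
    return n <= std::numeric_limits<std::uint32_t>::max();
}

// 65535^3 < 2^48 and 1024 = 2^10, so the product is below 2^58:
// far too large for 32 bits, but comfortably within 64 bits.
static_assert(!fitsIn32Bits(maxTotalThreads()), "needs more than 32 bits");
static_assert(maxTotalThreads() < (std::uint64_t{1} << 58), "fits in 64 bits");
```

The bound works out to roughly 2.9 x 10^17, which is why a 64-bit type is the safe choice for a fully general index, even though a launch of about 256,000 threads would fit in an int.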

【Solution 2】:

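A bit late to the party, but this is how I usually approach the problem in a fairly generic way, in that it supports any number and size of blocks (even 2D):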

// Compute the offset in each dimension
const size_t offsetX = blockDim.x * blockIdx.x + threadIdx.x;
const size_t offsetY = blockDim.y * blockIdx.y + threadIdx.y;
const size_t offsetZ = blockDim.z * blockIdx.z + threadIdx.z;

// Make sure that you are not actually outside the data bounds
if (offsetX >= sizeX || offsetY >= sizeY || offsetZ >= sizeZ)
  return;

// Compute the linear index assuming X, then Y, then Z memory ordering
const size_t idx = offsetZ * sizeX * sizeY + offsetY * sizeX + offsetX;

Mind you, I am no CUDA ninja.

【Comments】:

You have not defined sizeX, sizeY, or sizeZ.
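
The answer does not say where these values come from. One plausible reading, which is an assumption here and not something the answer states, is that sizeX, sizeY, and sizeZ are the extents of the 3D data volume in elements, passed to the kernel as parameters. Under that assumption, a host-side sketch of the bounds check and index computation looks like this; `Extent3`, `blocksNeeded`, and `linearIndex` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: sizeX/sizeY/sizeZ are the extents of the data volume, in elements.
struct Extent3 {
    std::size_t sizeX, sizeY, sizeZ;
};

// Blocks needed along one axis to cover `size` elements (rounds up), so the
// last block may contain threads that fall outside the data.
std::size_t blocksNeeded(std::size_t size, std::size_t threadsPerBlock) {
    assert(threadsPerBlock > 0);
    return (size + threadsPerBlock - 1) / threadsPerBlock;
}

// Mirrors the guard and formula from the answer: returns false for a thread
// outside the volume, otherwise writes the linear index (X fastest, then Y,
// then Z) to idx and returns true.
bool linearIndex(std::size_t offsetX, std::size_t offsetY, std::size_t offsetZ,
                 const Extent3& e, std::size_t& idx) {
    if (offsetX >= e.sizeX || offsetY >= e.sizeY || offsetZ >= e.sizeZ)
        return false;
    idx = offsetZ * e.sizeX * e.sizeY + offsetY * e.sizeX + offsetX;
    return true;
}
```

Note that with this scheme the index is unique per data element: rounding the grid size up means some threads land outside the volume, and those return early without an index.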

【Solution 3】:

The existing answer by @djmj is good, but reformatting it slightly makes it clearer what is going on (at least to my brain, which is new to CUDA):

long blockId = blockIdx.z  *  gridDim.x*gridDim.y
             + blockIdx.y  *  gridDim.x
             + blockIdx.x;
long threadsPerBlock = blockDim.x;
long i = blockId * threadsPerBlock + threadIdx.x;

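blockId is the sum of the blocks in the complete z-dimension "slices" (2D grids), plus the blocks in the complete rows of the final (incomplete) slice, plus the blocks in the final (incomplete) row of that (incomplete) slice.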

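By "complete" I mean the blocks that come "before" the current (x, y, z) block (with respect to the order in which we add them up to determine the overall block id).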
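
The "blocks before the current one" reading can be checked directly: the sketch below computes the three terms separately and compares their sum with a brute-force count of the blocks visited before the current block in x-fastest order. It runs on the host; `Dim3`, `blockIdFromTerms`, and `blocksBefore` are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for CUDA's dim3/uint3 so this compiles on the host.
struct Dim3 { unsigned x, y, z; };

// Block id as the sum of the three terms described above.
std::uint64_t blockIdFromTerms(Dim3 blockIdx, Dim3 gridDim) {
    assert(blockIdx.x < gridDim.x && blockIdx.y < gridDim.y && blockIdx.z < gridDim.z);
    // Blocks in the complete z-slices before the current slice.
    const std::uint64_t completeSlices =
        static_cast<std::uint64_t>(blockIdx.z) * gridDim.x * gridDim.y;
    // Blocks in the complete rows of the current (incomplete) slice.
    const std::uint64_t completeRows =
        static_cast<std::uint64_t>(blockIdx.y) * gridDim.x;
    // Blocks before the current one in its own (incomplete) row.
    const std::uint64_t partialRow = blockIdx.x;
    return completeSlices + completeRows + partialRow;
}

// Brute force: walk the grid with x fastest, then y, then z, and count how
// many blocks are visited before reaching blockIdx.
std::uint64_t blocksBefore(Dim3 blockIdx, Dim3 gridDim) {
    std::uint64_t count = 0;
    for (unsigned z = 0; z < gridDim.z; ++z)
        for (unsigned y = 0; y < gridDim.y; ++y)
            for (unsigned x = 0; x < gridDim.x; ++x) {
                if (x == blockIdx.x && y == blockIdx.y && z == blockIdx.z)
                    return count;
                ++count;
            }
    return count;  // only reached if blockIdx lies outside the grid
}
```

For a 4 x 3 x 2 grid and block (1, 2, 1), the terms are 12 (one complete slice of 4*3 blocks), 8 (two complete rows of 4 blocks), and 1, giving block id 21, which matches the brute-force count.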

【Comments】: