CUDA atomic operations: answers

[Question title]: Cuda atomic operations
[Posted]: 2016-04-24 23:31:34
[Question]:

I have the following kernel:

__global__
void collect_boundary(const int64_t* cvert, const csr_node* neighb, const bool* affected, int64_t* parent, const uint64_t* dist, uint64_t* ndist, bool* mask, int64_t numvertices){
    int64_t tid = blockIdx.x*blockDim.x + threadIdx.x;
    if(tid >= numvertices || affected[tid] || dist[tid]==MY_INFINITY)
        return;
    for(int64_t index = cvert[tid]; index<cvert[tid+1]; index++){
        auto vtex = neighb[index];
        if(affected[vtex.head]){
            int64_t ndistent = dist[tid] + vtex.weight; 
            atomicMin((unsigned long long int*)(ndist + vtex.head),(unsigned long long int)ndistent);
            /*if(ndist[vtex.head] == ndistent){
                parent[vtex.head] = tid;
            }*/
        }
    }
}

Basically, I want each thread to compute ndistent as shown, and I want ndist[vtex.head] to end up as the minimum of all the ndistent values.

I achieved this with the following:

atomicMin((unsigned long long int*)(ndist + vtex.head),(unsigned long long int)ndistent);

// That is, each thread will update ndist[vtex.head] if and only if
// its own value of ndistent is less than ndist[vtex.head],
// which was initialized to INFINITY before the kernel launch.

But now I also want to store the tid that produced the minimum ndistent.

I tried something like this:

if(ndist[vtex.head] == ndistent){  // prob_condition 1
    parent[vtex.head] = tid;       // prob_statement 1
}

// That is, each thread will check whether the value in
// ndist[vtex.head] is equal to its own ndistent
// and then store the tid if it is.

The snippet above will not work: some thread X may find prob_condition 1 true, but before it executes prob_statement 1, the thread that produces the minimum, say thread Y, executes prob_statement 1 and stores its tid. Thread X then resumes and stores its own tid, so the tid of the minimum is lost.

So I want prob_condition 1 and prob_statement 1 to execute atomically.

Alternatively, I need to perform the following 3 operations atomically:

Check whether ndistent is less than ndist[vtex.head]
Update ndist[vtex.head]
Store tid in parent[vtex.head]

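Does anyone have any suggestions?

Edit: Note that I have to run this kernel with a variable number of blocks and a variable number of threads.

[Comments]:

Tags: c++ concurrency cuda atomic

[Solution 1]:

This may not solve your concurrency problem the way you want, but you could take a two-phase approach: first compute the minimum, then find out which thread holds that minimum.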

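Also, if several tids have the same ndistent value, the output may vary from one execution to the next; indeed, as Taro pointed out, the order in which warps execute does not follow any predictable rule. The two-phase approach can help you build a predictable pattern for the list of minima.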
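
A minimal sketch of the two-phase idea, not the asker's actual code. It assumes a csr_node with head and weight fields, MY_INFINITY equal to UINT64_MAX, and that parent[v] has been reset to -1 for every affected vertex before the second phase; the names min_into, relax_vertex, claim_parent, collect_min and collect_parent are hypothetical. The per-vertex logic is written as host/device functions so it can also be checked in a plain C++ build.

```cpp
#include <cstdint>

#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD   // plain C++ build: lets the per-vertex logic be checked on the host
#endif

// Assumed definitions; the question does not show them.
struct csr_node { int64_t head; uint64_t weight; };
constexpr uint64_t MY_INFINITY = UINT64_MAX;

// Minimum into *addr: atomic on the device, plain (single-threaded) on the host.
HD inline void min_into(uint64_t* addr, uint64_t v) {
#ifdef __CUDA_ARCH__
    atomicMin((unsigned long long*)addr, (unsigned long long)v);
#else
    if (v < *addr) *addr = v;
#endif
}

// Phase 1, per vertex: same effect as the loop in the question.
HD inline void relax_vertex(int64_t tid, const int64_t* cvert, const csr_node* neighb,
                            const bool* affected, const uint64_t* dist, uint64_t* ndist) {
    if (affected[tid] || dist[tid] == MY_INFINITY) return;
    for (int64_t i = cvert[tid]; i < cvert[tid + 1]; ++i) {
        const csr_node e = neighb[i];
        if (affected[e.head]) min_into(ndist + e.head, dist[tid] + e.weight);
    }
}

// Phase 2, per vertex: ndist[] is final, so recompute the candidate and claim on a match.
// parent[] holds -1, the largest value when viewed as unsigned, so the smallest tid
// among tied threads wins regardless of execution order.
HD inline void claim_parent(int64_t tid, const int64_t* cvert, const csr_node* neighb,
                            const bool* affected, const uint64_t* dist,
                            const uint64_t* ndist, int64_t* parent) {
    if (affected[tid] || dist[tid] == MY_INFINITY) return;
    for (int64_t i = cvert[tid]; i < cvert[tid + 1]; ++i) {
        const csr_node e = neighb[i];
        if (affected[e.head] && dist[tid] + e.weight == ndist[e.head])
            min_into((uint64_t*)(parent + e.head), (uint64_t)tid);
    }
}

#ifdef __CUDACC__
__global__ void collect_min(const int64_t* cvert, const csr_node* neighb, const bool* affected,
                            const uint64_t* dist, uint64_t* ndist, int64_t numvertices) {
    int64_t tid = blockIdx.x * (int64_t)blockDim.x + threadIdx.x;
    if (tid < numvertices) relax_vertex(tid, cvert, neighb, affected, dist, ndist);
}

// Launch after collect_min on the same stream, so every ndist[] entry is already final.
__global__ void collect_parent(const int64_t* cvert, const csr_node* neighb, const bool* affected,
                               const uint64_t* dist, const uint64_t* ndist, int64_t* parent,
                               int64_t numvertices) {
    int64_t tid = blockIdx.x * (int64_t)blockDim.x + threadIdx.x;
    if (tid < numvertices) claim_parent(tid, cvert, neighb, affected, dist, ndist, parent);
}
#endif
```

Because the second kernel only starts once the first has finished, no thread can observe a half-updated minimum, and this works for any grid and block size. The cost is a second pass over the edges.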

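As a more hacky approach, if the ndistent value and the tid together fit in 64 bits, you could store ndistent in the high bits of a 64-bit value and the tid in the low bits, and do the atomicMin in a single instruction.

[Comments]: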
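
A sketch of the packing trick under a strong assumption that the question does not guarantee: distances (including dist + weight) and vertex ids each fit in 32 bits. The helper names pack_dist_tid, packed_dist, packed_tid, the kernel collect_boundary_packed, the best array and the 32-bit csr_node layout are all hypothetical. Since the distance sits in the high bits, comparing packed values compares distances first and falls back to the tid on ties.

```cpp
#include <cstdint>

#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD   // plain C++ build: the helpers below do not need CUDA
#endif

// Distance in the high 32 bits, thread id in the low 32 bits.
HD inline unsigned long long pack_dist_tid(uint32_t dist, uint32_t tid) {
    return ((unsigned long long)dist << 32) | tid;
}
HD inline uint32_t packed_dist(unsigned long long p) { return (uint32_t)(p >> 32); }
HD inline uint32_t packed_tid(unsigned long long p)  { return (uint32_t)(p & 0xFFFFFFFFull); }

#ifdef __CUDACC__
struct csr_node { int64_t head; uint32_t weight; };   // assumed layout with 32-bit weights

// best[v] must be initialized to all ones (e.g. cudaMemset with 0xFF) before the launch,
// so that any real (dist, tid) pair compares smaller.
__global__ void collect_boundary_packed(const int64_t* cvert, const csr_node* neighb,
                                        const bool* affected, const uint32_t* dist,
                                        unsigned long long* best, int64_t numvertices) {
    int64_t tid = blockIdx.x * (int64_t)blockDim.x + threadIdx.x;
    if (tid >= numvertices || affected[tid] || dist[tid] == UINT32_MAX) return;
    for (int64_t i = cvert[tid]; i < cvert[tid + 1]; ++i) {
        const csr_node e = neighb[i];
        if (affected[e.head])   // one atomic updates the distance and its owner together
            atomicMin(&best[e.head], pack_dist_tid(dist[tid] + e.weight, (uint32_t)tid));
    }
}
#endif
```

After the kernel, packed_dist(best[v]) gives the new distance and packed_tid(best[v]) the parent. Ties on distance resolve to the smallest tid, so the result is the same on every run.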