Compute shader: read data written in one thread from another? (Answers)

【Question title】: Compute shader: read data written in one thread from another?
【Posted】: 2025-12-15 13:30:01
【Question description】:

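Can anyone tell me whether the following compute shader is possible with DirectX 11?

I want the first thread in a Dispatch that accesses an element in a buffer (g_positionsGrid) to set that element (via compare-exchange) to a temporary value, signifying that it is performing some action.

In this case the temporary value is 0xffffffff. The first thread then continues, allocates a value from a structured append buffer (g_positions), and assigns it to that element.

So far so good, but other threads in the dispatch can arrive between the first thread's compare-exchange and its allocation, so they need to wait until the allocated index is available. I do this by busy waiting, i.e. the while loop.

Sadly, this just locks up the GPU. I assume that the value written by the first thread does not propagate to the other threads stuck in the while loop.

Is there any way to get those threads to see that value?

Thanks for any help!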

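// Element type assumed: the code below accesses .m_position, so the buffer
// is declared with a struct element rather than a bare float3.
struct Position
{
    float3 m_position;
};

RWStructuredBuffer<Position> g_positions : register(u1);
RWBuffer<uint> g_positionsGrid : register(u2);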

void AddPosition( uint address, float3 pos )
{
    uint token = 0; 

    // Assign a temp value to signify first thread has accessed this particular element
    InterlockedCompareExchange(g_positionsGrid[address], 0, 0xffffffff, token);

    if(token == 0)
    {
        //If first thread in here allocate index and assign value which
        //hopefully the other threads will pick up
        uint index = g_positions.IncrementCounter();
        g_positionsGrid[address] = index;
        g_positions[index].m_position = pos;
    }
    else
    {
        if(token == 0xffffffff)
        {
            uint index = g_positionsGrid[address];

            //This never meets its condition
            [allow_uav_condition]
            while(index == 0xffffffff) 
            { 
                //For some reason this thread never gets the assignment
                //from the first thread assigned above
                index = g_positionsGrid[address]; 
            }

            g_positions[index].m_position = pos;
        }
        else
        {
            //Just assign value as the first thread has already allocated a valid slot 
            g_positions[token].m_position = pos;

        }
    }
}

【Question discussion】:

Tags: directx compute-shader

【Solution 1】:

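Thread synchronization in DirectCompute is very simple, but it is much less flexible than the equivalent features for CPU threads. AFAIK, the only way to synchronize data between threads in a compute shader is to use groupshared memory and GroupMemoryBarrierWithGroupSync(), and that only covers threads within the same thread group. That means you can: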

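- create a small temporary buffer in groupshared memory
- calculate a value
- write it to the groupshared buffer
- synchronize the threads with GroupMemoryBarrierWithGroupSync()
- read the groupshared buffer from another thread and use the value somehow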
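
The steps above can be sketched roughly as follows. This is only a minimal illustration, not code from the question: the names g_output, s_values, GROUP_SIZE and CSMain are made up for the example, and the computed value is a placeholder.

```hlsl
// Hypothetical names throughout; a sketch of the groupshared + barrier pattern.
#define GROUP_SIZE 64

RWStructuredBuffer<float> g_output : register(u0); // assumed output buffer

// Step 1: a small temporary buffer shared by all threads of one group.
groupshared float s_values[GROUP_SIZE];

[numthreads(GROUP_SIZE, 1, 1)]
void CSMain(uint3 dispatchThreadId : SV_DispatchThreadID,
            uint  groupIndex       : SV_GroupIndex)
{
    // Steps 2 and 3: compute a (placeholder) value and publish it.
    s_values[groupIndex] = (float)dispatchThreadId.x * 0.5f;

    // Step 4: block until every thread in the group has reached this point
    // and all groupshared writes are visible to the whole group.
    GroupMemoryBarrierWithGroupSync();

    // Step 5: read a value written by a different thread of the same group.
    uint neighbour = (groupIndex + 1) % GROUP_SIZE;
    g_output[dispatchThreadId.x] = s_values[groupIndex] + s_values[neighbour];
}
```

Note that the barrier is reached by every thread unconditionally. Threads in other groups never see s_values, which is why this pattern cannot replace the cross-group spin-wait in the question.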

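To implement all of this you need proper array indices. But where do you get them from? The values passed to Dispatch and the system values available in the shader (SV_GroupIndex, SV_DispatchThreadID, SV_GroupThreadID, SV_GroupID) are related. Using these values you can calculate the indices used to access your buffers.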
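
As a rough illustration of how these system values relate, here is a small sketch for a 2D dispatch. The buffer g_flatIndex, the constants g_width and g_height, and the 8x8 group size are assumptions chosen for the example.

```hlsl
// Hypothetical names and sizes; shows how the system values relate.
#define GROUP_X 8
#define GROUP_Y 8

cbuffer GridParams : register(b0)
{
    uint g_width;   // assumed grid width in elements
    uint g_height;  // assumed grid height in elements
};

RWStructuredBuffer<uint> g_flatIndex : register(u0); // assumed output buffer

[numthreads(GROUP_X, GROUP_Y, 1)]
void CSMain(uint3 groupId          : SV_GroupID,
            uint3 groupThreadId    : SV_GroupThreadID,
            uint3 dispatchThreadId : SV_DispatchThreadID,
            uint  groupIndex       : SV_GroupIndex)
{
    // The runtime guarantees these relations:
    //   dispatchThreadId = groupId * uint3(GROUP_X, GROUP_Y, 1) + groupThreadId
    //   groupIndex       = groupThreadId.z * GROUP_X * GROUP_Y
    //                    + groupThreadId.y * GROUP_X
    //                    + groupThreadId.x

    // Skip threads that fall outside the grid when the dispatch overshoots.
    if (dispatchThreadId.x >= g_width || dispatchThreadId.y >= g_height)
        return;

    // Flatten the 2D thread position into a buffer index.
    uint flat = dispatchThreadId.y * g_width + dispatchThreadId.x;
    g_flatIndex[flat] = groupIndex;
}
```

With this layout the application would call Dispatch with roughly ceil(g_width / 8) by ceil(g_height / 8) groups, so that SV_GroupID ranges over those counts.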

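Compute shaders are not well documented and there is no easy route, but to find out more you can at least:

- read MSDN: Compute shader overview
- watch the DirectCompute Lecture Series videos on Channel 9
- examine the compute shader samples from the DirectX SDK, and the very good samples from NVIDIA's SDK (10 and 11)
- read this advanced NVIDIA paper, where they implement thread reduction and then optimize the code to run 10x faster ;)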

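Now, about your code. Well, you can probably redesign it a little.

It is always good for all threads to do the same task, i.e. symmetric load. In practice you cannot assign different tasks to different threads the way you do in CPU code.
If your data needs some preprocessing first and further processing afterwards, you may want to split the work into separate Dispatch() calls (separate shaders) that you invoke in sequence:
- preprocessShader reads from the buffer inputData and writes to preprocessedData
- calculateShader reads from preprocessedData and writes to finalData
In this case you can drop any slow thread synchronization and slow groupshared memory.
Take a look at the "thread reduction" trick mentioned above.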
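
A minimal sketch of such a two-pass split might look like the following. The buffer names follow the ones used above; the element type, the g_count constant, the entry-point names and the arithmetic are placeholders assumed for illustration. The idea relies on Direct3D 11 making UAV writes from one Dispatch() visible to the next Dispatch() on the same context.

```hlsl
// Hypothetical two-pass layout; each entry point is compiled as its own shader.
cbuffer Params : register(b0)
{
    uint g_count; // assumed number of elements in every buffer
};

StructuredBuffer<float>   inputData        : register(t0);
RWStructuredBuffer<float> preprocessedData : register(u0);
RWStructuredBuffer<float> finalData        : register(u1);

// Pass 1 (first Dispatch): every thread writes only its own element.
[numthreads(64, 1, 1)]
void preprocessShader(uint3 id : SV_DispatchThreadID)
{
    if (id.x >= g_count)
        return;

    preprocessedData[id.x] = inputData[id.x] * 2.0f; // placeholder work
}

// Pass 2 (second Dispatch): pass 1 has finished, so reading an element
// written by another thread is safe without barriers or spin-waits.
[numthreads(64, 1, 1)]
void calculateShader(uint3 id : SV_DispatchThreadID)
{
    if (id.x >= g_count)
        return;

    uint neighbour = (id.x + 1) % g_count;
    finalData[id.x] = preprocessedData[id.x] + preprocessedData[neighbour];
}
```

On the application side this corresponds to binding preprocessShader, calling Dispatch(), then binding calculateShader and calling Dispatch() again. Applied to the question, the first pass could allocate the slot indices and the second pass could consume them.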

Hope it helps! Happy coding!

【Discussion】: