Metal kernel shader -- fade implementation (answers)

【Question title】: Metal kernel shader -- fade implementation
【Posted】: 2017-07-25 13:05:39
【Question】:

I haven't written many Metal kernel shaders yet. This is a new "fade" shader between two RGBX-32 images, using a tween value from 0.0 to 1.0 to go from inBuffer1 (0.0) to inBuffer2 (1.0).

Is there anything I'm missing here? I have a feeling this could be terribly inefficient.

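My first thought was to do the subtraction and multiplication with vector data types (e.g. char4), thinking that might be faster, but the results would certainly be wrong, since some of the components would be negative.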
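To illustrate the concern: with 8-bit unsigned components a negative difference can't be represented and wraps around modulo 256, whereas the kernel sidesteps this by storing the difference in a wider signed type. Below is a CPU-side C++ sketch of both behaviours. It assumes a component of a uchar4 - uchar4 subtraction in Metal behaves like the explicitly narrowed `std::uint8_t` subtraction here; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Difference kept in 8-bit unsigned arithmetic, as one component of a
// uchar4 - uchar4 vector subtraction would be: the result wraps modulo 256.
std::uint8_t diff_wrapped(std::uint8_t a, std::uint8_t b) {
    return static_cast<std::uint8_t>(a - b);
}

// Difference kept in a signed type wide enough for -255..255,
// which is what the question's kernel does by storing into short.
short diff_signed(std::uint8_t a, std::uint8_t b) {
    return static_cast<short>(static_cast<int>(a) - static_cast<int>(b));
}
```

For example, 10 - 200 comes out as 66 in the wrapped version and -190 in the signed one.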

Also, is there any advantage to using MTLTexture rather than MTLBuffer objects?

kernel void fade_Kernel(device const uchar4  *inBuffer1  [[ buffer(0) ]],
                        device const uchar4  *inBuffer2  [[ buffer(1) ]],
                        device const float   *tween      [[ buffer(2) ]],
                        device uchar4        *outBuffer  [[ buffer(3) ]],
                        uint gid [[ thread_position_in_grid ]])
{
    const float t = tween[0];
    uchar4 pixel1 = inBuffer1[gid];
    uchar4 pixel2 = inBuffer2[gid];

    // these differences may be negative, so they are stored in a signed short
    short r=(pixel2.r-pixel1.r)*t;  
    short g=(pixel2.g-pixel1.g)*t;
    short b=(pixel2.b-pixel1.b)*t;

    outBuffer[gid]=uchar4(pixel1.r+r,pixel1.g+g,pixel1.b+b,0xff);
}
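To have something to check GPU output against, here is a minimal CPU reference of the same per-pixel math in C++. `Pixel`, `fade_pixel` and `fade_buffer` are hypothetical names. The loop index stands in for `thread_position_in_grid`, and the float-to-`short` conversion truncates toward zero, as in the kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one RGBX-32 pixel (uchar4 in the kernel).
struct Pixel { std::uint8_t r, g, b, x; };

// CPU reference for fade_Kernel: per channel, out = p1 + (short)((p2 - p1) * t),
// with the X channel forced to 0xff.
Pixel fade_pixel(Pixel p1, Pixel p2, float t) {
    auto channel = [t](std::uint8_t a, std::uint8_t b) {
        // The difference may be negative; converting float to short truncates toward zero.
        short d = static_cast<short>((static_cast<int>(b) - static_cast<int>(a)) * t);
        return static_cast<std::uint8_t>(a + d);
    };
    return Pixel{channel(p1.r, p2.r), channel(p1.g, p2.g), channel(p1.b, p2.b), 0xff};
}

// Applies fade_pixel to every element, one "thread" per pixel.
std::vector<Pixel> fade_buffer(const std::vector<Pixel>& in1,
                               const std::vector<Pixel>& in2, float t) {
    std::vector<Pixel> out(in1.size());
    for (std::size_t gid = 0; gid < in1.size(); ++gid)  // gid plays the role of thread_position_in_grid
        out[gid] = fade_pixel(in1[gid], in2[gid], t);
    return out;
}
```

At t = 0.5, a channel fading from 0 to 255 yields 127, while one fading from 255 to 0 yields 128, because the truncated difference is added back to pixel1.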

【Question comments】:

Tags: metal pixel-shader

【Solution 1】:

First, you should declare the tween parameter as:

constant float &tween [[ buffer(2) ]],

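The constant address space is more appropriate for a value like this, which is the same for every invocation of the function (and isn't indexed by grid position or the like). Also, making it a reference rather than a pointer tells the compiler that you won't be indexing other elements of the "array" that a pointer could point into.

Finally, there's a mix() function that performs exactly the kind of computation you're doing here. So you can replace the body of your function with: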

uchar4 pixel1 = inBuffer1[gid];
uchar4 pixel2 = inBuffer2[gid];

outBuffer[gid] = uchar4(uchar3(mix(float3(pixel1.rgb), float3(pixel2.rgb), tween)), 0xff);
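A small C++ model of what the mix()-based body computes per channel. It uses the formula the Metal Shading Language specification gives for mix(x, y, a), namely x + (y - x) * a; the helper names are hypothetical. It also shows the effect of wrapping the result in round() before converting back to 8 bits.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Linear blend as documented for MSL's mix(x, y, a): x + (y - x) * a.
float mix(float x, float y, float a) { return x + (y - x) * a; }

// Converting the mixed float straight back to 8 bits truncates,
// like uchar3(mix(...)) in the answer.
std::uint8_t mix_truncate(std::uint8_t p1, std::uint8_t p2, float t) {
    return static_cast<std::uint8_t>(mix(float(p1), float(p2), t));
}

// Rounding first gives round-to-nearest (halves away from zero) instead.
std::uint8_t mix_round(std::uint8_t p1, std::uint8_t p2, float t) {
    return static_cast<std::uint8_t>(std::round(mix(float(p1), float(p2), t)));
}
```

This can differ by one from the question's original kernel: for pixel1 = 255, pixel2 = 0 and t = 0.5, the original code truncates the difference and produces 128, while truncating the mixed value produces 127.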

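As for whether textures would be better, that depends partly on what you plan to do with the result after running this kernel. If you're going to do texture-like things with it anyway, it may well be better to use textures throughout. In fact, it may be better still to use a drawing operation with blending rather than a compute kernel. After all, this kind of blending is something GPUs do all the time, so that path is likely to be fast. You'd have to measure the performance of each approach.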
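Why a blended draw can do the same job: the fade is a linear interpolation, which rearranges into the usual "source × factor + destination × (1 − factor)" blend form, with p₂ as the source being drawn, p₁ as the destination already in the render target, and the factor equal to t. How t is supplied to the blender (for example as a constant blend colour or as the source alpha) is a setup detail not shown here.

```latex
\operatorname{mix}(p_1, p_2, t) = p_1 + (p_2 - p_1)\,t = t\,p_2 + (1 - t)\,p_1
```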

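【Comments】:

Thank you, Ken. You've been very helpful, once again. Oddly, "mix" doesn't seem to be available in pre-Metal 2 implementations. Looking at the Metal Shading Language documentation, I can use "saturate" but not "mix" -> no matching function for call to 'mix'.
My mistake. mix() only works with floating-point types. I've edited my answer to convert back and forth. The conversions were implicit in your original code. You may also want to wrap the call to mix() in a call to round(), although your original code truncated just as my new code does.
Thanks, Ken. If alpha is to be taken into account, I guess outBuffer[gid]=uchar4(mix(float4(pixel1),float4(pixel2),tween)) would work just as well.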
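The four-channel variant suggested in the last comment interpolates alpha along with the colour channels. Here is a CPU-side C++ model of that expression. `uchar4` is a hypothetical stand-in alias, and values are truncated on the way back to 8 bits, as the float-to-integer conversion does.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using uchar4 = std::array<std::uint8_t, 4>;  // hypothetical stand-in for Metal's uchar4

// CPU model of uchar4(mix(float4(pixel1), float4(pixel2), tween)):
// all four channels, alpha included, are interpolated and truncated to 8 bits.
uchar4 mix4(const uchar4& p1, const uchar4& p2, float t) {
    uchar4 out{};
    for (int i = 0; i < 4; ++i) {
        float v = float(p1[i]) + (float(p2[i]) - float(p1[i])) * t;
        out[i] = static_cast<std::uint8_t>(v);
    }
    return out;
}
```

Fading an opaque pixel toward a fully transparent one of the same colour leaves RGB unchanged and takes alpha from 255 to 127 at t = 0.5.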

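【Solution 2】:

If you're working with images, using MTLTexture is more efficient than using MTLBuffer. Using "half" is also better than using "uchar". I learned this directly from an Apple engineer at this year's WWDC.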

kernel void alpha(texture2d<half, access::read>  inTexture2  [[texture(0)]],
    texture2d<half, access::read>  inTexture1  [[texture(1)]],
    texture2d<half, access::write> outTexture [[texture(2)]],
    const device float& tween [[ buffer(3) ]],
    uint2 gid [[thread_position_in_grid]]) 
{
    // Check if the pixel is within the bounds of the output texture
    if((gid.x >= outTexture.get_width()) || (gid.y >= outTexture.get_height())) {
        // Return early if the pixel is out of bounds
        return;
    }
    half4 color1  = inTexture1.read(gid);
    half4 color2  = inTexture2.read(gid);
    outTexture.write(half4(mix(color1.rgb, color2.rgb, half(tween)), color1.a), gid);
}
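A rough C++ model of what the texture version does per channel. It assumes the textures use an 8-bit normalized format such as RGBA8Unorm: reads return the value normalized to [0, 1] (as `half` on the GPU; `float` stands in for it here), and writes clamp and round back to 8 bits. `std::lround` only approximates the conversion; the exact rounding rule is defined by the Metal specification. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Assumption: 8-bit normalized texture format. A read yields value / 255.
float unorm8_to_float(std::uint8_t v) { return v / 255.0f; }

// A write clamps to [0, 1] and rounds to the nearest 8-bit code
// (approximation of Metal's float-to-unorm conversion).
std::uint8_t float_to_unorm8(float f) {
    f = std::min(std::max(f, 0.0f), 1.0f);
    return static_cast<std::uint8_t>(std::lround(f * 255.0f));
}

// One channel of the texture kernel: read both inputs, mix, write.
std::uint8_t fade_channel_texture(std::uint8_t c1, std::uint8_t c2, float tween) {
    float a = unorm8_to_float(c1);
    float b = unorm8_to_float(c2);
    return float_to_unorm8(a + (b - a) * tween);
}
```

One visible consequence: for a 0 to 255 fade at tween = 0.5 this path yields 128, because the write rounds, whereas the truncating conversions in the buffer-based versions yield 127.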

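【Comments】:

I suppose whether "half" or "char4" is more efficient depends on whether the source buffer is RGBA-32 (4 bytes per pixel) or some other format, doesn't it?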