Metal kernel shader not working (answers)

【Question title】: Metal kernel shader not working
【Posted】: 2017-12-20 03:16:09
【Question】:

I'm confused about why my kernel shader isn't working.

I have a truly raw RGBA32 pixel buffer (inBuffer) that I send to the kernel shader. I also have a receiving MTLTexture, whose usage I set to MTLTextureUsageRenderTarget in its RGBA8Unorm descriptor.

Then I encode the dispatch like this...

id<MTLLibrary> library = [_device newDefaultLibrary];
id<MTLFunction> kernelFunction = [library newFunctionWithName:@"stripe_Kernel"];
id<MTLComputePipelineState> pipeline = [_device newComputePipelineStateWithFunction:kernelFunction error:&error];
id<MTLCommandQueue> commandQueue = [_device newCommandQueue];
MTLTextureDescriptor *textureDescription = [MTLTextureDescriptor texture2DDescriptorWithPixelFormat:MTLPixelFormatRGBA8Unorm
                                                                                              width:outputSize.width
                                                                                             height:outputSize.height
                                                                                          mipmapped:NO];
[textureDescription setUsage:MTLTextureUsageRenderTarget];
id<MTLTexture> metalTexture = [_device newTextureWithDescriptor:textureDescription];

MTLSize threadgroupCounts = MTLSizeMake(8, 8, 1);
MTLSize threadgroups = MTLSizeMake([metalTexture width] / threadgroupCounts.width,
                                   [metalTexture height] / threadgroupCounts.height, 1);

...

id<MTLBuffer> metalBuffer = [_device newBufferWithBytesNoCopy:inBuffer
                                                       length:inputByteCount
                                                       options:MTLResourceStorageModeShared
                                                      deallocator:nil];

[commandEncoder setComputePipelineState:pipeline];
[commandEncoder setTexture:metalTexture atIndex:0];
[commandEncoder setBuffer:metalBuffer offset:0 atIndex:0];
[commandEncoder setBytes:&imageW length:sizeof(ushort) atIndex:1];
[commandEncoder setBytes:&imageH length:sizeof(ushort) atIndex:2];

[commandEncoder dispatchThreadgroups:threadgroups threadsPerThreadgroup:threadgroupCounts];
[commandEncoder endEncoding];

[commandBuffer commit];
[commandBuffer waitUntilCompleted];
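
A side note on the threadgroup math: `[metalTexture width] / threadgroupCounts.width` truncates. That is exact for 2048x896, but for an output size that is not a multiple of 8 the last partial column and row of threadgroups would never run. A minimal sketch of the usual ceiling division in plain Swift (the helper names are hypothetical). If you round up like this, the kernel also needs an early exit such as `if (gid.x >= outTexture.get_width() || gid.y >= outTexture.get_height()) return;`.

```swift
// Hypothetical helper: threadgroups needed so that groups of `groupSize`
// threads cover `total` threads (ceiling division instead of truncation).
func threadgroupCount(total: Int, groupSize: Int) -> Int {
    precondition(total >= 0 && groupSize > 0, "sizes must be non-negative / positive")
    return (total + groupSize - 1) / groupSize
}

// Threadgroup grid for a width x height texture with square groups (8x8 by default).
func threadgroupGrid(width: Int, height: Int, group: Int = 8) -> (Int, Int) {
    return (threadgroupCount(total: width, groupSize: group),
            threadgroupCount(total: height, groupSize: group))
}

let exact = threadgroupGrid(width: 2048, height: 896)   // (256, 112): same as truncating
let ragged = threadgroupGrid(width: 2050, height: 900)  // (257, 113): truncating gives (256, 112)
print(exact, ragged)
```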

The goal is to take a raw image of size m×n and pack it into a texture of, for example, 2048x896. Here is my kernel shader:

kernel void stripe_Kernel(texture2d<float, access::write> outTexture [[ texture(0) ]],
                      device const float *inBuffer [[ buffer(0) ]],
                      device const ushort * imageWidth [[ buffer(1) ]],
                      device const ushort * imageHeight [[ buffer(2) ]],
                      uint2 gid [[ thread_position_in_grid ]])
{
    const ushort imageW = *imageWidth;
    const ushort imageH = *imageHeight;

    const uint32_t textureW = outTexture.get_width();  // eg. 2048

    uint32_t posX = gid.x;  // eg. 0...2047
    uint32_t posY = gid.y;  // eg. 0...895

    uint32_t sourceX = ((int)(posY/imageH)*textureW + posX) % imageW;
    uint32_t sourceY = (int)(posY% imageH);

    const uint32_t ptr = (sourceX + sourceY* imageW);
    float pixel = inBuffer[ptr];

    outTexture.write(pixel, gid);
}
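
One way to rule out the packing math as the cause is to port the kernel's index computation to the CPU and check a few output positions by hand. A minimal sketch that mirrors `stripe_Kernel`'s integer arithmetic; the 4096x448 source size in the example is a hypothetical choice that fills a 2048x896 output exactly with two stripes.

```swift
// CPU port of the index math in stripe_Kernel: which element of inBuffer
// the output pixel (posX, posY) reads. Uses the same unsigned integer math.
func stripeSourceIndex(posX: UInt32, posY: UInt32,
                       textureW: UInt32, imageW: UInt32, imageH: UInt32) -> UInt32 {
    let stripe = posY / imageH                         // which horizontal band of the output
    let sourceX = (stripe * textureW + posX) % imageW  // column in the source image
    let sourceY = posY % imageH                        // row in the source image
    return sourceX + sourceY * imageW                  // linear index into inBuffer
}

// Hypothetical example: a 4096x448 source packed into a 2048x896 output.
// The last output pixel should map to the last source element.
let last = stripeSourceIndex(posX: 2047, posY: 895, textureW: 2048, imageW: 4096, imageH: 448)
print(last, last < 4096 * 448)  // 1835007 true
```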

Later I grab that texture's contents and copy them into a CVPixelBuffer:

MTLRegion region = MTLRegionMake2D(0, 0, (int)outputSize.width, (int)outputSize.height);
// lock buffers, copy texture over
CVPixelBufferLockBaseAddress(outBuffer, 0);
void *pixelData = CVPixelBufferGetBaseAddress(outBuffer);
[metalTexture getBytes:CVPixelBufferGetBaseAddress(outBuffer)
           bytesPerRow:CVPixelBufferGetBytesPerRow(outBuffer)
            fromRegion:region
           mipmapLevel:0];
CVPixelBufferUnlockBaseAddress(outBuffer, 0);

My problem is that my CVPixelBuffer is always empty (allocated but all zeros). I'm running on an iMac 17,1 with a Radeon M395 GPU.

I've even written opaque red pixels straight into the kernel shader's output texture, and I still don't see any red.

Update: My solution to this problem was to abandon MTLTextures entirely. (I even tried synchronizing the texture with an MTLBlitCommandEncoder; no dice.)

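I ended up using MTLBuffers for both the input "texture" and the output "texture" and reworking the math in the kernel shader. My output buffer is now a preallocated, locked CVPixelBuffer, which is exactly what I wanted in the end.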
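
Writing straight into CVPixelBuffer memory means honoring `CVPixelBufferGetBytesPerRow`, which can be larger than width × 4 because rows may be padded; that is also why the `getBytes:bytesPerRow:` call takes the pixel buffer's stride. A minimal CPU-side sketch with hypothetical helper names; in a kernel, the equivalent would be passing bytesPerRow as an extra argument and addressing pixels as `y * bytesPerRow + x * 4`.

```swift
// Hypothetical helper: byte offset of pixel (x, y) in a buffer whose rows are
// `bytesPerRow` apart, which may be more than width * bytesPerPixel.
func byteOffset(x: Int, y: Int, bytesPerRow: Int, bytesPerPixel: Int = 4) -> Int {
    return y * bytesPerRow + x * bytesPerPixel
}

// Hypothetical helper: copy tightly packed rows into a row-padded destination,
// leaving the padding bytes at the end of each row untouched.
func copyRows(from src: [UInt8], width: Int, height: Int, bytesPerPixel: Int,
              into dst: inout [UInt8], dstBytesPerRow: Int) {
    let rowBytes = width * bytesPerPixel
    precondition(dstBytesPerRow >= rowBytes, "destination rows too short")
    precondition(src.count >= rowBytes * height && dst.count >= dstBytesPerRow * height,
                 "buffers too small")
    for y in 0..<height {
        let s = y * rowBytes
        let d = byteOffset(x: 0, y: y, bytesPerRow: dstBytesPerRow, bytesPerPixel: bytesPerPixel)
        dst[d..<(d + rowBytes)] = src[s..<(s + rowBytes)]
    }
}

// Tiny example: 2x2 image, 1 byte per pixel, destination rows padded to 4 bytes.
var padded = [UInt8](repeating: 0, count: 8)
copyRows(from: [1, 2, 3, 4], width: 2, height: 2, bytesPerPixel: 1, into: &padded, dstBytesPerRow: 4)
print(padded)  // [1, 2, 0, 0, 3, 4, 0, 0]
```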

【Comments on the question】:

Possible duplicate of Metal kernels not behaving properly on the new MacBook Pro (late 2016) GPUs
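Thanks for the info, Ken. Adding a forced texture synchronization didn't help. I've now given up on textures and got it working with a pair of input/output MTLBuffers containing RGBA32 pixel data. Now I run into problems if my source buffer is 24-bit... stackoverflow.com/questions/45130709/…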

Tags: metal

【Answer 1】:

First, with MTLTextureUsage.renderTarget I get the error "validateComputeFunctionArguments:825: failed assertion `Function writes texture (outTexture[0]) which uses (0x04) doesn't specify MTLTextureUsageShaderWrite (0x02)'", so it should be MTLTextureUsage.shaderWrite.
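
The hex values in that assertion are usage bit flags: renderTarget is 0x04 and shaderWrite is 0x02, so a usage of renderTarget alone does not contain the write bit. In real code the fix is `textureDescriptor.usage = [.shaderRead, .shaderWrite]` in Swift, or `MTLTextureUsageShaderRead | MTLTextureUsageShaderWrite` in Objective-C. The sketch below only models that condition with a hypothetical `TextureUsage` type mirroring `MTLTextureUsage`'s raw values, so it runs without Metal.

```swift
// Hypothetical mirror of MTLTextureUsage's raw values, for illustration only;
// real code should use MTLTextureUsage itself.
struct TextureUsage: OptionSet {
    let rawValue: UInt
    static let shaderRead   = TextureUsage(rawValue: 0x01)
    static let shaderWrite  = TextureUsage(rawValue: 0x02)
    static let renderTarget = TextureUsage(rawValue: 0x04)
}

// Models the condition named in the validation error: a kernel that writes
// a texture needs the shaderWrite bit in that texture's usage.
func canBeWrittenByKernel(_ usage: TextureUsage) -> Bool {
    return usage.contains(.shaderWrite)
}

let questionUsage: TextureUsage = .renderTarget               // 0x04, as in the question
let fixedUsage: TextureUsage = [.shaderRead, .shaderWrite]    // 0x03
print(canBeWrittenByKernel(questionUsage), canBeWrittenByKernel(fixedUsage))  // false true
```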

For some reason, if I force the Intel GPU with gfxSwitch, reading back from the texture returns the correct data, but on the Radeon it is always zero, regardless of the "textureDesc.resourceOptions = MTLResourceOptions.storageModeXXX" flag.

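What works on both the Intel GPU and the Radeon 460 is to create an MTLBuffer and use it instead of a texture. However, you have to compute the indices yourself. If you don't use mipmapping or sample with floating-point coordinates, switching to a buffer shouldn't be a big deal, right?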
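
With a buffer, both the indexing and any edge handling that a sampler's `clamp_to_edge` address mode would otherwise give you are up to you. A minimal sketch for a tightly packed RGBA buffer, assuming 4 channels per pixel and no row padding; the helper names are hypothetical.

```swift
// Hypothetical helper: element index of (x, y, channel) in a tightly packed
// RGBA buffer: 4 channels per pixel, rows stored back to back with no padding.
func rgbaIndex(x: Int, y: Int, channel: Int, width: Int) -> Int {
    precondition((0..<4).contains(channel), "channel must be 0...3")
    return (y * width + x) * 4 + channel
}

// Same, but clamps out-of-range coordinates to the nearest edge pixel,
// similar to what a clamp_to_edge sampler does for a texture.
func clampedRGBAIndex(x: Int, y: Int, channel: Int, width: Int, height: Int) -> Int {
    let cx = min(max(x, 0), width - 1)
    let cy = min(max(y, 0), height - 1)
    return rgbaIndex(x: cx, y: cy, channel: channel, width: width)
}

print(rgbaIndex(x: 1, y: 2, channel: 3, width: 4))                      // 39
print(clampedRGBAIndex(x: -5, y: 10, channel: 0, width: 4, height: 3))  // 32
```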

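let byteCount = 4 * width * height
let texBuffer = device?.makeBuffer(length: byteCount, options: MTLResourceOptions.storageModeShared)

// Size the Float array to the bytes actually copied (4 bytes per Float).
var result = [Float](repeating: 0, count: byteCount / MemoryLayout<Float>.size)
let data = NSData(bytesNoCopy: texBuffer!.contents(), length: byteCount, freeWhenDone: false)
data.getBytes(&result, length: byteCount)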

I assume creating a texture backed by an MTLBuffer would work, but that API is only available on macOS 10.13 and later.
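
The API in question is `MTLBuffer`'s `makeTexture(descriptor:offset:bytesPerRow:)` (Objective-C `newTextureWithDescriptor:offset:bytesPerRow:`). As I understand it, `bytesPerRow` has to be a multiple of the device's `minimumLinearTextureAlignment(for:)` for the pixel format, so rows often need padding. A sketch of that rounding in plain Swift; the 256-byte alignment in the example is an assumption, so query the device for the real value.

```swift
// Round a row's byte count up to a multiple of `alignment`, e.g. the value
// returned by MTLDevice.minimumLinearTextureAlignment(for:).
func alignedBytesPerRow(width: Int, bytesPerPixel: Int, alignment: Int) -> Int {
    precondition(width >= 0 && bytesPerPixel > 0 && alignment > 0)
    let unpadded = width * bytesPerPixel
    return (unpadded + alignment - 1) / alignment * alignment
}

// Assumed 256-byte alignment; query the device for the real value.
let rowA = alignedBytesPerRow(width: 2048, bytesPerPixel: 4, alignment: 256)  // 8192, no padding
let rowB = alignedBytesPerRow(width: 1000, bytesPerPixel: 4, alignment: 256)  // 4096 (4000 padded)
// The backing MTLBuffer then needs at least bytesPerRow * height bytes.
print(rowA, rowB, rowA * 896)
```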

Edit: As Ken Thomases pointed out, there is a similar discussion in Metal kernels not behaving properly on the new MacBook Pro (late 2016) GPUs.

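I made a sample app using the approach and shaders from the first post in that thread, and the fix from the linked thread worked for me. Here is a link to the app code in case anyone wants a reproducible example: https://gist.github.com/astarasikov/9e4f58e540a6ff066806d37eb5b2af29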

【Comments】:

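Following your suggestion, Alexst, I got it working with two buffers! That said, before implementing my Metal kernel I was using Accelerate (on the CPU) to convert the 24-bit data to 32-bit. So now I'm considering having the kernel do the 24-bit → 32-bit conversion and the packing in a single pass. Apparently that comes with its own problems: stackoverflow.com/questions/45130709/…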
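
If the 24-bit → 32-bit step moves into the kernel, a CPU reference is handy for diffing against the GPU output. A minimal sketch, assuming packed 8-bit R, G, B triplets with no row padding and an opaque alpha of 255 (the comment does not state the channel order or alpha, so both are assumptions). It also shows why the GPU version is awkward: pixel i starts at byte 3 * i, so a pixel cannot be fetched as one aligned 32-bit word.

```swift
// CPU reference for the 24-bit -> 32-bit step: packed 8-bit RGB triplets in,
// RGBA out with a constant alpha (assumed opaque, 255, by default).
func expandRGB24ToRGBA32(_ rgb: [UInt8], alpha: UInt8 = 255) -> [UInt8] {
    precondition(rgb.count % 3 == 0, "input must be whole RGB triplets")
    var rgba = [UInt8]()
    rgba.reserveCapacity(rgb.count / 3 * 4)
    // Pixel i starts at byte 3 * i in the source and 4 * i in the result.
    for i in stride(from: 0, to: rgb.count, by: 3) {
        rgba.append(contentsOf: [rgb[i], rgb[i + 1], rgb[i + 2], alpha])
    }
    return rgba
}

print(expandRGB24ToRGBA32([1, 2, 3, 4, 5, 6]))  // [1, 2, 3, 255, 4, 5, 6, 255]
```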
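Could you upload a minimal app that reproduces the problem? I tried setting up a writable texture, and it works with the fix from the thread linked as a "possible duplicate". I can also upload my sample (in Swift).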