[Question Title]: How do I interpret the MLMultiArray result of semantic segmentation via CoreML?
[Posted]: 2022-01-23 16:35:15
[Question]:

I'm trying to implement a semantic segmentation model in my app. I was able to convert a u2net model to a CoreML model, but I can't get usable results from the MLMultiArray output. The model spec is as follows:

input {
  name: "input"
  type {
    imageType {
      width: 512
      height: 512
      colorSpace: RGB
    }
  }
}
output {
  name: "x_1"
  type {
    multiArrayType {
      shape: 1
      shape: 3
      shape: 512
      shape: 512
      dataType: FLOAT32
    }
  }
}

The model works well when I open it in Xcode and use the model preview feature: it shows the 2 different labels in 2 colors (there are only 2 classes plus 1 background). I want the same output in my app, but when I manually process the MLMultiArray output into a CGImage I get a different result. I'm using the code provided here, like this:

let image = output.cgImage(min: -1, max: 1, channel: 0, axes: (1,2,3))

This gives me something that looks somewhat usable, but it has a lot of gradients within each channel. What I need is an image with exactly one color value per label.

I also tried converting the model's output directly to an image via this sample code. That just shows "Inference Failed" in the Xcode model preview. When I try to remove the unnecessary extra dimension from the MultiArray output, I get this error:

"Error reading protobuf spec. validator error: Layer 'x_1' of type 'Convolution' has output rank 3 but expects rank at least 4."

What is the model preview in Xcode doing that I'm not? Do I need a post-processing step to get usable output?

[Question Discussion]:

Tags: xcode coreml semantic-segmentation


[Solution 1]:

Answering my own question:

It turns out that each pixel value in a channel represents the likelihood that the pixel belongs to the class that channel represents.

In other words: find the maximum pixel value across channels at each position. The channel with the highest value is that pixel's class.
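The per-pixel argmax described above can be sketched as a small self-contained function (hypothetical name `argmaxLabels`; assumes the scores are a flat channel-major [C, H, W] buffer, matching the model's [1, 3, 512, 512] output order):

```swift
// Minimal sketch: per-pixel argmax over a channel-major [C, H, W] score buffer.
// Comparing the raw Float scores directly avoids guessing min/max scaling values.
func argmaxLabels(scores: [Float], channelCount: Int, width: Int, height: Int) -> [Int] {
    let plane = width * height                 // pixels per channel plane
    var labels = [Int](repeating: 0, count: plane)
    for p in 0..<plane {
        var best = scores[p]                   // channel 0's score for this pixel
        for c in 1..<channelCount {
            let v = scores[c * plane + p]
            if v > best {                      // higher score wins
                best = v
                labels[p] = c
            }
        }
    }
    return labels
}
```

This operates on a plain `[Float]` for clarity; with a real `MLMultiArray` you would copy or bind its FLOAT32 storage to such a buffer first.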

    func getLabelsForImage() {
        // ... set up the model and its input here ...
        guard let output = try? model.prediction(input: input) else {
            fatalError("Could not generate model output.")
        }
        let channelCount = 10
    
        // Ugly, I know. But works:
        let colors = [
            NSColor.red.usingColorSpace(.sRGB)!,
            NSColor.blue.usingColorSpace(.sRGB)!,
            NSColor.green.usingColorSpace(.sRGB)!,
            NSColor.gray.usingColorSpace(.sRGB)!,
            NSColor.yellow.usingColorSpace(.sRGB)!,
            NSColor.purple.usingColorSpace(.sRGB)!,
            NSColor.cyan.usingColorSpace(.sRGB)!,
            NSColor.orange.usingColorSpace(.sRGB)!,
            NSColor.brown.usingColorSpace(.sRGB)!,
            NSColor.magenta.usingColorSpace(.sRGB)!
        ]
        
        // I don't know my min and max output, -64 and 64 seems to work OK for my data.
        var firstData = output.toRawBytes(min: Float32(-64), max: Float32(64), channel: 0, axes: (0,1,2))!.bytes
        var outputImageData:[UInt8] = []
        for _ in 0..<firstData.count {
            let r:UInt8 = UInt8(colors[0].redComponent * 255)
            let g:UInt8 = UInt8(colors[0].greenComponent * 255)
            let b:UInt8 = UInt8(colors[0].blueComponent * 255)
            let a:UInt8 = UInt8(colors[0].alphaComponent * 255)
            
            outputImageData.append(r)
            outputImageData.append(g)
            outputImageData.append(b)
            outputImageData.append(a)
        }
        
        for i in 1..<channelCount {
            let data = output.toRawBytes(min: Float32(-64), max: Float32(64), channel: i, axes: (0,1,2))!.bytes
            for j in 0..<data.count {
                if data[j] > firstData[j] {
                    firstData[j] = data[j]
                    let r:UInt8 = UInt8(colors[i].redComponent * 255)
                    let g:UInt8 = UInt8(colors[i].greenComponent * 255)
                    let b:UInt8 = UInt8(colors[i].blueComponent * 255)
                    let a:UInt8 = UInt8(colors[i].alphaComponent * 255)
                    
                    outputImageData[j*4] = r
                    outputImageData[j*4+1] = g
                    outputImageData[j*4+2] = b
                    outputImageData[j*4+3] = a
                }
            }
        }
        
        let image = imageFromPixels(pixels: outputImageData, width: 512, height: 512)
        image.writeJPG(toURL: labelURL.deletingLastPathComponent().appendingPathComponent("labels.jpg"))
    }
    
    // I found this function here: https://stackoverflow.com/questions/38590323/obtain-nsimage-from-pixel-array-problems-swift
    func imageFromPixels(pixels: UnsafePointer<UInt8>, width: Int, height: Int)-> NSImage { //No need to pass another CGImage
        let rgbColorSpace = CGColorSpaceCreateDeviceRGB()
        let bitmapInfo:CGBitmapInfo = CGBitmapInfo(rawValue: CGImageAlphaInfo.premultipliedLast.rawValue)
        let bitsPerComponent = 8 //number of bits in UInt8
        let bitsPerPixel = 4 * bitsPerComponent // RGBA uses 4 components
        let bytesPerRow = bitsPerPixel * width / 8 // bitsPerRow / 8 (in some cases, you need some paddings)
        let providerRef = CGDataProvider(
            data: NSData(bytes: pixels, length: height * bytesPerRow) //Do not put `&` as pixels is already an `UnsafePointer`
        )
    
        let cgim = CGImage(
            width: width,
            height: height,
            bitsPerComponent: bitsPerComponent,
            bitsPerPixel: bitsPerPixel,
            bytesPerRow: bytesPerRow, //->not bits
            space: rgbColorSpace,
            bitmapInfo: bitmapInfo,
            provider: providerRef!,
            decode: nil,
            shouldInterpolate: true,
            intent: CGColorRenderingIntent.defaultIntent
        )
        return NSImage(cgImage: cgim!, size: NSSize(width: width, height: height))
    }
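Once the per-pixel class labels are known, building the flat RGBA byte buffer for `imageFromPixels` is a plain palette lookup; a minimal sketch (hypothetical `rgbaBuffer`, palette entries as (r, g, b, a) tuples):

```swift
// Sketch: map per-pixel class labels to a flat RGBA byte buffer,
// in the premultipliedLast byte order that imageFromPixels expects.
func rgbaBuffer(labels: [Int], palette: [(r: UInt8, g: UInt8, b: UInt8, a: UInt8)]) -> [UInt8] {
    var bytes = [UInt8]()
    bytes.reserveCapacity(labels.count * 4)    // 4 bytes per pixel
    for label in labels {
        let color = palette[label]
        bytes.append(color.r)
        bytes.append(color.g)
        bytes.append(color.b)
        bytes.append(color.a)
    }
    return bytes
}
```

Separating the argmax/label step from the coloring step like this also makes it easy to swap palettes without re-running the model.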
    

[Discussion]:
