[Question Title]: How do I interpret the MLMultiArray result of semantic segmentation via CoreML?
[Posted]: 2022-01-23 16:35:15
[Question]:

I'm trying to implement a semantic segmentation model in my app. I was able to convert a u2net model to a CoreML model, but I can't get usable results from the MLMultiArray output. The model spec is as follows:

input {
  name: "input"
  type {
    imageType {
      width: 512
      height: 512
      colorSpace: RGB
    }
  }
}
output {
  name: "x_1"
  type {
    multiArrayType {
      shape: 1
      shape: 3
      shape: 512
      shape: 512
      dataType: FLOAT32
    }
  }
}

The model works well when I open it in Xcode and use the model preview feature: it shows the 2 different labels in 2 colors (there are only 2 classes plus 1 background). I want the same output in my app, but when I manually process the MLMultiArray output into a CGImage I get a different result. I'm using the code provided here, like this:

let image = output.cgImage(min: -1, max: 1, channel: 0, axes: (1,2,3))

This gives me something that looks somewhat usable, but it has a lot of gradients within each channel. What I need is an image with exactly one color value per label.

I also tried converting the model's output directly to an image via this sample code. That just shows "Inference Failed" in the Xcode model preview. When I try to remove the unnecessary extra dimension from the MultiArray output, I get this error:

"Error reading protobuf spec. validator error: Layer 'x_1' of type 'Convolution' has output rank 3 but expects rank at least 4."

What is the model preview in Xcode doing that I'm not? Do I need a post-processing step to get usable output?

[Question Discussion]:

Tags: xcode coreml semantic-segmentation


[Solution 1]:

Answering my own question:

It turns out that each pixel value in a channel represents the likelihood that the pixel belongs to the class that channel represents.

In other words: find the maximum pixel value across channels at each position. The channel with the highest value is that pixel's class.
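The per-pixel argmax described above can be sketched as a small self-contained function (hypothetical name `argmaxLabels`; assumes the scores are a flat channel-major [C, H, W] buffer, matching the model's [1, 3, 512, 512] output order):

```swift
// Minimal sketch: per-pixel argmax over a channel-major [C, H, W] score buffer.
// Comparing the raw Float scores directly avoids guessing min/max scaling values.
func argmaxLabels(scores: [Float], channelCount: Int, width: Int, height: Int) -> [Int] {
    let plane = width * height                 // pixels per channel plane
    var labels = [Int](repeating: 0, count: plane)
    for p in 0..<plane {
        var best = scores[p]                   // channel 0's score for this pixel
        for c in 1..<channelCount {
            let v = scores[c * plane + p]
            if v > best {                      // higher score wins
                best = v
                labels[p] = c
            }
        }
    }
    return labels
}
```

This operates on a plain `[Float]` for clarity; with a real `MLMultiArray` you would copy or bind its FLOAT32 storage to such a buffer first.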

    func getLabelsForImage() {
        // ... set up the model and its input here ...
        guard let output = try? model.prediction(input: input) else {
            fatalError("Could not generate model output.")
        }
        let channelCount = 10
    
        // Ugly, I know. But works:
        let colors = [
            NSColor.red.usingColorSpace(.sRGB)!,
            NSColor.blue.usingColorSpace(.sRGB)!,
            NSColor.green.usingColorSpace(.sRGB)!,
            NSColor.gray.usingColorSpace(.sRGB)!,
            NSColor.yellow.usingColorSpace(.sRGB)!,
            NSColor.purple.usingColorSpace(.sRGB)!,
            NSColor.cyan.usingColorSpace(.sRGB)!,
            NSColor.orange.usingColorSpace(.sRGB)!,
            NSColor.brown.usingColorSpace(.sRGB)!,
            NSColor.magenta.usingColorSpace(.sRGB)!
        ]
        
        // I don't know my min and max output, -64 and 64 seems to work OK for my data.
        var firstData = output.toRawBytes(min: Float32(-64), max: Float32(64), channel: 0, axes: (0,1,2))!.bytes
        var outputImageData:[UInt8] = []
        for _ in 0..<firstData.count {
            let r:UInt8 = UInt8(colors[0].redComponent * 255)
            let g:UInt8 = UInt8(colors[0].greenComponent * 255)
            let b:UInt8 = UInt8(colors[0].blueComponent * 255)
            let a:UInt8 = UInt8(colors[0].alphaComponent * 255)
            
            outputImageData.append(r)
            outputImageData.append(g)
            outputImageData.append(b)
            outputImageData.append(a)
        }
        
        for i in 1..<channelCount {
            let data = output.toRawBytes(min: Float32(-64), max: Float32(64), channel: i, axes: (0,1,2))!.bytes
            for j in 0..<data.count {
                if data[j] > firstData[j] {
                    firstData[j] = data[j]
                    let r:UInt8 = UInt8(colors[i].redComponent * 255)
                    let g:UInt8 = UInt8(colors[i].greenComponent * 255)
                    let b:UInt8 = UInt8(colors[i].blueComponent * 255)
                    let a:UInt8 = UInt8(colors[i].alphaComponent * 255)
                    
                    outputImageData[j*4] = r
                    outputImageData[j*4+1] = g
                    outputImageData[j*4+2] = b
                    outputImageData[j*4+3] = a
                }
            }
        }
        
        let image = imageFromPixels(pixels: outputImageData, width: 512, height: 512)
        image.writeJPG(toURL: labelURL.deletingLastPathComponent().appendingPathComponent("labels.jpg"))
    }
    
    // I found this function here: https://stackoverflow.com/questions/38590323/obtain-nsimage-from-pixel-array-problems-swift
    func imageFromPixels(pixels: UnsafePointer<UInt8>, width: Int, height: Int)-> NSImage { //No need to pass another CGImage
        let rgbColorSpace = CGColorSpaceCreateDeviceRGB()
        let bitmapInfo:CGBitmapInfo = CGBitmapInfo(rawValue: CGImageAlphaInfo.premultipliedLast.rawValue)
        let bitsPerComponent = 8 //number of bits in UInt8
        let bitsPerPixel = 4 * bitsPerComponent // RGBA uses 4 components
        let bytesPerRow = bitsPerPixel * width / 8 // bitsPerRow / 8 (in some cases, you need some paddings)
        let providerRef = CGDataProvider(
            data: NSData(bytes: pixels, length: height * bytesPerRow) //Do not put `&` as pixels is already an `UnsafePointer`
        )
    
        let cgim = CGImage(
            width: width,
            height: height,
            bitsPerComponent: bitsPerComponent,
            bitsPerPixel: bitsPerPixel,
            bytesPerRow: bytesPerRow, //->not bits
            space: rgbColorSpace,
            bitmapInfo: bitmapInfo,
            provider: providerRef!,
            decode: nil,
            shouldInterpolate: true,
            intent: CGColorRenderingIntent.defaultIntent
        )
        return NSImage(cgImage: cgim!, size: NSSize(width: width, height: height))
    }
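Once the per-pixel class labels are known, building the flat RGBA byte buffer for `imageFromPixels` is a plain palette lookup; a minimal sketch (hypothetical `rgbaBuffer`, palette entries as (r, g, b, a) tuples):

```swift
// Sketch: map per-pixel class labels to a flat RGBA byte buffer,
// in the premultipliedLast byte order that imageFromPixels expects.
func rgbaBuffer(labels: [Int], palette: [(r: UInt8, g: UInt8, b: UInt8, a: UInt8)]) -> [UInt8] {
    var bytes = [UInt8]()
    bytes.reserveCapacity(labels.count * 4)    // 4 bytes per pixel
    for label in labels {
        let color = palette[label]
        bytes.append(color.r)
        bytes.append(color.g)
        bytes.append(color.b)
        bytes.append(color.a)
    }
    return bytes
}
```

Separating the argmax/label step from the coloring step like this also makes it easy to swap palettes without re-running the model.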
    

[Discussion]:
