TensorFlow: How to ignore specific labels during semantic segmentation?

[Posted]: 2017-06-16 18:16:08
[Question]:

I'm using TensorFlow for semantic segmentation. How can I tell TensorFlow to ignore specific labels when computing the pixel-wise loss?

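I read in this post that, for image classification, you can set a label to -1 and it will be ignored. If that's true, given a label tensor, how can I modify my labels so that certain values are changed to -1?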

In Matlab it would look like this:

ignore_label = 255
myLabelTensor(myLabelTensor == ignore_label) = -1

But I don't know how to do this in TF.
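For reference, a minimal sketch of the literal TensorFlow equivalent of the Matlab line, assuming TensorFlow 2.x with eager execution (`mark_ignored` and `IGNORE_LABEL` are hypothetical names chosen for illustration). As the answers below point out, TensorFlow's cross-entropy ops do not treat -1 as "ignore", so this only shows the replacement itself:

```python
import tensorflow as tf  # assumes TensorFlow 2.x (eager execution)

IGNORE_LABEL = 255  # hypothetical ignore value, matching the Matlab snippet


def mark_ignored(label, ignore_label=IGNORE_LABEL):
    """Return `label` with every `ignore_label` entry replaced by -1.

    decode_png produces uint8, which cannot represent -1, so cast to int32 first.
    """
    label = tf.cast(label, tf.int32)
    # Element-wise: take -1 where the label equals ignore_label, else keep it
    return tf.where(tf.equal(label, ignore_label), -tf.ones_like(label), label)


labels = tf.constant([[0, 255], [3, 255]], dtype=tf.uint8)
print(mark_ignored(labels).numpy().tolist())  # [[0, -1], [3, -1]]
```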

Some background information:
This is how the labels are loaded:

label_contents = tf.read_file(input_queue[1])
label = tf.image.decode_png(label_contents, channels=1)

This is how the loss is currently computed:

raw_output = net.layers['fc1_voc12']
prediction = tf.reshape(raw_output, [-1, n_classes])
# tf.pack was renamed tf.stack in TF 1.0; tf.shape gives the [height, width] directly
label_proc = prepare_label(label_batch, tf.shape(raw_output)[1:3], n_classes)
gt = tf.reshape(label_proc, [-1, n_classes])

# Pixel-wise softmax loss (TF >= 1.0 requires named arguments here).
loss = tf.nn.softmax_cross_entropy_with_logits(labels=gt, logits=prediction)
reduced_loss = tf.reduce_mean(loss)

where prepare_label is defined as:

def prepare_label(input_batch, new_size, n_classes):
    """Resize masks and perform one-hot encoding.

    Args:
      input_batch: input tensor of shape [batch_size, H, W, 1].
      new_size: a tensor with the new height and width.
      n_classes: number of classes (depth of the one-hot encoding).

    Returns:
      A tensor of shape [batch_size, h, w, n_classes]
      whose last dimension contains only 0's and 1's.
    """
    with tf.name_scope('label_encode'):
        # Labels are integer class indices, so nearest-neighbour interpolation is required.
        input_batch = tf.image.resize_nearest_neighbor(input_batch, new_size)
        input_batch = tf.squeeze(input_batch, axis=[3])  # remove the channel dimension
        input_batch = tf.one_hot(input_batch, depth=n_classes)
    return input_batch
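One alternative, different from both answers below, is to skip the one-hot encoding, keep the integer labels, drop the ignored pixels with `tf.boolean_mask`, and only then apply `tf.nn.sparse_softmax_cross_entropy_with_logits`. A minimal sketch, assuming TensorFlow 2.x and that `labels` has already been resized to the logits' height and width (e.g. with nearest-neighbour interpolation, as in prepare_label); `masked_pixel_loss` is a hypothetical helper, not part of the deeplab-resnet code:

```python
import tensorflow as tf  # assumes TensorFlow 2.x


def masked_pixel_loss(logits, labels, ignore_label=255):
    """Mean pixel-wise cross-entropy over pixels whose label != ignore_label.

    logits: float tensor of shape [N, H, W, n_classes]
    labels: integer tensor of shape [N, H, W], same H and W as logits
    """
    n_classes = logits.shape[-1]
    flat_logits = tf.reshape(logits, [-1, n_classes])
    flat_labels = tf.reshape(tf.cast(labels, tf.int32), [-1])
    # Boolean mask: True for pixels that should contribute to the loss
    keep = tf.not_equal(flat_labels, ignore_label)
    # Only kept pixels reach the sparse op, so the out-of-range 255 never does
    loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=tf.boolean_mask(flat_labels, keep),
        logits=tf.boolean_mask(flat_logits, keep))
    return tf.reduce_mean(loss)


# One image of 1 x 2 pixels, 3 classes; the second pixel is ignored.
logits = tf.zeros([1, 1, 2, 3])
labels = tf.constant([[[0, 255]]])
print(float(masked_pixel_loss(logits, labels)))  # log(3), about 1.0986
```

The result is a mean over the relevant pixels of the whole batch. If every pixel in a batch were ignored, the mean over an empty tensor would be NaN, so a guard may be needed in that case.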

I'm using the tensorflow-deeplab-resnet model, which uses caffe-tensorflow to convert the ResNet model implemented in Caffe to TensorFlow.

[Comments]:

Possible duplicate of TensorFlow: How to handle void labeled data in image segmentation?

Tags: tensorflow

[Solution 1]:

Sorry, I'm new to this, but I believe https://github.com/tensorflow/models/blob/master/research/deeplab/g3doc/faq.md mentions that you need to add a new dataset. In "segmentation_dataset.py" you can specify an ignore_label for each dataset. For example:

_PASCAL_VOC_SEG_INFORMATION = DatasetDescriptor(
    splits_to_sizes={
        'train': 1464,
        'trainval': 2913,
        'val': 1449,
    },
    num_classes=21,
    ignore_label=255,
)
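One common way to consume an ignore label like this in the loss is a 0/1 per-pixel weight mask. The following is only a minimal sketch of that idea under TensorFlow 2.x, not DeepLab's actual training code; `weighted_pixel_loss` is a hypothetical helper. Unlike masking with `tf.boolean_mask`, it keeps every pixel (so tensor shapes stay static) and zeroes out the ignored ones through the weights:

```python
import tensorflow as tf  # assumes TensorFlow 2.x


def weighted_pixel_loss(logits, labels, ignore_label=255):
    """Cross-entropy averaged over non-ignored pixels via a 0/1 weight mask.

    logits: float tensor of shape [N, H, W, n_classes]
    labels: integer tensor of shape [N, H, W], same H and W as logits
    """
    n_classes = logits.shape[-1]
    flat_logits = tf.reshape(logits, [-1, n_classes])
    flat_labels = tf.reshape(tf.cast(labels, tf.int32), [-1])
    is_ignored = tf.equal(flat_labels, ignore_label)
    # 1.0 for relevant pixels, 0.0 for ignored ones
    weights = 1.0 - tf.cast(is_ignored, tf.float32)
    # Swap ignored labels for a valid index (0) so the sparse op accepts them;
    # their loss is multiplied by a weight of 0 below.
    safe_labels = tf.where(is_ignored, tf.zeros_like(flat_labels), flat_labels)
    per_pixel = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=safe_labels, logits=flat_logits)
    # Average over relevant pixels only; the maximum() avoids 0/0 if all are ignored
    return tf.reduce_sum(per_pixel * weights) / tf.maximum(tf.reduce_sum(weights), 1.0)


logits = tf.zeros([1, 1, 2, 3])
labels = tf.constant([[[0, 255]]])
print(float(weighted_pixel_loss(logits, labels)))  # log(3), about 1.0986
```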

[Solution 2]:

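According to the documentation, tf.nn.softmax_cross_entropy_with_logits must be called with a valid probability distribution in labels, otherwise the computation will be incorrect; and calling tf.nn.sparse_softmax_cross_entropy_with_logits (which may be more convenient in your case) with negative labels will either raise an error (on CPU) or return NaN values (on GPU). I wouldn't rely on it to ignore some labels.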

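What I would do is, at the pixels whose correct class is the ignored one, replace the logit of the ignored class with infinity (in practice a big enough number), so that those pixels contribute nothing to the loss. This assumes the ignored label is one of the n_classes channels of the one-hot encoding: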

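ignore_label = ...  # index of the ignored class; must be a valid channel, i.e. < n_classes
# input_batch: one-hot labels of shape [N, H, W, n_classes] (the output of prepare_label)
# Make zeros everywhere except for the ignored label's channel
input_batch_ignored = tf.concat(
    [tf.zeros_like(input_batch[:, :, :, :ignore_label]),
     tf.expand_dims(input_batch[:, :, :, ignore_label], -1),
     tf.zeros_like(input_batch[:, :, :, ignore_label + 1:])],
    axis=3)
# Flatten to the same [-1, n_classes] shape as prediction
ignored_flat = tf.reshape(input_batch_ignored, [-1, n_classes])
# Make the corresponding logits "infinity" (a big enough number)
prediction_fix = tf.where(ignored_flat > 0,
    1e30 * tf.ones_like(prediction), prediction)
# Compute the loss with the fixed logits
loss = tf.nn.softmax_cross_entropy_with_logits(labels=gt, logits=prediction_fix)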
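To see numerically why forcing the logit works, a tiny sketch under TensorFlow 2.x with two already-flattened pixels and three classes, where class 2 plays the role of the ignored label (all values are made up for illustration):

```python
import tensorflow as tf  # assumes TensorFlow 2.x

n_classes, ignore_label = 3, 2  # hypothetical: class 2 is the ignored label
# Two pixels: one with true class 0, one whose true class is the ignored label
gt = tf.one_hot([0, ignore_label], depth=n_classes)
prediction = tf.constant([[1.0, 2.0, 0.5],
                          [1.0, 2.0, 0.5]])

# 1.0 only in the ignored channel of pixels whose true class is ignored
ignored = gt * tf.one_hot(ignore_label, depth=n_classes)
prediction_fix = tf.where(ignored > 0, 1e30 * tf.ones_like(prediction), prediction)

loss = tf.nn.softmax_cross_entropy_with_logits(labels=gt, logits=prediction_fix)
print(loss.numpy())  # first pixel: positive loss; second (ignored) pixel: 0.0
```

The softmax of the second pixel becomes exactly one-hot on the ignored class, which is also its label, so its cross-entropy is 0. Using 1e30 rather than `float('inf')` keeps the value finite in float32.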

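The only issue is that you are treating pixels of the ignored class as always correctly predicted, which means the loss of images containing many of those pixels will be artificially small. Depending on the case this may or may not matter, but if you want to be really accurate, you have to weight the loss of each image by its number of non-ignored pixels instead of just taking the mean.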

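# 1.0 for pixels whose true class is not the ignored one, shape [N, H, W]
input_batch_relevant = 1.0 - input_batch[:, :, :, ignore_label]
# Count relevant pixels on each image, shape [N]
input_batch_count = tf.reduce_sum(input_batch_relevant, [1, 2])
# Compute relative weights (assumes the batch has at least one relevant pixel)
input_batch_weight = input_batch_count / tf.reduce_sum(input_batch_count)
# Mean loss per image over its relevant pixels (the logit fix makes ignored pixels contribute 0)
loss_per_image = tf.reduce_sum(
    tf.reshape(loss, tf.shape(input_batch_relevant)), [1, 2]) / tf.maximum(input_batch_count, 1.0)
# Compute reduced loss according to weights
reduced_loss = tf.reduce_sum(loss_per_image * input_batch_weight)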
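Written out, with S_i the summed pixel loss of image i (ignored pixels contributing 0 after the logit fix), r_i its number of non-ignored pixels (assumed > 0) and R the total number of non-ignored pixels in the batch, weighting each image's mean loss over its relevant pixels by r_i / R is the same as averaging the loss over all relevant pixels of the batch:

```latex
\mathcal{L} = \sum_{i=1}^{N} \frac{r_i}{R} \cdot \frac{S_i}{r_i}
            = \frac{1}{R} \sum_{i=1}^{N} S_i,
\qquad R = \sum_{i=1}^{N} r_i
```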

[Comments]:

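Sorry, I don't fully understand the answer: what does the output of input_batch_ignored = tf.concat(...) look like? It seems to have the same shape as the prediction (N x H x W x C), with zeros in all channels (C) except the ignore-label channel. But that would mean I predict the ignore_label class for all pixels in the image, right? I think I only need to select the pixels whose gt_label is ignore_label, right? So I need an operation like Matlab's (myLabelTensor == ignore_label) to get the indices of those labels...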
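@mcExchange As you say, input_batch_ignored is all zeros except for the ignored class, where the values of input_batch are kept. This is multiplied by infinity and added to the logits, effectively changing the prediction so that pixels of the ignored class are always correct (I'm now wondering whether infinity might produce bad results and a big enough number should be used instead). This means those pixels contribute 0 to the final cost.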
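@mcExchange If you want to replace the ignored label with -1, you can do something like label_wo_ignored = tf.where(tf.not_equal(label, ignore_label), label, -1 * tf.ones_like(label)) (tf.where was called tf.select before TensorFlow 1.0, and label must be a signed integer tensor), but I'm not sure this would give you the loss you want (at least not according to the documentation).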
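Ah, tf.select(...) (tf.where in TensorFlow 1.0 and later) sounds like a good option, not to replace ignore_label with -1 but to build the variable input_batch_ignored. In the example above, I would be adding inf values across both spatial dimensions of the image (height and width). Instead, I want to set the prediction to inf only for the pixels that correspond to ignore_label.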
@mcExchange I've edited the answer to use a big number instead of infinity, in case infinity breaks the computation at some point.