TensorFlow training error

【Question title】: TensorFlow training error
【Posted】: 2017-06-03 03:12:41
【Question】:

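I am trying to run faster_rcnn on another dataset that follows the Pascal VOC format, but training does not behave as expected.

After the following warning, all of the loss values turn to nan:

proposal_layer_tf.py:150: RuntimeWarning: invalid value encountered in greater_equal
  keep = np.where((ws >= min_size) & (hs >= min_size))[0]

Here are lines 146-151 of proposal_layer_tf.py: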

def _filter_boxes(boxes, min_size):
    """Remove all boxes with any side smaller than min_size."""
    ws = boxes[:, 2] - boxes[:, 0] + 1
    hs = boxes[:, 3] - boxes[:, 1] + 1
    keep = np.where((ws >= min_size) & (hs >= min_size))[0]
    return keep

As you can see, the total loss changes in a strange way, and after the warning it becomes nan. What can I do to fix this?

(GPU: GeForce 940M)

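【Comments】:

Try lowering your learning rate.
I lowered the learning rate from 0.001 to 0.0001 and still get the same result.
Lowering the learning rate to 0.00001 did not help either.
I'm running into the same problem as you right now. Changing the learning rate didn't help. What size are your input images? Anyway, if you figure it out, please let me know, and if I figure it out, I'll tell you :P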

Tags: python machine-learning tensorflow

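【Solution 1】:

The problem may be caused by your annotations. In the Faster-RCNN implementation, when the bounding boxes are loaded into a data frame, 1 is subtracted from each of the coordinates x1, y1, x2, y2 to make them 0-based. In my case, I had created my own XML annotations, and they were already 0-based. Running the default Faster-RCNN implementation therefore subtracted 1 from 0, which caused an underflow error. Removing that subtraction solved my problem.

You can either remove the subtraction in pascal_voc.py or edit your annotations so that they are 1-based. If you choose to edit pascal_voc.py, go here: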

def _load_pascal_annotation(self, index):

    # ...
    # ...
    # ...

    # Load object bounding boxes into a data frame.
    for ix, obj in enumerate(objs):
        bbox = obj.find('bndbox')
        # Make pixel indexes 0-based
        x1 = float(bbox.find('xmin').text) #- 1 <- comment these out
        y1 = float(bbox.find('ymin').text) #- 1
        x2 = float(bbox.find('xmax').text) #- 1
        y2 = float(bbox.find('ymax').text) #- 1

    # ...
    # ...
    # ...
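
To see why subtracting 1 from a coordinate that is already 0 is harmful, here is a minimal sketch. It assumes the coordinates end up in an unsigned 16-bit NumPy array (I believe the reference pascal_voc.py does this, but check your own copy); the variable names are illustrative only.

```python
import numpy as np

# Assumption: coordinates are stored as unsigned 16-bit integers.
# The first box touches the left image edge in a 0-based annotation.
xmin = np.array([0, 10], dtype=np.uint16)
xmax = np.array([49, 59], dtype=np.uint16)

# 0 - 1 cannot be represented as an unsigned integer, so it silently wraps around.
x1 = xmin - np.uint16(1)
x2 = xmax - np.uint16(1)
print(x1)  # the first entry has wrapped to 65535

# Box width computed the same way as in _filter_boxes (x2 - x1 + 1),
# using a signed type here so the broken value is easy to read.
widths = x2.astype(np.int64) - x1.astype(np.int64) + 1
print(widths)  # hugely negative for the wrapped box, 50 for the normal one
```

A box with a nonsensical width like this may then corrupt the regression targets, which would be consistent with the loss turning to nan.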

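【Comments】:

This turned out to be right. I printed each image name on every loop iteration to see which annotation file was causing the problem. It was some files with truncated objects whose coordinates were larger or smaller than the image size. For example, there were objects with x_max = 1033 while my images are 1024*1024. I removed those objects.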
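
A check like the one described in this comment can be automated instead of printing image names by hand. This is a rough sketch, assuming standard Pascal VOC XML with a size element (width, height) and a bndbox per object; find_bad_boxes and its one_based flag are hypothetical and not part of Faster-RCNN.

```python
import xml.etree.ElementTree as ET

def find_bad_boxes(xml_text, one_based=True):
    """Return (name, box) pairs for objects whose box is outside the image or degenerate."""
    root = ET.fromstring(xml_text)
    size = root.find('size')
    width = int(size.find('width').text)
    height = int(size.find('height').text)

    # Valid pixel range depends on whether the annotations count from 1 or from 0.
    lo = 1 if one_based else 0
    max_x = width if one_based else width - 1
    max_y = height if one_based else height - 1

    bad = []
    for obj in root.findall('object'):
        bbox = obj.find('bndbox')
        x1, y1, x2, y2 = (float(bbox.find(tag).text)
                          for tag in ('xmin', 'ymin', 'xmax', 'ymax'))
        # Strict x1 < x2 and y1 < y2 also flags zero-width or inverted boxes.
        ok = lo <= x1 < x2 <= max_x and lo <= y1 < y2 <= max_y
        if not ok:
            bad.append((obj.find('name').text, (x1, y1, x2, y2)))
    return bad

# Made-up annotation reproducing the x_max = 1033 case on a 1024*1024 image.
sample = """<annotation>
  <size><width>1024</width><height>1024</height></size>
  <object><name>car</name><bndbox><xmin>10</xmin><ymin>20</ymin><xmax>1033</xmax><ymax>400</ymax></bndbox></object>
  <object><name>dog</name><bndbox><xmin>5</xmin><ymin>5</ymin><xmax>100</xmax><ymax>100</ymax></bndbox></object>
</annotation>"""

print(find_bad_boxes(sample))  # only the 'car' box is reported
```

Running this over every annotation file before training and fixing or dropping the reported objects should catch both out-of-range boxes and 0-based coordinates that would wrap after the subtraction.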