How to create CaffeDB training data for siamese networks from an image directory: answers

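【Question title】: How to Create CaffeDB training data for siamese networks out of image directory
【Posted】: 2016-04-26 12:44:40
【Question】:

I need some help creating a CaffeDB for a siamese CNN from a plain directory that contains images and a label text file, preferably in a Pythonic way.
Iterating through the directory and building image pairs is not the problem. My problem is turning those pairs into a CaffeDB.
So far I have only used convert_imageset to create a CaffeDB from an image directory.
Thanks for your help!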

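【Comments】:

What loss layer are you going to use?
I don't know yet. In my use case each class (4 classes plus a garbage class) has a large number of images (100k), and I want the network to distinguish the classes better. With a "normal" linear CNN the network makes many mistakes, so I want to try a siamese CNN to make it learn the differences better. If you have suggestions for a good loss layer, please let me know.
The contrastive loss layer seems suitable for this use case.
Thanks. So the CaffeDB question still remains...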

Tags: neural-network deep-learning caffe conv-neural-network training-data

【Answer 1】:

Why don't you simply create two datasets using the good old convert_imageset?

layer {
  name: "data_a"
  top: "data_a"
  top: "label_a"
  type: "Data"
  data_param { source: "/path/to/first/data_lmdb" }
  ...
}
layer {
  name: "data_b"
  top: "data_b"
  top: "label_b"
  type: "Data"
  data_param { source: "/path/to/second/data_lmdb" }
  ...
}
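The two LMDBs referenced above can come from two ordinary convert_imageset runs, one per list file. Below is a minimal sketch that writes those two list files with independently shuffled order. It assumes a hypothetical labels file with one "relative/path.jpg <int label>" entry per line (the list format convert_imageset reads); the function name, file names and seeds are illustrative only.

```python
import random
from pathlib import Path


def write_pair_lists(labels_path, out_a, out_b, seed_a=0, seed_b=1):
    """Write two convert_imageset list files holding the same entries in
    two different random orders (hypothetical helper, not part of Caffe).

    Assumes labels_path has one "relative/path.jpg <int label>" per line.
    """
    lines = [ln.strip() for ln in Path(labels_path).read_text().splitlines() if ln.strip()]
    list_a = lines[:]
    list_b = lines[:]
    # Independent RNGs so the two orders differ, giving non-trivial pairs.
    random.Random(seed_a).shuffle(list_a)
    random.Random(seed_b).shuffle(list_b)
    Path(out_a).write_text("\n".join(list_a) + "\n")
    Path(out_b).write_text("\n".join(list_b) + "\n")
    return list_a, list_b
```

Each list file is then converted separately, e.g. convert_imageset /path/to/images/ list_a.txt /path/to/first/data_lmdb, and the same for list_b.txt with the second LMDB path.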

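As for the loss: since each example carries a class label, you need to convert label_a and label_b into a same_not_same_label. I suggest doing this "on the fly" with a Python layer. Add a call to the Python layer in your prototxt: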

layer {
  name: "a_b_to_same_not_same_label"
  type: "Python"
  bottom: "label_a"
  bottom: "label_b"
  top: "same_not_same_label"
  python_param { 
    # the module name -- usually the filename -- that needs to be in $PYTHONPATH
    module: "siamese"
    # the layer name -- the class name in the module
    layer: "SiameseLabels"
  }
  # one propagate_down entry per bottom (label_a, label_b)
  propagate_down: false
  propagate_down: false
}

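Create siamese.py (make sure it is in your $PYTHONPATH). siamese.py should contain the layer class:

import os
import sys

# Make pycaffe importable; assumes the CAFFE_ROOT environment variable is set.
sys.path.insert(0, os.path.join(os.environ['CAFFE_ROOT'], 'python'))
import caffe


class SiameseLabels(caffe.Layer):
    """Turns two class-label blobs into one same/not-same label blob."""

    def setup(self, bottom, top):
        if len(bottom) != 2:
            raise Exception('must have exactly two inputs')
        if len(top) != 1:
            raise Exception('must have exactly one output')

    def reshape(self, bottom, top):
        if bottom[0].data.shape != bottom[1].data.shape:
            raise Exception('label_a and label_b must have the same shape')
        top[0].reshape(*bottom[0].shape)

    def forward(self, bottom, top):
        # 1.0 where the two class labels match, 0.0 otherwise.
        top[0].data[...] = (bottom[0].data == bottom[1].data).astype('f4')

    def backward(self, top, propagate_down, bottom):
        # Labels are not learned, so there is nothing to back-propagate.
        pass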
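Because a Caffe Python layer only runs inside a Caffe build with Python layers enabled, it can help to check the label logic on its own first. This is a plain NumPy mirror of the forward pass above; the function name same_not_same is hypothetical and not part of Caffe.

```python
import numpy as np


def same_not_same(label_a, label_b):
    """NumPy mirror of SiameseLabels.forward: 1.0 where labels match, else 0.0."""
    label_a = np.asarray(label_a)
    label_b = np.asarray(label_b)
    if label_a.shape != label_b.shape:
        raise ValueError("label blobs must have the same shape")
    return (label_a == label_b).astype(np.float32)


# Example batch of 4 label pairs, as the two Data layers might emit them.
print(same_not_same([0, 1, 2, 4], [0, 3, 2, 1]))  # [1. 0. 1. 0.]
```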

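Make sure you shuffle the examples in the two sets differently, so that you get non-trivial pairs. Moreover, if you build the first and second datasets with different numbers of examples, you will see different pairs in each epoch ;)

Make sure you construct the net so that the duplicated layers share their weights; see this tutorial for details.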
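The "different numbers of examples" trick can be checked with a little index arithmetic. A minimal sketch, assuming both Data layers read their DB sequentially, wrap around at the end and use the same batch_size, so the k-th sample overall pairs A[k % n_a] with B[k % n_b]. The pairing then only repeats after lcm(n_a, n_b) samples. The helper names are hypothetical.

```python
from math import gcd


def epoch_pairs(n_a, n_b, epoch):
    """Index pairs (i_a, i_b) seen during one pass over dataset A.

    Assumption: sequential reads with wrap-around and equal batch sizes,
    so sample k pairs A[k % n_a] with B[k % n_b].
    """
    start = epoch * n_a
    return [((start + k) % n_a, (start + k) % n_b) for k in range(n_a)]


def pairing_period(n_a, n_b):
    """Number of samples after which the (A, B) pairing repeats: lcm(n_a, n_b)."""
    return n_a * n_b // gcd(n_a, n_b)


# Equal sizes: every epoch replays exactly the same pairs.
print(epoch_pairs(5, 5, 0) == epoch_pairs(5, 5, 1))  # True
# Different sizes: the pairing shifts from one epoch to the next.
print(epoch_pairs(5, 4, 0))  # [(0, 0), (1, 1), (2, 2), (3, 3), (4, 0)]
print(epoch_pairs(5, 4, 1))  # [(0, 1), (1, 2), (2, 3), (3, 0), (4, 1)]
print(pairing_period(5, 4))  # 20
```

Sizes whose greatest common divisor is 1 (e.g. 100000 and 99999) maximize the period.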
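In Caffe, two layers share weights when their param entries carry the same name. Since the prototxt for the two branches is otherwise duplicated by hand, a small generator can keep them in sync. This is a hypothetical helper that emits two InnerProduct layers sharing "<name>_w"/"<name>_b"; the "_p" suffix and lr_mult values follow the naming used in Caffe's mnist siamese example and are a convention, not a requirement. The xavier filler is an illustrative choice.

```python
def shared_ip_layers(name, bottom_a, bottom_b, num_output):
    """Return prototxt text for two InnerProduct layers that share weights.

    Sharing comes from both layers listing param entries with the same
    name ("<name>_w", "<name>_b"); the second layer is named "<name>_p".
    """
    blocks = []
    for layer_name, bottom in ((name, bottom_a), (name + "_p", bottom_b)):
        blocks.append(
            "layer {\n"
            f'  name: "{layer_name}"\n'
            '  type: "InnerProduct"\n'
            f'  bottom: "{bottom}"\n'
            f'  top: "{layer_name}"\n'
            f'  param {{ name: "{name}_w" lr_mult: 1 }}\n'
            f'  param {{ name: "{name}_b" lr_mult: 2 }}\n'
            f"  inner_product_param {{ num_output: {num_output} "
            'weight_filler { type: "xavier" } }\n'
            "}"
        )
    return "\n".join(blocks)


print(shared_ip_layers("feat", "data_a", "data_b", 5))
```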

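【Comments】:

I did not find a siamese.py file in either caffe/python or the python2.7 installation directory. I'm working on Ubuntu 15.04 and checked out the caffe-master branch in 10/2015. There is only the mnist siamese example. I have already designed the net with shared parameters as in the tutorial; only the data input at the beginning is not very clear to me. So far I have not used Python layers. I just define the net for a given solver.prototxt and run caffe with the train command, like: caffe train -solver solver.prototxt -gpu all. My data layers point to directories containing the *.mdb files and a mean binaryproto file.
@Feuerteufel You need to create a siamese.py file yourself and make sure it is in your $PYTHONPATH. This file should contain the code from the answer (plus the imports needed for import caffe). If you enable Python layers in Makefile.config (WITH_PYTHON_LAYER := 1), caffe will run the Python code for you as part of caffe train.
OK, Python layers were not enabled, so I am rebuilding now. Are the correct import lines for siamese.py "import sys", "sys.path.insert(0, 'path/to/caffe/python')" and "import caffe", or is more needed? And in the loss layer, same_not_same_label is then used as the third input?
@Feuerteufel same_not_same_label is used as the label for the contrastive loss.
If I have N labels, how do I force the size-N feature vector before the contrastive loss layer to represent some kind of probability for each class? Or does the siamese net design provide this automatically?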
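To see how same_not_same_label enters the loss: Caffe's ContrastiveLoss layer takes the two feature blobs as its first two bottoms and the similarity label as the third. Below is a NumPy sketch of the value it computes with its defaults (legacy_version: false, margin: 1.0), written for illustration rather than taken from Caffe's source; the function name is hypothetical. Note that it only pulls same-class features together and pushes different-class features apart; nothing in it normalizes the features into class probabilities.

```python
import numpy as np


def contrastive_loss(feat_a, feat_b, same, margin=1.0):
    """Contrastive loss as computed by Caffe's ContrastiveLoss (non-legacy form):

        L = 1/(2N) * sum_n [ y_n * d_n^2 + (1 - y_n) * max(margin - d_n, 0)^2 ]

    d_n: Euclidean distance between the n-th pair of feature vectors.
    y_n: same_not_same_label (1 = same class, 0 = different class).
    """
    feat_a = np.asarray(feat_a, dtype=np.float64)
    feat_b = np.asarray(feat_b, dtype=np.float64)
    y = np.asarray(same, dtype=np.float64)
    d = np.linalg.norm(feat_a - feat_b, axis=1)
    per_pair = y * d ** 2 + (1.0 - y) * np.maximum(margin - d, 0.0) ** 2
    return per_pair.sum() / (2.0 * len(y))


# Pair 1: same class at distance 5 -> 25; pair 2: different class at 0.5 -> 0.25.
print(contrastive_loss([[0.0, 0.0], [0.0, 0.0]], [[3.0, 4.0], [0.5, 0.0]], [1, 0]))  # 6.3125
```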