Network not Learning when Training Fully Connected Layers in Caffe

【Question Title】: Network not Learning when Training Fully Connected Layers in Caffe
【Posted】: 2016-11-24 14:58:41
【Question】:

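I am very new to Caffe. I have a set of features extracted from two datasets (cars and flowers).

Each image sample is represented by a 256-D feature vector.
Training set: 500 car images and 1200 flower images
Test set: 100 car images and 200 flower images

Essentially, this is a binary classification problem. My Caffe train.prototxt file is as follows: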

layer {
  name: "data"
  type: "HDF5Data"
  top: "data"
  top: "label"
  hdf5_data_param {
    source: "train.txt"
    batch_size: 40
  }
  include {
    phase: TRAIN
  }
}
layer {
  name: "data"
  type: "HDF5Data"
  top: "data"
  top: "label"
  hdf5_data_param {
    source: "test.txt"
    batch_size: 10
  }
  include {
    phase: TEST
  }
}
layer {
  name: "fc1"
  type: "InnerProduct"
  bottom: "data"
  top: "fc1"
  inner_product_param {
    num_output: 256
    weight_filler {
      type: "gaussian"
      std: 1
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "sigmoid1"
  type: "Sigmoid"
  bottom: "fc1"
  top: "sigmoid1"
}
layer {
  name: "fc2"
  type: "InnerProduct"
  bottom: "sigmoid1"
  top: "fc2"
  inner_product_param {
    num_output: 256
    weight_filler {
      type: "gaussian"
      std: 1
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "sigmoid2"
  type: "Sigmoid"
  bottom: "fc2"
  top: "sigmoid2"
}
layer {
  name: "fc3"
  type: "InnerProduct"
  bottom: "sigmoid2"
  top: "fc3"
  inner_product_param {
    num_output: 2
    weight_filler {
      type: "gaussian"
      std: 1
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "fc3"
  bottom: "label"
  top: "accuracy"
  include {
    phase: TEST
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "fc3"
  bottom: "label"
  top: "loss"
}

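I read the data with an HDF5 layer and pass it through three fully connected layers (256-256-2) with sigmoid activations. (I also switched to ReLU, but the results did not change.)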
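For a sense of scale, the size of this 256-256-2 network can be worked out by hand. The count below assumes each InnerProduct layer keeps its bias term (the Caffe default) and that the input is the 256-D feature vector described above:

```latex
\begin{aligned}
\text{fc1: } & 256 \times 256 + 256 = 65{,}792 \\
\text{fc2: } & 256 \times 256 + 256 = 65{,}792 \\
\text{fc3: } & 256 \times 2 + 2 = 514 \\
\text{total: } & 65{,}792 + 65{,}792 + 514 = 132{,}098
\end{aligned}
```

With 1,700 training samples, that is roughly 78 learnable parameters per sample.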

The solver prototxt is:
net: "train.prototxt"
test_iter: 100
test_interval: 200
base_lr: 0.010
momentum: 0.9
weight_decay: 0.00005
lr_policy: "inv"
gamma: 0.00001
delta: 1e-8
#test_compute_loss: true
power: 0.75
display: 100
#stepsize: 1000
max_iter: 10000
snapshot: 10000
snapshot_prefix: "sample"
solver_mode: GPU
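
One detail in the solver that may be worth checking: test_iter is the number of forward passes per test run, so each run evaluates test_iter × (TEST batch_size) samples. With the values above that is 100 × 10 = 1000 samples, while the test set holds 300, so each run would cycle through the set more than three times. A sketch of the adjusted fragment, assuming test.txt lists exactly the 300 test samples described in the question (this alone would not be expected to fix a flat accuracy curve):

```
# Fragment only; the remaining solver settings stay as posted.
net: "train.prototxt"
# Assumption: 300 test samples / TEST batch_size 10 = 30 forward passes
# for exactly one pass over the test set.
test_iter: 30
test_interval: 200
```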

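The problem is that this architecture does not work, and I believe it is because the network is not learning anything.

The figure shows the accuracy over the first 500 iterations, and it clearly indicates that nothing constructive is happening.

To check that the dataset and features are sound, I trained a linear SVM on the features with LibSVM, and it reaches 84% accuracy.

Perhaps my network is not set up correctly. It would be great if someone could help me with this. Thanks.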

--------

Update: With PReLU I get the following plot. I also reduced num_output from 256 to 128:

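【Comments】:

You are overfitting. You have too many parameters and too few examples.
By too many parameters, do you mean the weights?
Also, don't use sigmoid; use ReLU instead.
@Shai Sorry, I am a bit confused by your remark about too many parameters. Apart from increasing the dataset size, do you have any other suggestions that could help improve the accuracy?
When training from scratch, it is sometimes better to use "PReLU" instead of "ReLU".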
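
As an illustration of the activation swap suggested in the comments, the first Sigmoid layer could be replaced as sketched below. The layer name "relu1" is a hypothetical choice, and the fc2 layer would need its bottom changed from "sigmoid1" to "relu1" to match. The same pattern would apply to "sigmoid2".

```
# Sketch: replaces the "sigmoid1" layer from the question.
layer {
  name: "relu1"      # hypothetical name
  type: "ReLU"       # use "PReLU" here for the learnable-slope variant
  bottom: "fc1"
  top: "relu1"       # fc2 must then read from "relu1"
}
```

Note that the asker reports having tried both ReLU and PReLU without a change in results, so this swap on its own may not be sufficient.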

Tags: c++ machine-learning neural-network deep-learning caffe

【Solution 1】:

Make sure you shuffle your input data, use a smaller network, and possibly add some form of regularization, such as dropout.
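
A minimal sketch of how those three suggestions might look in the training prototxt. The hidden size of 32, the dropout ratio of 0.5, the single hidden layer, and the layer name "drop1" are illustrative assumptions, not values from the answer. The fillers are left as posted in the question.

```
# Sketch only. The TEST data layer, "accuracy" and "loss" stay as in the question.
layer {
  name: "data"
  type: "HDF5Data"
  top: "data"
  top: "label"
  hdf5_data_param {
    source: "train.txt"
    batch_size: 40
    shuffle: true      # shuffles file order and rows within each HDF5 file
  }
  include {
    phase: TRAIN
  }
}
layer {
  name: "fc1"
  type: "InnerProduct"
  bottom: "data"
  top: "fc1"
  inner_product_param {
    num_output: 32     # assumed smaller hidden size (was 256)
    weight_filler {
      type: "gaussian"
      std: 1
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "sigmoid1"
  type: "Sigmoid"
  bottom: "fc1"
  top: "sigmoid1"
}
layer {
  name: "drop1"        # hypothetical name
  type: "Dropout"
  bottom: "sigmoid1"
  top: "sigmoid1"      # applied in place
  dropout_param {
    dropout_ratio: 0.5 # assumed ratio
  }
}
layer {
  name: "fc3"
  type: "InnerProduct"
  bottom: "sigmoid1"   # second hidden layer dropped in this sketch
  top: "fc3"
  inner_product_param {
    num_output: 2
    weight_filler {
      type: "gaussian"
      std: 1
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
```

Two caveats apply. The shuffle option does not interleave rows across different HDF5 files, so if cars and flowers are stored in separate files the samples would need to be mixed when the files are written. The Dropout layer only drops units during the TRAIN phase.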

【Comments】:

【Solution 2】:

You are overfitting. Use a pretrained CNN (for example, a CIFAR model) and fine-tune it on your set.

【Comments】: