Interpreting Caffe models

【Question title】: Interpreting Caffe models
【Posted】: 2016-03-04 23:16:00
【Question】:

I am trying to interpret and understand models written in Caffe .proto.

Yesterday I came across a sample 'deploy.prototxt' by Shai here, quoted below:

layer {
  name: "ip1_a"
  bottom: "data_a"
  top: "ip1_a"
  type: "InnerProduct"
  inner_product_param {
    num_output: 10
  }
  param {
    name: "ip1_w"  # NOTE THIS NAME!
    lr_mult: 1
  }
  param {
    name: "ip1_b"
    lr_mult: 2
  }
}
layer {
  name: "ip1_b"
  bottom: "data_b"
  top: "ip1_b"
  type: "InnerProduct"
  inner_product_param {
    num_output: 10
  }
  param {
    name: "ip1_w"  # NOTE THIS NAME: it's the same!
    lr_mult: 10  # different LR for this branch
  }
  param {
    name: "ip1_b"
    lr_mult: 20
  }
}
# one layer to combine them
layer {
  type: "Concat"
  bottom: "ip1_a"
  bottom: "ip1_b"
  top: "ip1_combine"
  name: "concat"
}
layer {
  name: "joint_ip"
  type: "InnerProduct"
  bottom: "ip1_combine"
  top: "joint_ip"
  inner_product_param {
    num_output: 30
  }
}

I understand this model definition as:

     data_a         data_b
        |             |
        |             |
     -------       -------   
    | ip1_a |     | ip1_b |
     -------       -------
        |             |
        |             |
      ip1_a         ip1_b
        |             |
        |             |
        V             V
        ~~~~~~~~~~~~~~~
               |
               |
               V
         ------------- 
        |    concat   |
         ------------- 
               |
               |
         ip1_combine
               |
               |
         ------------- 
        |   joint_ip  |
         ------------- 
               |
               |
            joint_ip

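The blob ip1_a is trained by the layer ip1_a, with weights initialized from ip1_w (lr: 1) and bias initialized from ip1_b (lr: 2). Blob ip1_a is actually the newly learned weights, which were initialized with ip1_w. The learned bias doesn't have a name.

In some models, we can find layers that have:

lr_mult:1
lr_mult:2

The first instance of lr_mult always corresponds to the weights and the next instance to the bias.

Is my understanding above correct?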

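【Comments】:

Tags: machine-learning neural-network deep-learning caffe

【Answer 1】:

You are mixing two kinds of data: input (training) data and network parameters.
During training, the input data is fixed to a known training/validation set and only the network parameters change. In contrast, when the network is deployed, the data changes to new images while the network parameters stay fixed. See this answer for an in-depth description of how caffe stores these two kinds of data.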

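In the example you showed, there are two input data paths, data_a and data_b, which may carry different images each time. The input blobs pass through InnerProduct layers to become the blobs ip1_a and ip1_b respectively. These are then concatenated into a single blob, ip1_combine, which in turn is fed into the final InnerProduct layer.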
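The data flow described above can be sketched in plain Python. This is only an illustration of the blob shapes, not Caffe code: the input size `IN_DIM`, the random initial values and the variable names `shared_w`, `shared_b`, `joint_w` and `joint_b` are assumptions made for the example (the prototxt does not state an input size), and the concatenation is assumed to run along the feature axis.

```python
import random

def inner_product(x, weights, bias):
    """Fully connected layer: y[j] = sum_i weights[j][i] * x[i] + bias[j]."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def rand_matrix(rows, cols):
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

random.seed(0)
IN_DIM = 4  # hypothetical input size, not given in the prototxt

# Network parameters (what training changes)
shared_w = rand_matrix(10, IN_DIM)  # plays the role of param "ip1_w"
shared_b = [0.0] * 10               # plays the role of param "ip1_b"
joint_w = rand_matrix(30, 20)       # unnamed params of layer "joint_ip"
joint_b = [0.0] * 30

# Data blobs (what changes from one input to the next)
data_a = [random.uniform(-1, 1) for _ in range(IN_DIM)]
data_b = [random.uniform(-1, 1) for _ in range(IN_DIM)]

blob_ip1_a = inner_product(data_a, shared_w, shared_b)  # 10 values
blob_ip1_b = inner_product(data_b, shared_w, shared_b)  # 10 values
ip1_combine = blob_ip1_a + blob_ip1_b                   # concat: 20 values
joint_ip = inner_product(ip1_combine, joint_w, joint_b) # 30 values

print(len(blob_ip1_a), len(blob_ip1_b), len(ip1_combine), len(joint_ip))
```

Note that the blobs `blob_ip1_a` and `blob_ip1_b` hold layer outputs, while `shared_w` and `shared_b` hold the learned parameters; the prototxt happens to reuse the name ip1_b for both a blob and a bias parameter.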

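On the other hand, the model has a set of parameters: ip1_w and ip1_b (weights and bias) of the first inner product layer. In this particular example, the layers' parameters were explicitly named to indicate that they are shared between the ip1_a and ip1_b layers.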
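One way to picture sharing by name is a lookup table from parameter names to values, with both layers pointing at the same entries. The sketch below is a hypothetical stand-in for that idea (the `params` and `layer_params` dictionaries and the tiny 2x3 weight matrix are invented for illustration and are not Caffe's internal data structures).

```python
def inner_product(x, weights, bias):
    """Fully connected layer: y[j] = sum_i weights[j][i] * x[i] + bias[j]."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]

# Hypothetical registry: parameter name -> values (2 outputs, 3 inputs)
params = {
    "ip1_w": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
    "ip1_b": [0.0, 1.0],
}

# Both layers refer to the same parameter names, so they share the values.
layer_params = {
    "ip1_a": ("ip1_w", "ip1_b"),
    "ip1_b": ("ip1_w", "ip1_b"),
}

def run_layer(layer_name, x):
    w_name, b_name = layer_params[layer_name]
    return inner_product(x, params[w_name], params[b_name])

x = [1.0, 2.0, 3.0]
out_a = run_layer("ip1_a", x)
out_b = run_layer("ip1_b", x)
same_before = out_a == out_b  # same input, same parameters

# Changing the shared weight once affects both layers.
params["ip1_w"][0][0] += 1.0
same_after = run_layer("ip1_a", x) == run_layer("ip1_b", x)
changed = run_layer("ip1_a", x) != out_a

print(same_before, same_after, changed)
```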

As for the two lr_mult entries: yes, the first is the LR multiplier for the weights, and the second is for the bias term.
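As a rough sketch of what the multipliers do: the effective learning rate of a parameter is the solver's base_lr times that parameter's lr_mult. The example below assumes plain SGD with no momentum or weight decay, and the values of `base_lr`, the gradient and the starting parameters are made up for illustration.

```python
base_lr = 0.01  # hypothetical solver base_lr

def sgd_step(value, grad, lr_mult):
    """One plain SGD step with effective learning rate base_lr * lr_mult."""
    return value - base_lr * lr_mult * grad

weight, bias = 1.0, 1.0
grad = 0.5  # pretend both parameters received the same gradient

new_weight = sgd_step(weight, grad, lr_mult=1)  # first param block: weights
new_bias = sgd_step(bias, grad, lr_mult=2)      # second param block: bias

print(new_weight, new_bias)
```

With the same gradient, the bias moves twice as far as the weight, because its effective learning rate is twice as large.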

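【Discussion】:

Thank you, Shai! One more question: the learning rate is applied during backpropagation. So during backprop, is ip1_w updated in layer ip1_a with lr 1, and is this UPDATED ip1_w then updated again by layer ip1_b using its own lr?
@AnoopK.Prabhu In this case backprop gives two updates to the parameters ip1_w and ip1_b, once through ip1_a and once through ip1_b, because in this particular case the internal representation of the layers' parameters is linked to the same parameters. As for why these updates use different LRs, you will have to ask the OP of the original question.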
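The "two updates" can be checked on a toy case: when one weight is used in two branches, its gradient is the sum of the contributions from each branch. The scalar sketch below illustrates only that piece of calculus. The inputs, targets and squared-error loss are invented for the example, and it does not model how Caffe treats the differing lr_mult values.

```python
# One shared scalar weight w used by two branches (toy values).
w = 0.5
xa, ta = 2.0, 3.0    # input and target of branch a
xb, tb = -1.0, 1.0   # input and target of branch b

def loss(w):
    """Sum of the squared errors of both branches."""
    return 0.5 * (w * xa - ta) ** 2 + 0.5 * (w * xb - tb) ** 2

# Analytic gradient contribution of each branch with respect to w
grad_a = (w * xa - ta) * xa
grad_b = (w * xb - tb) * xb
grad_total = grad_a + grad_b  # what the shared weight receives

# Numerical check with a central finite difference
eps = 1e-6
grad_numeric = (loss(w + eps) - loss(w - eps)) / (2 * eps)

print(grad_a, grad_b, grad_total, grad_numeric)
```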