【Title】: Weird loss pattern when using two losses in caffe
【Posted】: 2017-04-25 15:17:08
【Question】:

I am training a CNN with caffe and I get the following weird loss pattern:

I0425 16:38:58.305482 23335 solver.cpp:398]     Test net output #0: loss = nan (* 1 = nan loss)
I0425 16:38:58.305524 23335 solver.cpp:398]     Test net output #1: loss_intermediate = inf (* 1 = inf loss)
I0425 16:38:59.235857 23335 solver.cpp:219] Iteration 0 (-4.2039e-45 iter/s, 20.0094s/50 iters), loss = 18284.4
I0425 16:38:59.235926 23335 solver.cpp:238]     Train net output #0: loss = 18274.9 (* 1 = 18274.9 loss)
I0425 16:38:59.235942 23335 solver.cpp:238]     Train net output #1: loss_intermediate = 9.46859 (* 1 = 9.46859 loss)
I0425 16:38:59.235955 23335 sgd_solver.cpp:105] Iteration 0, lr = 1e-06
I0425 16:39:39.330327 23335 solver.cpp:219] Iteration 50 (1.24704 iter/s, 40.0948s/50 iters), loss = 121737
I0425 16:39:39.330410 23335 solver.cpp:238]     Train net output #0: loss = 569.695 (* 1 = 569.695 loss)
I0425 16:39:39.330425 23335 solver.cpp:238]     Train net output #1: loss_intermediate = 121168 (* 1 = 121168 loss)
I0425 16:39:39.330433 23335 sgd_solver.cpp:105] Iteration 50, lr = 1e-06
I0425 16:40:19.372197 23335 solver.cpp:219] Iteration 100 (1.24868 iter/s, 40.0421s/50 iters), loss = 34088.4
I0425 16:40:19.372268 23335 solver.cpp:238]     Train net output #0: loss = 369.577 (* 1 = 369.577 loss)
I0425 16:40:19.372283 23335 solver.cpp:238]     Train net output #1: loss_intermediate = 33718.8 (* 1 = 33718.8 loss)
I0425 16:40:19.372292 23335 sgd_solver.cpp:105] Iteration 100, lr = 1e-06
I0425 16:40:59.501541 23335 solver.cpp:219] Iteration 150 (1.24596 iter/s, 40.1297s/50 iters), loss = 21599.6
I0425 16:40:59.501606 23335 solver.cpp:238]     Train net output #0: loss = 478.262 (* 1 = 478.262 loss)
I0425 16:40:59.501621 23335 solver.cpp:238]     Train net output #1: loss_intermediate = 21121.3 (* 1 = 21121.3 loss)
...
I0425 17:09:01.895849 23335 solver.cpp:219] Iteration 2200 (1.24823 iter/s, 40.0568s/50 iters), loss = 581.874
I0425 17:09:01.895912 23335 solver.cpp:238]     Train net output #0: loss = 532.049 (* 1 = 532.049 loss)
I0425 17:09:01.895926 23335 solver.cpp:238]     Train net output #1: loss_intermediate = 49.8377 (* 1 = 49.8377 loss)
I0425 17:09:01.895936 23335 sgd_solver.cpp:105] Iteration 2200, lr = 1e-06

FYI: My network essentially consists of two stages, which is why I have two losses. The first stage can be seen as a coarse stage, and the second stage upsamples the output of the coarse stage.
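A two-stage net like this is typically wired up with one loss layer per stage in the prototxt, both feeding the total loss. A minimal sketch of that wiring is below; the blob and layer names (`coarse_out`, `upsample_out`, `label_coarse`) are placeholders, not taken from the question, and `EuclideanLoss` is just one plausible choice of loss type:

```protobuf
# Hypothetical two-loss wiring; names are illustrative.
layer {
  name: "loss_intermediate"      # loss on the coarse stage's output
  type: "EuclideanLoss"
  bottom: "coarse_out"
  bottom: "label_coarse"
  top: "loss_intermediate"
  loss_weight: 1                 # matches the "(* 1 = ...)" factor in the log
}
layer {
  name: "loss"                   # loss on the final, upsampled output
  type: "EuclideanLoss"
  bottom: "upsample_out"
  bottom: "label"
  top: "loss"
  loss_weight: 1
}
```

With both `loss_weight` values at 1, the solver reports `loss` as the plain sum of the two terms, which is why the logged total (e.g. 121737 at iteration 50) is dominated by whichever term is currently larger.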

My question is: Is this a typical loss pattern? At first the loss value is high in the first iterations while the intermediate_loss is low; in the following iterations this essentially flips, so the loss is lower and the intermediate_loss is higher. In the end, only the intermediate_loss converges.

【Comments】:

  • Are you using BatchNorm layers?
  • Yes! But I do set `use_global_stats` to false during training and to true during testing, @Shai. Maybe I should train the first network stage first, and then set all the first stage's lr_param to 0.

Tags: deep-learning caffe conv-neural-network


【Answer 1】:

"Typical" is not really an applicable term here. There is a huge variety of models and topologies, and you can find many examples of strange loss progressions.

In your case, the intermediate loss is most likely low at first because it "doesn't know any better". Once the later layers are trained well enough to provide reliable feedback to the intermediate layers, the intermediate stage starts learning enough to make serious mistakes.

The final loss is computed directly against the ground truth; it learns from the very first iteration, so its progression from high loss to low loss is easier to understand.
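If the intermediate loss ends up dominating the total (as in the log, where `loss_intermediate` reaches ~1.2e5 while the final loss stays in the hundreds), two common caffe-side remedies are to down-weight that loss via `loss_weight`, or to freeze the coarse stage with `lr_mult: 0` as the asker suggests in the comments. A hedged sketch, with all layer and blob names as placeholders:

```protobuf
# Down-weight the coarse-stage loss so it does not dominate early training.
layer {
  name: "loss_intermediate"
  type: "EuclideanLoss"
  bottom: "coarse_out"
  bottom: "label_coarse"
  top: "loss_intermediate"
  loss_weight: 0.1               # instead of the default 1
}
# Freeze a coarse-stage layer: zero learning-rate multipliers
layer {
  name: "coarse_conv1"
  type: "Convolution"
  bottom: "data"
  top: "coarse_conv1"
  param { lr_mult: 0 }           # weights stay fixed
  param { lr_mult: 0 }           # bias stays fixed
  convolution_param { num_output: 64 kernel_size: 3 }
}
```

The `lr_mult: 0` approach matches the asker's idea of pre-training the first stage and then training only the upsampling stage on top of it.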

【Comments】:
