【Question Title】: Deep learning: big difference between training and validation loss from the very first epoch on
【Posted】: 2018-12-04 00:04:31
【Question】:

My neural network is a slightly modified version of the model proposed in this paper: https://arxiv.org/pdf/1606.01781.pdf

My goal is to classify texts into 9 different classes. I am using 29 convolutional layers and cap the length of any text at 256 characters.
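Since the model follows the VDCNN paper, the input is presumably character-level: each text is quantized into a fixed-length sequence of character indices (the embedding layer's 1,152 parameters over 16 dimensions suggest a vocabulary of about 72 tokens). A minimal sketch of such an encoder, assuming a paper-style alphabet and index 0 for padding/unknown characters:

```python
def encode_text(text, max_len=256):
    """Quantize a text into a fixed-length sequence of character indices.

    The alphabet below follows the one used in character-level CNN papers;
    the exact character set in the question's model is an assumption.
    Index 0 is reserved for padding and out-of-alphabet characters.
    """
    alphabet = "abcdefghijklmnopqrstuvwxyz0123456789-,;.!?:'\"/\\|_@#$%^&*~`+=<>()[]{}"
    char_to_idx = {c: i + 1 for i, c in enumerate(alphabet)}
    # Truncate to max_len, map unknown characters to 0
    ids = [char_to_idx.get(c, 0) for c in text.lower()[:max_len]]
    # Zero-pad on the right up to the fixed length
    ids += [0] * (max_len - len(ids))
    return ids
```

The resulting integer sequences are what the `input (InputLayer)` of shape `(None, 256)` would consume.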

The training data has 900k samples and the validation data has 35k samples. The data is very imbalanced, so I did some data augmentation to balance the training data (without touching the validation data, obviously) and then used class weights during training.
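One common way to derive such class weights is inverse class frequency, normalized so that a perfectly balanced dataset would give every class a weight of 1.0 (a sketch; the question does not say which scheme was used). In Keras, the resulting dict can be passed via `model.fit(..., class_weight=weights)`:

```python
from collections import Counter

def balanced_class_weights(labels):
    """Inverse-frequency class weights.

    Each class gets weight n / (k * count), where n is the total number
    of samples and k the number of classes, so the weights average to 1
    over the dataset.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * cnt) for cls, cnt in counts.items()}
```

Rare classes get weights above 1 and dominant classes below 1, which scales each sample's contribution to the loss accordingly.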

Layer (type)                 Output Shape              Param #   
=================================================================
input (InputLayer)           (None, 256)               0         
_________________________________________________________________
embedding_1 (Embedding)      (None, 256, 16)           1152      
_________________________________________________________________
conv1d_1 (Conv1D)            (None, 256, 64)           3136      
_________________________________________________________________
sequential_1 (Sequential)    (None, 256, 64)           25216     
_________________________________________________________________
sequential_2 (Sequential)    (None, 256, 64)           25216     
_________________________________________________________________
sequential_3 (Sequential)    (None, 256, 64)           25216     
_________________________________________________________________
sequential_4 (Sequential)    (None, 256, 64)           25216     
_________________________________________________________________
sequential_5 (Sequential)    (None, 256, 64)           25216     
_________________________________________________________________
sequential_6 (Sequential)    (None, 256, 64)           25216     
_________________________________________________________________
sequential_7 (Sequential)    (None, 256, 64)           25216     
_________________________________________________________________
sequential_8 (Sequential)    (None, 256, 64)           25216     
_________________________________________________________________
sequential_9 (Sequential)    (None, 256, 64)           25216     
_________________________________________________________________
sequential_10 (Sequential)   (None, 256, 64)           25216     
_________________________________________________________________
max_pooling1d_1 (MaxPooling1 (None, 128, 64)           0         
_________________________________________________________________
sequential_11 (Sequential)   (None, 128, 128)          75008     
_________________________________________________________________
sequential_12 (Sequential)   (None, 128, 128)          99584     
_________________________________________________________________
sequential_13 (Sequential)   (None, 128, 128)          99584     
_________________________________________________________________
sequential_14 (Sequential)   (None, 128, 128)          99584     
_________________________________________________________________
sequential_15 (Sequential)   (None, 128, 128)          99584     
_________________________________________________________________
sequential_16 (Sequential)   (None, 128, 128)          99584     
_________________________________________________________________
sequential_17 (Sequential)   (None, 128, 128)          99584     
_________________________________________________________________
sequential_18 (Sequential)   (None, 128, 128)          99584     
_________________________________________________________________
sequential_19 (Sequential)   (None, 128, 128)          99584     
_________________________________________________________________
sequential_20 (Sequential)   (None, 128, 128)          99584     
_________________________________________________________________
max_pooling1d_2 (MaxPooling1 (None, 64, 128)           0         
_________________________________________________________________
sequential_21 (Sequential)   (None, 64, 256)           297472    
_________________________________________________________________
sequential_22 (Sequential)   (None, 64, 256)           395776    
_________________________________________________________________
sequential_23 (Sequential)   (None, 64, 256)           395776    
_________________________________________________________________
sequential_24 (Sequential)   (None, 64, 256)           395776    
_________________________________________________________________
max_pooling1d_3 (MaxPooling1 (None, 32, 256)           0         
_________________________________________________________________
sequential_25 (Sequential)   (None, 32, 512)           1184768   
_________________________________________________________________
sequential_26 (Sequential)   (None, 32, 512)           1577984   
_________________________________________________________________
sequential_27 (Sequential)   (None, 32, 512)           1577984   
_________________________________________________________________
sequential_28 (Sequential)   (None, 32, 512)           1577984   
_________________________________________________________________
lambda_1 (Lambda)            (None, 4096)              0         
_________________________________________________________________
dense_1 (Dense)              (None, 2048)              8390656   
_________________________________________________________________
dense_2 (Dense)              (None, 2048)              4196352   
_________________________________________________________________
dense_3 (Dense)              (None, 9)                 18441     
=================================================================
Total params: 21,236,681
Trainable params: 21,216,713
Non-trainable params: 19,968
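For reference, the `lambda_1` layer mapping `(None, 32, 512)` to `(None, 4096)` is consistent with the k-max pooling used in the VDCNN paper with k = 8 (since 8 × 512 = 4096); this is an inference from the shapes, not something stated in the question. Per feature, k-max pooling keeps the k largest activations along the time axis, preserving their original order:

```python
def k_max_pool(seq, k=8):
    """k-max pooling over a (time, features) matrix given as nested lists.

    For each feature column, keep the k largest values in their original
    temporal order, then concatenate all columns into one flat vector of
    length k * n_features.
    """
    n_feat = len(seq[0])
    out = []
    for f in range(n_feat):
        col = [step[f] for step in seq]
        # Indices of the k largest values, restored to temporal order
        top_idx = sorted(sorted(range(len(col)), key=lambda i: col[i])[-k:])
        out.extend(col[i] for i in top_idx)
    return out
```

In Keras such an operation is typically wrapped in a `Lambda` layer because there is no built-in k-max pooling layer.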

For this model I get the following results:

The loss curves look strange to me: I cannot spot the typical overfitting pattern in them, yet the difference between training and validation loss is still large. Moreover, the training loss at epoch #1 is already far lower than the validation loss at any epoch.

Is this something I should worry about? How can I improve my model?

Thanks!

【Question Discussion】:

  • Just FYI, this gap is known as the generalization gap :)

Tags: tensorflow neural-network keras deep-learning convolution


【Solution 1】:

To narrow the gap between training and validation error, I would suggest two things:

  • Try to get more data, if you can find any
  • Use dropout layers between the dense layers for regularization
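In Keras the second suggestion amounts to inserting a `Dropout(rate)` layer between the `Dense` layers (a rate of 0.5 is a common default, not a value taken from the question). Mechanically, inverted dropout zeroes random units during training and rescales the survivors so the expected activation is unchanged, as in this sketch:

```python
import random

def dropout(activations, rate, training=True, seed=None):
    """Inverted dropout on a list of activations.

    During training, each unit is zeroed with probability `rate` and the
    survivors are scaled by 1 / (1 - rate), so no rescaling is needed at
    inference time. At inference the input passes through unchanged.
    """
    if not training or rate == 0.0:
        return list(activations)
    rng = random.Random(seed)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]
```

Randomly disabling units forces the dense layers not to rely on any single co-adapted feature, which typically shrinks exactly the train/validation gap described in the question.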

【Discussion】:
