[Question Title]: Finetuning VGG-16: slow training in Keras
[Posted]: 2017-04-07 07:08:19
[Question]:

I am trying to fine-tune the last two layers of a VGG model on the LFW dataset. I changed the softmax layer's dimensions by removing the original layer and adding a softmax layer with 19 outputs, since there are 19 classes I am trying to train on. I also want to fine-tune the last fully connected layer in order to build a "custom feature extractor".

I am setting the layers I want to be non-trainable like this:

for layer in model.layers:
    layer.trainable = False
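Note that the loop above, as written, freezes every layer in the model, including the new head. A common pattern is to freeze everything except the layers being fine-tuned. A minimal sketch of that selection logic (the `Layer` class here is a hypothetical stand-in for a Keras layer, just to keep the example self-contained):

```python
class Layer:
    """Hypothetical stand-in for a Keras layer, to illustrate the freezing logic."""
    def __init__(self, name):
        self.name = name
        self.trainable = True

# A layer list mirroring the tail of the VGG-16 summary below.
layers = [Layer(n) for n in ["conv5_3", "pool5", "flatten", "fc6", "fc7", "fc8"]]

# Freeze everything except the last two layers (the ones being fine-tuned).
for layer in layers[:-2]:
    layer.trainable = False

print([l.name for l in layers if l.trainable])  # ['fc7', 'fc8']
```

With a real Keras model the same slice applies to `model.layers[:-2]`; remember to re-compile the model after changing `trainable` flags, or the change has no effect.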

Using a GPU, each epoch takes about 1 hour, training on 19 classes with at least 40 images per class.

Since I don't have many samples, this training performance seems a bit strange.

Does anyone know why this is happening?

Here is the log:

Image shape:  (224, 224, 3)
Number of classes:  19
K.image_dim_ordering: th

____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
====================================================================================================
input_1 (InputLayer)             (None, 3, 224, 224)   0                                            
____________________________________________________________________________________________________
conv1_1 (Convolution2D)          (None, 64, 224, 224)  1792        input_1[0][0]                    
____________________________________________________________________________________________________
conv1_2 (Convolution2D)          (None, 64, 224, 224)  36928       conv1_1[0][0]                    
____________________________________________________________________________________________________
pool1 (MaxPooling2D)             (None, 64, 112, 112)  0           conv1_2[0][0]                    
____________________________________________________________________________________________________
conv2_1 (Convolution2D)          (None, 128, 112, 112) 73856       pool1[0][0]                      
____________________________________________________________________________________________________
conv2_2 (Convolution2D)          (None, 128, 112, 112) 147584      conv2_1[0][0]                    
____________________________________________________________________________________________________
pool2 (MaxPooling2D)             (None, 128, 56, 56)   0           conv2_2[0][0]                    
____________________________________________________________________________________________________
conv3_1 (Convolution2D)          (None, 256, 56, 56)   295168      pool2[0][0]                      
____________________________________________________________________________________________________
conv3_2 (Convolution2D)          (None, 256, 56, 56)   590080      conv3_1[0][0]                    
____________________________________________________________________________________________________
conv3_3 (Convolution2D)          (None, 256, 56, 56)   590080      conv3_2[0][0]                    
____________________________________________________________________________________________________
pool3 (MaxPooling2D)             (None, 256, 28, 28)   0           conv3_3[0][0]                    
____________________________________________________________________________________________________
conv4_1 (Convolution2D)          (None, 512, 28, 28)   1180160     pool3[0][0]                      
____________________________________________________________________________________________________
conv4_2 (Convolution2D)          (None, 512, 28, 28)   2359808     conv4_1[0][0]                    
____________________________________________________________________________________________________
conv4_3 (Convolution2D)          (None, 512, 28, 28)   2359808     conv4_2[0][0]                    
____________________________________________________________________________________________________
pool4 (MaxPooling2D)             (None, 512, 14, 14)   0           conv4_3[0][0]                    
____________________________________________________________________________________________________
conv5_1 (Convolution2D)          (None, 512, 14, 14)   2359808     pool4[0][0]                      
____________________________________________________________________________________________________
conv5_2 (Convolution2D)          (None, 512, 14, 14)   2359808     conv5_1[0][0]                    
____________________________________________________________________________________________________
conv5_3 (Convolution2D)          (None, 512, 14, 14)   2359808     conv5_2[0][0]                    
____________________________________________________________________________________________________
pool5 (MaxPooling2D)             (None, 512, 7, 7)     0           conv5_3[0][0]                    
____________________________________________________________________________________________________
flatten (Flatten)                (None, 25088)         0           pool5[0][0]                      
____________________________________________________________________________________________________
fc6 (Dense)                      (None, 4096)          102764544   flatten[0][0]                    
____________________________________________________________________________________________________
fc7 (Dense)                      (None, 4096)          16781312    fc6[0][0]                        
____________________________________________________________________________________________________
batchnormalization_1 (BatchNorma (None, 4096)          16384       fc7[0][0]                        
____________________________________________________________________________________________________
fc8 (Dense)                      (None, 19)            77843       batchnormalization_1[0][0]       
====================================================================================================
Total params: 134,354,771
Trainable params: 16,867,347
Non-trainable params: 117,487,424
____________________________________________________________________________________________________
None
Train on 1120 samples, validate on 747 samples
Epoch 1/20
1120/1120 [==============================] - 7354s - loss: 2.9517 - acc: 0.0714 - val_loss: 2.9323 - val_acc: 0.2316
Epoch 2/20
1120/1120 [==============================] - 7356s - loss: 2.8053 - acc: 0.1732 - val_loss: 2.9187 - val_acc: 0.3614
Epoch 3/20
1120/1120 [==============================] - 7358s - loss: 2.6727 - acc: 0.2643 - val_loss: 2.9034 - val_acc: 0.3882
Epoch 4/20
1120/1120 [==============================] - 7361s - loss: 2.5565 - acc: 0.3071 - val_loss: 2.8861 - val_acc: 0.4016
Epoch 5/20
1120/1120 [==============================] - 7360s - loss: 2.4597 - acc: 0.3518 - val_loss: 2.8667 - val_acc: 0.4043
Epoch 6/20
1120/1120 [==============================] - 7363s - loss: 2.3827 - acc: 0.3714 - val_loss: 2.8448 - val_acc: 0.4163
Epoch 7/20
1120/1120 [==============================] - 7364s - loss: 2.3108 - acc: 0.4045 - val_loss: 2.8196 - val_acc: 0.4244
Epoch 8/20
1120/1120 [==============================] - 7377s - loss: 2.2463 - acc: 0.4268 - val_loss: 2.7905 - val_acc: 0.4324
Epoch 9/20
1120/1120 [==============================] - 7373s - loss: 2.1824 - acc: 0.4563 - val_loss: 2.7572 - val_acc: 0.4404
Epoch 10/20
1120/1120 [==============================] - 7373s - loss: 2.1313 - acc: 0.4732 - val_loss: 2.7190 - val_acc: 0.4471
Epoch 11/20
1120/1120 [==============================] - 7440s - loss: 2.0766 - acc: 0.5036 - val_loss: 2.6754 - val_acc: 0.4565
Epoch 12/20
1120/1120 [==============================] - 7414s - loss: 2.0323 - acc: 0.5170 - val_loss: 2.6263 - val_acc: 0.4565
Epoch 13/20
1120/1120 [==============================] - 7413s - loss: 1.9840 - acc: 0.5420 - val_loss: 2.5719 - val_acc: 0.4592
Epoch 14/20
1120/1120 [==============================] - 7414s - loss: 1.9467 - acc: 0.5464 - val_loss: 2.5130 - val_acc: 0.4592
Epoch 15/20
1120/1120 [==============================] - 7412s - loss: 1.9039 - acc: 0.5652 - val_loss: 2.4513 - val_acc: 0.4592
Epoch 16/20
1120/1120 [==============================] - 7413s - loss: 1.8716 - acc: 0.5723 - val_loss: 2.3906 - val_acc: 0.4578
Epoch 17/20
1120/1120 [==============================] - 7415s - loss: 1.8214 - acc: 0.5866 - val_loss: 2.3319 - val_acc: 0.4538
Epoch 18/20
1120/1120 [==============================] - 7416s - loss: 1.7860 - acc: 0.5982 - val_loss: 2.2789 - val_acc: 0.4538
Epoch 19/20
1120/1120 [==============================] - 7430s - loss: 1.7623 - acc: 0.5973 - val_loss: 2.2322 - val_acc: 0.4538
Epoch 20/20
1120/1120 [==============================] - 7856s - loss: 1.7222 - acc: 0.6170 - val_loss: 2.1913 - val_acc: 0.4538
Accuracy: 45.38%

The results are not good, and I can't train it on more data because it takes too long. Any ideas?

[Comments]:

  • (from "Marcin Możejko") What to do next: 1. Remove the top (Dense) layers. 2. Compute the network output for your images (so you will have 19*40 feature vectors). 3. Train your new dense part on those vectors. 4. Combine the two networks (CNN and Dense). (Note, however, that it may not give very good results anyway.)
  • I thought about it; so your idea is to extract features from the images and then train sequential dense layers on those features?
  • Yes. Just extract the feature vectors from the images and train the dense layers on them. Maybe you will get an acceptable result.
  • OK, I'll try it tomorrow and let you know.
  • Still slow, but workable. I got 80% validation accuracy with a loss of 1.9 after 20 epochs, so maybe I need more data per class...
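The workflow suggested in the comments above (run the frozen convolutional base once, cache the feature vectors, then train only a small dense classifier on them) can be sketched with a NumPy stand-in. Everything here is hypothetical: the fixed random projection plays the role of the frozen CNN, and the dimensions are placeholders for the real ones.

```python
import numpy as np

rng = np.random.default_rng(0)

n, d_in, d_feat, n_classes = 760, 64, 32, 19   # ~19*40 samples, as in the question
X = rng.standard_normal((n, d_in))             # placeholder "images"
y = rng.integers(0, n_classes, size=n)         # placeholder labels

# Phase 1: run the expensive frozen base ONCE and cache the features.
# (In the real workflow this would be conv_base.predict(images).)
W_frozen = rng.standard_normal((d_in, d_feat))
feats = np.maximum(X @ W_frozen, 0.0)          # frozen "CNN" features

# Phase 2: train only a small softmax classifier on the cached features
# (a few steps of plain gradient descent on cross-entropy).
W_clf = np.zeros((d_feat, n_classes))
for _ in range(50):
    logits = feats @ W_clf
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    grad = feats.T @ (p - np.eye(n_classes)[y]) / n
    W_clf -= 0.1 * grad

print(feats.shape)  # (760, 32)
```

The speedup comes from the expensive forward pass through the conv base happening once per image, instead of once per image per epoch.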

Tags: tensorflow keras computer-vision conv-neural-network vgg-net


[Solution 1]:

Note that you are feeding in ~ 19 * 40 < 800 examples to train 16,867,347 parameters. That is roughly 2e4 parameters per example, which simply cannot work well. Try removing all the FCN layers (the Dense layers on top) and using e.g. smaller Dense layers with about 50 neurons each. In my opinion, this should help you improve accuracy and speed up training.
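The arithmetic behind this answer can be checked directly: a Dense layer with `n_in` inputs and `n_out` units has `n_in * n_out + n_out` parameters, which reproduces the per-layer counts in the model summary above.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected (Dense) layer."""
    return n_in * n_out + n_out

# Head used in the question (fc6, fc7, fc8 from the summary):
fc6 = dense_params(25088, 4096)   # 102764544, matches the summary
fc7 = dense_params(4096, 4096)    # 16781312, matches the summary
fc8 = dense_params(4096, 19)      # 77843, matches the summary
original_head = fc6 + fc7 + fc8

# Suggested smaller head: one ~50-unit Dense layer plus the 19-way softmax.
small_head = dense_params(25088, 50) + dense_params(50, 19)

print(original_head, small_head)  # 119623699 1255419
```

The suggested head has roughly 1% of the parameters of the original one, which is a far better match for ~800 training examples.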

[Comments]:

  • Yes, I know. I already tried that, but performance was poor: validation accuracy froze at around 20%, with at least 20 images per class. So I plan to change my dataset, because LFW has many classes with only a single image. Maybe if I use FaceScrub, which has more examples per class, it will work better with the original VGG; apparently I would need at least 100 classes with 200 images each... What do you think? Thanks!!
  • What do you think about the computation time?
  • I changed the dataset (I'm now using FaceScrub) and tried the approach you proposed, with 2 Dense layers of 128 neurons each, but it is still slow. I think the cost comes from the convolutional layers, since my image size is 224*224. My current results are val_loss: 2.4294 - val_acc: 0.8350, classifying 50 classes with 23 images each. Should I get more data? The loss decreases very slowly.
  • What is your batch_size?
  • I had both builds of TensorFlow installed (GPU and CPU), and Keras was picking up the CPU build by default, so I removed it and now it works.... Thanks!!! :)
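The root cause revealed in the last comment (Keras silently falling back to a CPU-only TensorFlow build) can be detected up front. A small sketch, assuming the modern TF 2.x API (the original question predates it); it degrades gracefully when TensorFlow is not installed:

```python
import importlib.util

def tf_gpu_available():
    """Return True/False if TensorFlow is installed, None if it is missing."""
    if importlib.util.find_spec("tensorflow") is None:
        return None
    import tensorflow as tf
    return len(tf.config.list_physical_devices("GPU")) > 0

print(tf_gpu_available())
```

If this prints False on a machine that has a GPU, a CPU-only `tensorflow` package is probably shadowing the GPU build, which is exactly the situation described above.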