Epoch does not start while training a CNN with the keras VGGFace framework

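【Question title】: Epoch does not start while training a CNN with the keras VGGFace framework
【Posted】: 2018-11-20 03:15:08
【Question】:

I am trying to use the VGG Face implementation with the keras framework on my own dataset of face images in 12 classes. I applied augmentation to the classes that have very little data in the training set.

After fine-tuning with resnet50, when I try to train my model it gets stuck at the first epoch: training never starts, and the output just keeps showing Epoch 1/50. This is what it looks like: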

Layer (type)                 Output Shape              Param #   
=================================================================
model_1 (Model)              (None, 12)                23585740  
=================================================================
Total params: 23,585,740
Trainable params: 23,532,620
Non-trainable params: 53,120
_________________________________________________________________
Found 1774 images belonging to 12 classes.
Found 313 images belonging to 12 classes.
Epoch 1/50

Here is my code:

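# Imports assumed from the names used below (not shown in the original post)
from keras import models, optimizers
from keras.layers import Dense, Flatten
from keras.models import Model
from keras.preprocessing.image import ImageDataGenerator
from keras_vggface.vggface import VGGFace

train_data_path = 'dataset_cfps/train'
validation_data_path = 'dataset_cfps/validation'

# Parameters
img_width, img_height = 224, 224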

vggface = VGGFace(model='resnet50', include_top=False, input_shape=(img_width, img_height, 3))

#vgg_model = VGGFace(include_top=False, input_shape=(224, 224, 3))
last_layer = vggface.get_layer('avg_pool').output
x = Flatten(name='flatten')(last_layer)
out = Dense(12, activation='sigmoid', name='classifier')(x)
custom_vgg_model = Model(vggface.input, out)


# Create the model
model = models.Sequential()

# Add the convolutional base model
model.add(custom_vgg_model)

# Add new layers
# model.add(layers.Flatten())
# model.add(layers.Dense(1024, activation='relu'))
# model.add(BatchNormalization())
# model.add(layers.Dropout(0.5))
# model.add(layers.Dense(12, activation='sigmoid'))

# Show a summary of the model. Check the number of trainable parameters
model.summary()

train_datagen = ImageDataGenerator(
      rescale=1./255,
      rotation_range=20,
      width_shift_range=0.2,
      height_shift_range=0.2,
      horizontal_flip=True,
      fill_mode='nearest')

validation_datagen = ImageDataGenerator(rescale=1./255)


train_batchsize = 16
val_batchsize = 16

train_generator = train_datagen.flow_from_directory(
        train_data_path,
        target_size=(img_width, img_height),
        batch_size=train_batchsize,
        class_mode='categorical')

validation_generator = validation_datagen.flow_from_directory(
        validation_data_path,
        target_size=(img_width, img_height),
        batch_size=val_batchsize,
        class_mode='categorical',
        shuffle=True)

# Compile the model
model.compile(loss='categorical_crossentropy',
              optimizer=optimizers.SGD(lr=1e-3),
              metrics=['acc'])
# Train the model
# Step counts should be whole numbers; len() of a Keras iterator gives the batches per epoch
history = model.fit_generator(
      train_generator,
      steps_per_epoch=len(train_generator),
      epochs=50,
      validation_data=validation_generator,
      validation_steps=len(validation_generator),
      verbose=1)

# Save the model
model.save('facenet_resnet.h5')

Does anyone know what the problem might be? And how can I make my model better, if there is anything I can do? Feel free to suggest improvements.
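On the request for improvements, one detail in the posted code that may be worth checking is the `sigmoid` activation on the 12-way `classifier` layer combined with `categorical_crossentropy`. For single-label classification this loss is conventionally paired with `softmax`, whose outputs form a probability distribution over the classes. The sketch below uses plain Python and made-up logits (an assumption, not values from the post) to show the difference; it does not explain the stalled epoch.

```python
import math

def sigmoid(logits):
    """Element-wise sigmoid: each class is scored independently."""
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]

def softmax(logits):
    """Softmax: scores are normalized so that they sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three classes, for illustration only
logits = [2.0, 1.0, 0.1]
sig = sigmoid(logits)
soft = softmax(logits)

print("sigmoid:", sig, "sum =", sum(sig))
print("softmax:", soft, "sum =", sum(soft))
```

If each face image belongs to exactly one of the 12 classes, changing `activation='sigmoid'` to `activation='softmax'` in the `Dense` layer would be the usual choice.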

【Comments】:

How long did you wait?
About half an hour.
Did you ever solve this?
I had to restart everything. It did run after that, though.

Tags: python machine-learning keras transfer-learning

【Solution 1】:

Waiting did not help; I solved it by restarting the whole program.
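Before restarting, one way to narrow down where a run like this hangs is to time how long the data pipeline takes to produce its first few batches, independently of the model. The helper below is a hypothetical sketch (the name `time_first_batches` and the dummy generator are not from the post); it works with any Python iterator.

```python
import time

def time_first_batches(batch_iter, n_batches=3):
    """Pull n_batches from an iterator and return the seconds each one took."""
    it = iter(batch_iter)
    timings = []
    for _ in range(n_batches):
        start = time.perf_counter()
        next(it)  # blocks here if the data pipeline is stalled
        timings.append(time.perf_counter() - start)
    return timings

def dummy_batches():
    """Stand-in for a real data generator: yields (inputs, labels) forever."""
    while True:
        yield ([0.0] * 4, [0] * 4)

timings = time_first_batches(dummy_batches(), n_batches=3)
print(["%.6f s" % t for t in timings])
```

In the script from the question, `train_generator` could be passed in place of `dummy_batches()`. If the batches arrive quickly, the data pipeline is probably not the cause, and the stall is more likely in the model, the GPU, or the session state, which would be consistent with a restart fixing it.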

【Discussion】:

【Solution 2】:

You just have to wait a few hours (depending on your GPU). At the end it will report the loss and val_loss for each epoch.
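Whether waiting is reasonable can be checked with a rough estimate. The sample counts (1774 training and 313 validation images) and the batch size of 16 come from the question; the per-step timings below are invented placeholders that would have to be measured on the actual hardware.

```python
import math

def steps_per_epoch(num_samples, batch_size):
    """Batches needed to see every sample once (the last batch may be partial)."""
    return math.ceil(num_samples / batch_size)

def estimate_epoch_seconds(train_steps, val_steps, sec_per_train_step, sec_per_val_step):
    """Rough epoch duration: training steps plus validation steps."""
    return train_steps * sec_per_train_step + val_steps * sec_per_val_step

train_steps = steps_per_epoch(1774, 16)  # 111
val_steps = steps_per_epoch(313, 16)     # 20

# Hypothetical timings, for illustration only
epoch_seconds = estimate_epoch_seconds(train_steps, val_steps,
                                       sec_per_train_step=0.5,
                                       sec_per_val_step=0.2)
print(train_steps, val_steps, epoch_seconds)
```

With only about 111 training steps per epoch and `verbose=1`, the progress bar would normally advance after each batch, so an output that stays at `Epoch 1/50` for half an hour with no step counter suggests a stall rather than slow training.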

【Discussion】:

Not correct. The OP solved the problem by restarting.