【Posted】: 2019-06-12 21:57:17
【Problem description】:
Using transfer learning, I am trying to train VGG16 in Keras on Google Colab. Here is the code from the notebook (note: outputs are written as comments):
from keras import Sequential
from keras.layers import Dense, Flatten
from keras.applications import vgg16
from keras.applications.vgg16 import preprocess_input as vgg_pi
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Model

base_model = vgg16.VGG16(include_top=False, weights='imagenet',
                         input_shape=(224, 224, 3))
for layer in base_model.layers:
    layer.trainable = False
base_model.summary()
# Total params: 14,714,688
# Trainable params: 0
# Non-trainable params: 14,714,688

x = base_model.output
x = Flatten(name='flatten', input_shape=base_model.output_shape)(x)
x = Dense(10, activation='softmax', name='predictions')(x)
model = Model(inputs=base_model.input, outputs=x)
model.summary()
# Total params: 14,965,578
# Trainable params: 250,890
# Non-trainable params: 14,714,688

model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    fill_mode='nearest',
)
validation_datagen = ImageDataGenerator(
    rescale=1./255,
)

train_generator = train_datagen.flow_from_directory(
    '/content/drive/My Drive/Colab Notebooks/domat/solo-dataset/train/',
    target_size=(224, 224),
    batch_size=32,
    class_mode='categorical',
)
validation_generator = validation_datagen.flow_from_directory(
    '/content/drive/My Drive/Colab Notebooks/domat/solo-dataset/validation/',
    target_size=(224, 224),
    batch_size=32,
    class_mode='categorical',
)
# Found 11614 images belonging to 10 classes.
# Found 2884 images belonging to 10 classes.

# check if GPU is running
import tensorflow as tf
device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
    raise SystemError('GPU device not found')
print('Found GPU at: {}'.format(device_name))
# Found GPU at: /device:GPU:0

t_steps = 11614 // 32
v_steps = 2884 // 32
history = model.fit_generator(train_generator,
                              epochs=500,
                              steps_per_epoch=t_steps,
                              validation_data=validation_generator,
                              validation_steps=v_steps,
                              )
# Epoch 1/500
# 8/362 [..............................] - ETA: 41:02 - loss: 2.9058 - acc: 0.2383
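As an aside, the floor division used for `t_steps` and `v_steps` above silently drops the last partial batch of each epoch. A minimal sketch (not from the original post) of a step count that covers every sample, using `math.ceil`:

```python
import math

def steps_for(n_samples, batch_size):
    """Number of generator steps needed to cover every sample once per epoch."""
    return math.ceil(n_samples / batch_size)

print(steps_for(11614, 32))  # 363 (floor division gives 362)
print(steps_for(2884, 32))   # 91 (floor division gives 90)
```

This does not explain the slowdown, but it avoids leaving up to 31 images out of every epoch.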
So for some reason a single epoch takes about 40 minutes, and I really don't understand why it is so slow.
Previously, with different parameters (more fully connected layers added), each epoch finished in about 3 minutes, although it obviously overfit, since all 14 million parameters were trainable and the dataset was much smaller.
Does anyone know how to fix this? I have tried a million things, but it is just too slow. I can't even get back to the original configuration to check what I was doing before that made each epoch finish in about 3 minutes.
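One way to narrow a problem like this down is to time the input pipeline on its own, without the model: if pulling batches from the generator is already slow, the GPU is not the bottleneck. A hedged, self-contained sketch (in the notebook you would pass `iter(train_generator)`; the dummy generator here only makes the snippet runnable anywhere):

```python
import time

def mean_batch_time(gen, n_batches=8):
    """Average seconds per batch spent purely in the input pipeline."""
    start = time.time()
    for _ in range(n_batches):
        next(gen)  # fetch a batch, discard it
    return (time.time() - start) / n_batches

# Dummy stand-in for train_generator so the sketch runs without the dataset.
dummy = (([0] * 32, [0] * 32) for _ in range(8))
print(mean_batch_time(dummy))
```

If this reports something on the order of seconds per batch for the real generator, the time is going into reading and augmenting images, not into training.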
【Question discussion】:
-
Are you sure the environment is set to GPU? Even when a GPU is present, you have to set the runtime type to GPU manually.
-
Yes, I did. Even stranger, after the first epoch took an hour, the following ones take only a few minutes:
Epoch 1/5
362/362 [==============================] - 6260s 17s/step - loss: 1.2611 - acc: 0.6735 - val_loss: 0.9555 - val_acc: 0.7712
Epoch 2/5
362/362 [==============================] - 159s 440ms/step - loss: 0.9351 - acc: 0.7800 - val_loss: 1.1295 - val_acc: 0.7903
Epoch 3/5
362/362 [==============================] - 156s 431ms/step - loss: 0.8751 - acc: 0.8033 - val_loss: 0.8300 - val_acc: 0.8219
-
That is indeed very strange...
-
Have you ruled out slow disk reads as the cause?
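On Colab, reading thousands of small images from a mounted Google Drive folder is typically slow on first access, which would also explain why only the first epoch is slow. One way to test this hypothesis is to copy the dataset once to the VM's local disk (e.g. under `/content/`) and point `flow_from_directory` there. A minimal sketch; the helper and the demo paths are illustrative, not from the post:

```python
import os
import shutil
import tempfile
import time

def copy_dataset_local(src_dir, dst_dir):
    """Copy a dataset tree to fast local storage, returning (file count, seconds)."""
    if os.path.exists(dst_dir):
        shutil.rmtree(dst_dir)  # start from a clean destination
    start = time.time()
    shutil.copytree(src_dir, dst_dir)
    n_files = sum(len(files) for _, _, files in os.walk(dst_dir))
    return n_files, time.time() - start

# Demo on a tiny temporary tree (replace src with the Drive dataset path):
src = tempfile.mkdtemp()
os.makedirs(os.path.join(src, 'class_a'))
with open(os.path.join(src, 'class_a', 'img0.jpg'), 'wb') as f:
    f.write(b'\xff\xd8\xff')  # fake JPEG header bytes, just for the demo
n, secs = copy_dataset_local(src, os.path.join(tempfile.mkdtemp(), 'data'))
print(n)  # 1
```

If epochs are fast when reading from the local copy, slow Drive I/O (not the GPU or the model) was the bottleneck.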
Tags: tensorflow keras google-colaboratory