【Posted】: 2018-09-02 16:13:43
【Question】:
I am trying to predict features in imagery using keras with a TensorFlow backend. Specifically, I am trying to use a keras ImageDataGenerator. The model is set to run for 4 epochs and runs fine until the 4th epoch, where it fails with a MemoryError.
I am running this model on an AWS g2.2xlarge instance running Ubuntu Server 16.04 LTS (HVM), SSD Volume Type.
The training images are 256x256 RGB pixel tiles (8 bit unsigned) and the training masks are 256x256 single-band (8 bit unsigned) tiled data where 255 == a feature of interest and 0 == everything else.
The following 3 functions are the ones pertinent to this error.
How can I resolve this MemoryError?
def train_model():
    batch_size = 1
    training_imgs = np.lib.format.open_memmap(filename=os.path.join(data_path, 'data.npy'), mode='r+')
    training_masks = np.lib.format.open_memmap(filename=os.path.join(data_path, 'mask.npy'), mode='r+')

    dl_model = create_model()
    print(dl_model.summary())
    model_checkpoint = ModelCheckpoint(os.path.join(data_path, 'mod_weight.hdf5'),
                                       monitor='loss', verbose=1, save_best_only=True)
    dl_model.fit_generator(generator(training_imgs, training_masks, batch_size),
                           steps_per_epoch=(len(training_imgs) / batch_size),
                           epochs=4, verbose=1, callbacks=[model_checkpoint])
def generator(train_imgs, train_masks=None, batch_size=None):
    # Create empty arrays to contain the batch of features and labels
    if train_masks is not None:
        train_imgs_batch = np.zeros((batch_size, y_to_res, x_to_res, bands))
        train_masks_batch = np.zeros((batch_size, y_to_res, x_to_res, 1))
        while True:
            for i in range(batch_size):
                # choose a random index into the features
                index = random.choice(range(len(train_imgs)))
                train_imgs_batch[i] = train_imgs[index]
                train_masks_batch[i] = train_masks[index]
            yield train_imgs_batch, train_masks_batch
    else:
        rec_imgs_batch = np.zeros((batch_size, y_to_res, x_to_res, bands))
        while True:
            for i in range(batch_size):
                # choose a random index into the features
                index = random.choice(range(len(train_imgs)))
                rec_imgs_batch[i] = train_imgs[index]
            yield rec_imgs_batch
def train_generator(train_images, train_masks, batch_size):
    data_gen_args = dict(rotation_range=90., horizontal_flip=True, vertical_flip=True, rescale=1. / 255)
    image_datagen = ImageDataGenerator()
    mask_datagen = ImageDataGenerator()

    # Provide the same seed and keyword arguments to the fit and flow methods
    seed = 1
    image_datagen.fit(train_images, augment=True, seed=seed)
    mask_datagen.fit(train_masks, augment=True, seed=seed)

    image_generator = image_datagen.flow(train_images, batch_size=batch_size)
    mask_generator = mask_datagen.flow(train_masks, batch_size=batch_size)

    return zip(image_generator, mask_generator)
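Note that data_gen_args is built in train_generator but never passed to the ImageDataGenerator constructors, and the traceback below fails inside image_datagen.fit(), which copies the entire input array. A minimal sketch of an alternative (the name train_generator_no_fit is mine; this assumes no featurewise statistics are needed, so .fit() can be skipped entirely):

```python
import numpy as np
from keras.preprocessing.image import ImageDataGenerator

def train_generator_no_fit(train_images, train_masks, batch_size):
    # Pass the augmentation arguments to the constructors (in the question
    # data_gen_args is built but never used). None of the featurewise options
    # (featurewise_center, featurewise_std_normalization, zca_whitening) are
    # enabled, so .fit() -- which calls np.copy() on the whole array -- is
    # not required and can be dropped.
    data_gen_args = dict(rotation_range=90., horizontal_flip=True,
                         vertical_flip=True, rescale=1. / 255)
    image_datagen = ImageDataGenerator(**data_gen_args)
    mask_datagen = ImageDataGenerator(**data_gen_args)
    # The same seed on both flows keeps image and mask augmentations in sync.
    seed = 1
    image_generator = image_datagen.flow(train_images, batch_size=batch_size, seed=seed)
    mask_generator = mask_datagen.flow(train_masks, batch_size=batch_size, seed=seed)
    return zip(image_generator, mask_generator)
```

Whether rescale=1./255 should also be applied to the masks is a separate question; it is kept here only to mirror the original data_gen_args.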
Here is the output from the model, detailing the epochs and the error message:
Epoch 00001: loss improved from inf to 0.01683, saving model to /home/ubuntu/deep_learn/client_data/mod_weight.hdf5
Epoch 2/4
7569/7569 [==============================] - 3394s 448ms/step - loss: 0.0049 - binary_crossentropy: 0.0027 - jaccard_coef_int: 0.9983
Epoch 00002: loss improved from 0.01683 to 0.00492, saving model to /home/ubuntu/deep_learn/client_data/mod_weight.hdf5
Epoch 3/4
7569/7569 [==============================] - 3394s 448ms/step - loss: 0.0049 - binary_crossentropy: 0.0026 - jaccard_coef_int: 0.9982
Epoch 00003: loss improved from 0.00492 to 0.00488, saving model to /home/ubuntu/deep_learn/client_data/mod_weight.hdf5
Epoch 4/4
7569/7569 [==============================] - 3394s 448ms/step - loss: 0.0074 - binary_crossentropy: 0.0042 - jaccard_coef_int: 0.9975
Epoch 00004: loss did not improve
Traceback (most recent call last):
File "image_rec.py", line 291, in <module>
train_model()
File "image_rec.py", line 208, in train_model
dl_model.fit_generator(train_generator(training_imgs,training_masks,batch_size),steps_per_epoch=1,epochs=1,workers=1)
File "image_rec.py", line 274, in train_generator
image_datagen.fit(train_images, augment=True, seed=seed)
File "/home/ubuntu/pyvirt_test/local/lib/python2.7/site-packages/keras/preprocessing/image.py", line 753, in fit
x = np.copy(x)
File "/home/ubuntu/pyvirt_test/local/lib/python2.7/site-packages/numpy/lib/function_base.py", line 1505, in copy
return array(a, order=order, copy=True)
MemoryError
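The traceback points at np.copy(x) inside Keras' fit(): copying a memmapped array produces an ordinary in-memory ndarray, so the whole dataset has to fit in RAM at once even though it was opened with open_memmap. A small sketch illustrating this (the file name and shape are illustrative; the real data.npy is far larger):

```python
import os
import tempfile
import numpy as np

# Create a small memmapped .npy file, the same way data.npy is opened above.
path = os.path.join(tempfile.mkdtemp(), 'demo.npy')
mm = np.lib.format.open_memmap(path, mode='w+', dtype=np.uint8,
                               shape=(10, 256, 256, 3))

# np.copy() materializes the full array in RAM: the result is a plain
# ndarray, no longer backed by the file on disk.
copied = np.copy(mm)
print(type(mm).__name__, type(copied).__name__)
```

With the real training set, that one np.copy() call is what raises the MemoryError.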
【Comments】:
- Could you clean up your code by any chance? You have chunks of code commented out, and the error seems to reference code that is not in the snippet you included (I don't see a call to train_generator anywhere in your code, yet that is where the error occurs). It may be hard to debug with what you have provided.
- Does it always happen at the 4th epoch, even if you change the random seed? I see your batch size is 1, so it only fits 1 image at a time for training; could one of your images be very large, so that np.copy() causes the memory error?
- @BinhNguyenLe It still hits a MemoryError with fewer epochs. I haven't tried changing the random seed yet.
- Have you considered updating tensorflow and keras?
Tags: python tensorflow deep-learning keras