【Posted】: 2020-06-03 19:09:40
【Question】:
So I'm trying to solve Kaggle's melanoma competition, but whenever I try to run a simple Keras conv model I keep getting this error:
Resource exhausted: OOM when allocating tensor with shape[20,128,1022,1022] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Initially I tried with the full 33k images and about 6 layers (I had no idea this kind of error even existed). Then I figured that, since all the images are 1024x1024, I should reduce the number of layers and units to make the computation lighter, but the problem persisted and I couldn't even get through the first epoch.
Then I created a new directory with only 600 training images and 200 validation images (how could the problem still persist??). Since it did, I started to suspect the problem might be my machine's configuration. I'm on Ubuntu 20, and I checked that my GPU is actually being used; in fact, every time I run the code the terminal prints at startup:
Using TensorFlow backend.
2020-06-03 13:48:35.960461: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-06-03 13:48:35.994757: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-06-03 13:48:35.995140: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 1650 computeCapability: 7.5
coreClock: 1.56GHz coreCount: 16 deviceMemorySize: 3.82GiB deviceMemoryBandwidth: 119.24GiB/s
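A quick back-of-the-envelope check (my own illustration, not from the original post) shows why this fails: assuming float32 (4 bytes per element), the tensor of shape [20, 128, 1022, 1022] from the error message alone needs roughly 10 GiB, far more than the 3.82 GiB this GTX 1650 reports:

```python
# Memory needed for the tensor in the OOM message, assuming float32 (4 bytes/element)
batch, channels, height, width = 20, 128, 1022, 1022
bytes_needed = batch * channels * height * width * 4
gib_needed = bytes_needed / 2**30
print(f"{gib_needed:.2f} GiB")  # ~9.96 GiB, vs. the GPU's 3.82 GiB
```

So this is not a misconfiguration: a single activation tensor of the first conv layer simply cannot fit on the card.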
I don't know whether something is wrong with my configuration; I had no problems with simpler projects such as Digit Recognizer and cats vs. dogs with 10k+ images (those images were 28x28 and 256x256)...
So far my code looks like this (after changing the model to lighten the computation):
from keras import models
from keras.layers import Input, Dense, Conv2D, Flatten, MaxPool2D

# Layers
input_layer = Input(shape=(1024, 1024, 3), dtype='float32')
conv1 = Conv2D(128, (3, 3), activation='relu')(input_layer)
maxpool1 = MaxPool2D((2, 2))(conv1)
conv2 = Conv2D(128, (3, 3), activation='relu', dtype='float32')(maxpool1)
# All layers below, down to the next comment, are commented out to lighten the load
# maxpool2 = MaxPool2D((2, 2))(conv2)
# conv3 = Conv2D(128, (3, 3), activation='relu', dtype='float32')(maxpool2)
# maxpool3 = MaxPool2D((2, 2))(conv3)
# conv4 = Conv2D(256, (3, 3), activation='relu', dtype='float32')(maxpool3)
# maxpool4 = MaxPool2D((2, 2))(conv4)
# conv5 = Conv2D(256, (3, 3), activation='relu', dtype='float32')(maxpool4)
# maxpool5 = MaxPool2D((2, 2))(conv5)
# conv6 = Conv2D(256, (3, 3), activation='relu', dtype='float32')(maxpool5)
# Here the commented-out block ends
flatten = Flatten()(conv2)
dense1 = Dense(64, activation='relu')(flatten)
output_layer = Dense(1, activation='sigmoid')(dense1)

# Generating the model
model = models.Model(inputs=input_layer, outputs=output_layer)
from keras import optimizers
model.compile(loss='binary_crossentropy',
optimizer=optimizers.RMSprop(lr=1e-4),
metrics=['acc'])
from keras.preprocessing.image import ImageDataGenerator
# All images will be rescaled by 1./255
train_datagen = ImageDataGenerator(rescale=1./255)
test_datagen = ImageDataGenerator(rescale=1./255)
train_generator = train_datagen.flow_from_directory(
# This is the target directory
train_dir,
# All images will be resized to 1024x1024
target_size=(1024, 1024),
batch_size=20,
# Since we use binary_crossentropy loss, we need binary labels
class_mode='binary')
validation_generator = test_datagen.flow_from_directory(
valid_dir,
target_size=(1024, 1024),
batch_size=20,
class_mode='binary')
history = model.fit_generator(
train_generator,
steps_per_epoch=30,
epochs=30,
validation_data=validation_generator,
validation_steps=10)
Any ideas or suggestions are welcome; thank you very much for your time!
【Discussion】:
- You should scale the images down; 1024x1024 is way too big. Something like 256x256 is more reasonable.
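To see why that suggestion works, note that the activation memory of the first 128-filter conv layer scales with the square of the input side. A rough estimate of mine (float32, batch size 20, ignoring weights and other buffers):

```python
# Activation memory of a 128-filter 3x3 'valid' conv in float32,
# for two input resolutions; a 3x3 conv without padding shrinks each side by 2
def conv_activation_gib(side, filters=128, batch=20):
    out_side = side - 2
    return batch * filters * out_side * out_side * 4 / 2**30

print(f"{conv_activation_gib(1024):.2f} GiB")  # ~9.96 GiB at 1024x1024
print(f"{conv_activation_gib(256):.2f} GiB")   # ~0.62 GiB at 256x256
```

Setting `target_size=(256, 256)` in `flow_from_directory` (and matching the Input shape) cuts this single tensor by a factor of ~16, bringing it well under the 3.82 GiB available.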
Tags: python machine-learning keras deep-learning data-science