【发布时间】:2020-04-22 10:08:46
【问题描述】:
我正在训练一个分类模型来对细胞进行分类,我的模型基于这篇论文:https://www.nature.com/articles/s41598-019-50010-9。由于我的数据集仅包含 10 张图像,因此我进行了图像增强以人为地将数据集的大小增加到 3000 张图像,然后将其拆分为 2400 张训练图像和 600 张验证图像。
然而,虽然训练损失和准确度随着迭代次数的增加而提高,但验证损失迅速增加,而验证准确度则停滞在 0.0000e+00。
我的模型从一开始就严重过度拟合吗?
我使用的代码如下图:
import keras
import tensorflow as tf
from tensorflow.keras import backend as K
from tensorflow.keras.models import Model, load_model, Sequential, model_from_json, load_model
from tensorflow.keras.layers import Input, BatchNormalization, Activation, Flatten, Dense, LeakyReLU
from tensorflow.python.keras.layers.core import Lambda, Dropout
from tensorflow.python.keras.layers.convolutional import Conv2D, Conv2DTranspose, UpSampling2D
from tensorflow.python.keras.layers.pooling import MaxPooling2D, AveragePooling2D
from tensorflow.python.keras.layers.merge import Concatenate, Add
from tensorflow.keras import backend as K
from tensorflow.keras.callbacks import ModelCheckpoint, LearningRateScheduler, ReduceLROnPlateau, EarlyStopping
from tensorflow.keras.optimizers import *
img_channel = 1
input_size = (512, 512, 1)
inputs = Input(shape = input_size)
initial_input = Lambda(lambda x: x) (inputs) #Ensure input value is between 0 and 1 to avoid negative loss
kernel_size = (3,3)
pad = 'same'
model = Sequential()
filters = 2
c1 = Conv2D(filters, kernel_size, padding = pad, kernel_initializer = 'he_normal')(initial_input)
b1 = BatchNormalization()(c1)
a1 = Activation('elu')(b1)
p1 = AveragePooling2D()(a1)
c2 = Conv2D(filters, kernel_size, padding = pad, kernel_initializer = 'he_normal')(p1)
b2 = BatchNormalization()(c2)
a2 = Activation('elu')(b2)
p2 = AveragePooling2D()(a2)
c3 = Conv2D(filters, kernel_size, padding = pad, kernel_initializer = 'he_normal')(p2)
b3 = BatchNormalization()(c3)
a3 = Activation('elu')(b3)
p3 = AveragePooling2D()(a3)
c4 = Conv2D(filters, kernel_size, padding = pad, kernel_initializer = 'he_normal')(p3)
b4 = BatchNormalization()(c4)
a4 = Activation('elu')(b4)
p4 = AveragePooling2D()(a4)
c5 = Conv2D(filters, kernel_size, padding = pad, kernel_initializer = 'he_normal')(p4)
b5 = BatchNormalization()(c5)
a5 = Activation('elu')(b5)
p5 = AveragePooling2D()(a5)
f = Flatten()(p5)
d1 = Dense(128, activation = 'elu')(f)
d2 = Dense(no_of_img, activation = 'softmax')(d1)
model = Model(inputs = [inputs], outputs = [d2])
print(model.summary())
learning_rate = 0.001
decay_rate = 0.0001
model.compile(optimizer = SGD(lr = learning_rate, decay = decay_rate, momentum = 0.9, nesterov = False),
loss = 'categorical_crossentropy', metrics = ['accuracy'])
perf_lr_scheduler = ReduceLROnPlateau(monitor = 'val_loss', factor = 0.9, patience = 3,
verbose = 1, min_delta = 0.01, min_lr = 0.000001)
model_earlystop = EarlyStopping(monitor = 'val_loss', min_delta = 0.001, patience = 10, restore_best_weights = True)
#Convert labels to binary matrics
img_aug_label = to_categorical(img_aug_label, num_classes = no_of_img)
#Convert images to float to between 0 and 1
img_aug = np.float32(img_aug)/255
plt.imshow(img_aug[0,:,:,0])
plt.show()
#Train on augmented images
model.fit(
img_aug,
img_aug_label,
batch_size = 4,
epochs = 100,
validation_split = 0.2,
shuffle = True,
callbacks = [perf_lr_scheduler],
verbose = 2)
我的模型输出如下图:
Train on 2400 samples, validate on 600 samples
Epoch 1/100
2400/2400 - 12s - loss: 0.6474 - accuracy: 0.8071 - val_loss: 9.8161 - val_accuracy: 0.0000e+00
Epoch 2/100
2400/2400 - 10s - loss: 0.0306 - accuracy: 0.9921 - val_loss: 10.1733 - val_accuracy: 0.0000e+00
Epoch 3/100
2400/2400 - 10s - loss: 0.0058 - accuracy: 0.9996 - val_loss: 10.9820 - val_accuracy: 0.0000e+00
Epoch 4/100
Epoch 00004: ReduceLROnPlateau reducing learning rate to 0.0009000000427477062.
2400/2400 - 10s - loss: 0.0019 - accuracy: 1.0000 - val_loss: 11.3029 - val_accuracy: 0.0000e+00
Epoch 5/100
2400/2400 - 10s - loss: 0.0042 - accuracy: 0.9992 - val_loss: 11.9037 - val_accuracy: 0.0000e+00
Epoch 6/100
2400/2400 - 10s - loss: 0.0024 - accuracy: 0.9996 - val_loss: 11.5218 - val_accuracy: 0.0000e+00
Epoch 7/100
Epoch 00007: ReduceLROnPlateau reducing learning rate to 0.0008100000384729356.
2400/2400 - 10s - loss: 9.9053e-04 - accuracy: 1.0000 - val_loss: 11.7658 - val_accuracy: 0.0000e+00
Epoch 8/100
2400/2400 - 10s - loss: 0.0011 - accuracy: 1.0000 - val_loss: 12.0437 - val_accuracy: 0.0000e+00
Epoch 9/100
【问题讨论】:
-
这肯定会在很早的时候过度拟合模型,因为您只有 10 张原始图像(增强后为 3000 张)。但是不知道为什么validation loss很高,val_accuracy为0,能不能把
d2 = Dense(no_of_img, activation = 'softmax')(d1)换成d2 = Dense(no_of_img)(d1),看看结果有没有变化? -
嗨@Janapati,感谢您的评论。我意识到发生错误是因为我在将数据用作模型的训练数据之前没有手动洗牌。我以为validation_split 和shuffle 只会在训练期间发生,但实际上这发生在训练时间之前。在将数据拟合到模型中之前手动打乱数据可以解决问题。
-
可能会将您的发现添加到答案中,以便对其他人有用。谢谢
标签: classification loss