【Question title】: ResNet50 nan loss with Keras 2
【Posted】: 2025-12-07 08:20:03
【Question description】:

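Since upgrading to Keras 2, I have been seeing nan loss when trying to fine-tune ResNet50. If I use a single convolutional layer (commented out below) instead of ResNet, the loss and accuracy look fine. Did I miss something that changed in Keras 2?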

from keras.applications.resnet50 import ResNet50
from keras.layers import Flatten, Dense, Input, Conv2D, Activation
from keras.layers.pooling import MaxPooling2D
from keras.models import Model
from keras.optimizers import SGD
import numpy as np

inp = Input(batch_shape=(32, 224, 224, 3), name='input_image')

### resnet
modelres = ResNet50(weights="imagenet", include_top=False, input_tensor=inp)
x = modelres.output
x = Flatten()(x)

### single convolutional layer
#x = Conv2D(32, (3,3))(inp)
#x = Activation('relu')(x)
#x = MaxPooling2D(pool_size=(3,3))(x)
#x = Flatten()(x)
#x = Dense(units=32)(x)
predictions = Dense(units=2, kernel_initializer="he_normal", activation="softmax")(x) 

model = Model(inputs=inp, outputs=predictions)
model.compile(SGD(lr=.001, momentum=0.9), "categorical_crossentropy", metrics=["accuracy"])

# generate images of all ones with the same label
def gen():
    while True:
        x_data = np.ones((32,224,224,3)).astype('float32')
        y_data = np.zeros((32,2)).astype('float32')
        y_data[:,1]=1.0
        yield x_data, y_data

model.fit_generator(gen(), 10, validation_data=gen(), validation_steps=1)

The beginning and end of model.summary() look like:

____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to
====================================================================================================
input_image (InputLayer)         (32, 224, 224, 3)     0
____________________________________________________________________________________________________
zero_padding2d_1 (ZeroPadding2D) (32, 230, 230, 3)     0
____________________________________________________________________________________________________
conv1 (Conv2D)                   (32, 112, 112, 64)    9472

...

avg_pool (AveragePooling2D)      (32, 1, 1, 2048)      0
____________________________________________________________________________________________________
flatten_1 (Flatten)              (32, 2048)            0
____________________________________________________________________________________________________
dense_1 (Dense)                  (32, 2)               4098
====================================================================================================

The training output is:

Epoch 1/1
10/10 [==============================] - 30s - loss: nan - acc: 0.0000e+00 - val_loss: nan - val_acc: 0.0000e+00

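【Comments】:

  • Do the nans appear from the start, or only after some batches/epochs?
  • From the start.
  • What is the size of your input?
  • The shape of a training batch is (32,224,224,3).
  • Try feeding np.zeros as x.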
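
One way to answer the first comment's question (whether the nan values appear from the start or only later) is to record the loss per batch and look for the first non-finite entry. The following is a minimal sketch in plain Python; the helper name `first_non_finite` and the sample loss lists are hypothetical and do not come from the original post.

```python
import math

def first_non_finite(losses):
    """Return the index of the first nan/inf loss, or None if all are finite."""
    for i, value in enumerate(losses):
        if not math.isfinite(value):
            return i
    return None

# Hypothetical per-batch losses, for illustration only.
from_start = [float("nan"), float("nan"), float("nan")]
later = [0.69, 0.52, float("inf"), float("nan")]

print(first_non_finite(from_start))  # index 0: broken from the first batch
print(first_non_finite(later))       # index 2: diverged after some batches
```

An index of 0, as reported in the question, would point toward the model or backend setup rather than a gradual divergence during training.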

Tags: keras


【Solution 1】:

Everything works when I switch the backend to TensorFlow instead of Theano. It looks like a problem with the Theano implementation in Keras 2.
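
For reference, Keras 2 reads its backend from the `backend` field of `~/.keras/keras.json`, and the `KERAS_BACKEND` environment variable overrides that file when it is set before `keras` is imported. Below is a minimal sketch of the environment-variable route; the helper name `select_keras_backend` is hypothetical, and the sketch assumes TensorFlow is installed.

```python
import os

def select_keras_backend(name="tensorflow"):
    """Set KERAS_BACKEND; this must run before the first `import keras`."""
    os.environ["KERAS_BACKEND"] = name
    return os.environ["KERAS_BACKEND"]

select_keras_backend("tensorflow")

# After this point, `import keras` should report "Using TensorFlow backend."
# The import is left as a comment so the sketch runs without Keras installed.
```

Setting the variable inside the script keeps the change local to that run, whereas editing `keras.json` changes the default for every script on the machine.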

【Discussion】: