Preprocessing images generated with the Keras function ImageDataGenerator() to train a ResNet50 model

【Question title】: preprocessing images generated using keras function ImageDataGenerator() to train resnet50 model
【Posted】: 2018-10-12 11:44:31
【Question description】:

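I am trying to train a ResNet50 model for an image classification problem. Before training on my own image dataset, I loaded the pretrained 'imagenet' weights. I load the images from a directory with the Keras function flow_from_directory().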

from keras.preprocessing.image import ImageDataGenerator

# No preprocessing_function yet: batches contain raw pixel values (0-255).
train_datagen = ImageDataGenerator()
train_generator = train_datagen.flow_from_directory(
        './train_qcut_2_classes',
        batch_size=batch_size,
        shuffle=True,
        target_size=input_size[1:],
        class_mode='categorical')
test_datagen = ImageDataGenerator()
validation_generator = test_datagen.flow_from_directory(
        './validate_qcut_2_classes',
        batch_size=batch_size,
        target_size=input_size[1:],
        shuffle=True,
        class_mode='categorical')

I pass the generators as arguments to fit_generator.

hist2=model.fit_generator(train_generator,
                        samples_per_epoch=102204,
                        validation_data=validation_generator,
                        nb_val_samples=25547,
                        nb_epoch=80, callbacks=callbacks,
                        verbose=1)

Question:

With this setup, how can I use the preprocess_input() function to preprocess the input images before they are passed to the model?

from keras.applications.resnet50 import preprocess_input

I tried using the preprocessing_function argument, as follows:

train_datagen=ImageDataGenerator(preprocessing_function=preprocess_input)
train_generator = train_datagen.flow_from_directory(
        './train_qcut_2_classes',
        batch_size=batch_size,
        shuffle=True,
        target_size=input_size[1:],
        class_mode='categorical')  
test_datagen = ImageDataGenerator(preprocessing_function=preprocess_input)
validation_generator = test_datagen.flow_from_directory(
        './validate_qcut_2_classes',
        batch_size=batch_size,
        target_size=input_size[1:],
        shuffle=True,
        class_mode='categorical')

When I extract one image from a preprocessed batch, I get the following result.

train_generator.next()[0][0]

array([[[  91.06099701,   80.06099701,   96.06099701, ...,   86.06099701,
       52.06099701,   12.06099701],
    [ 101.06099701,  104.06099701,  118.06099701, ...,  101.06099701,
       63.06099701,   19.06099701],
    [ 117.06099701,  103.06099701,   88.06099701, ...,   88.06099701,
       74.06099701,   18.06099701],
    ..., 
    [-103.93900299, -103.93900299, -103.93900299, ...,  -24.93900299,
      -38.93900299,  -24.93900299],
    [-103.93900299, -103.93900299, -103.93900299, ...,  -52.93900299,
      -27.93900299,  -39.93900299],
    [-103.93900299, -103.93900299, -103.93900299, ...,  -45.93900299,
      -29.93900299,  -28.93900299]],

   [[  81.22100067,   70.22100067,   86.22100067, ...,   69.22100067,
       37.22100067,   -0.77899933],
    [  91.22100067,   94.22100067,  108.22100067, ...,   86.22100067,
       50.22100067,    6.22100067],
    [ 107.22100067,   93.22100067,   78.22100067, ...,   73.22100067,
       62.22100067,    6.22100067],
    ..., 
    [-116.77899933, -116.77899933, -116.77899933, ...,  -36.77899933,
      -50.77899933,  -36.77899933],
    [-116.77899933, -116.77899933, -116.77899933, ...,  -64.77899933,
      -39.77899933,  -51.77899933],
    [-116.77899933, -116.77899933, -116.77899933, ...,  -57.77899933,
      -41.77899933,  -40.77899933]],

   [[  78.31999969,   67.31999969,   83.31999969, ...,   61.31999969,
       29.31999969,   -7.68000031],
    [  88.31999969,   91.31999969,  105.31999969, ...,   79.31999969,
       43.31999969,   -0.68000031],
    [ 104.31999969,   90.31999969,   75.31999969, ...,   66.31999969,
       53.31999969,   -2.68000031],
    ..., 
    [-123.68000031, -123.68000031, -123.68000031, ...,  -39.68000031,
      -53.68000031,  -39.68000031],
    [-123.68000031, -123.68000031, -123.68000031, ...,  -67.68000031,
      -42.68000031,  -54.68000031],
    [-123.68000031, -123.68000031, -123.68000031, ...,  -60.68000031,
      -44.68000031,  -43.68000031]]], dtype=float32)

To confirm this, I applied the preprocessing function directly to a specific image:

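import cv2
import numpy as np
from keras.preprocessing.image import img_to_array

img = cv2.imread('./images.jpg')  # note: OpenCV loads images in BGR channel order
img = img_to_array(img)
x = np.expand_dims(img, axis=0)   # add a batch dimension
x = x.astype(np.float64)
x = preprocess_input(x)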

which gives the following output:

array([[[[ 118.061,  125.061,  134.061, ...,   97.061,   99.061,  102.061],
     [ 118.061,  125.061,  133.061, ...,   98.061,  100.061,  102.061],
     [ 113.061,  119.061,  126.061, ...,  100.061,  101.061,  102.061],
     ..., 
     [  65.061,   64.061,   64.061, ...,   60.061,   61.061,   57.061],
     [  64.061,   64.061,   63.061, ...,   66.061,   67.061,   59.061],
     [  56.061,   59.061,   62.061, ...,   61.061,   60.061,   59.061]],

    [[ 113.221,  120.221,  129.221, ...,  112.221,  114.221,  113.221],
     [ 116.221,  123.221,  131.221, ...,  113.221,  115.221,  113.221],
     [ 118.221,  124.221,  131.221, ...,  115.221,  116.221,  113.221],
     ..., 
     [  56.221,   55.221,   55.221, ...,   51.221,   52.221,   51.221],
     [  55.221,   55.221,   54.221, ...,   57.221,   58.221,   53.221],
     [  47.221,   50.221,   53.221, ...,   52.221,   51.221,   50.221]],

    [[ 109.32 ,  116.32 ,  125.32 , ...,  106.32 ,  108.32 ,  108.32 ],
     [ 111.32 ,  118.32 ,  126.32 , ...,  107.32 ,  109.32 ,  108.32 ],
     [ 111.32 ,  117.32 ,  124.32 , ...,  109.32 ,  110.32 ,  108.32 ],
     ..., 
     [  34.32 ,   33.32 ,   33.32 , ...,   30.32 ,   31.32 ,   26.32 ],
     [  33.32 ,   33.32 ,   32.32 , ...,   36.32 ,   37.32 ,   28.32 ],
     [  25.32 ,   28.32 ,   31.32 , ...,   30.32 ,   29.32 ,   28.32 ]]]])

Any ideas about why this is happening?

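【Comments】:

The output is consistent with the preprocessing function. Without preprocessing, your values would be between 0 and 255.
I think you were "unlucky" with the image you picked. I don't see anything greater than 135 either :)
I tried many images, but I still face the same problem.
You can print x.max() and x.min() to see the result better. Negative values may be hidden in the "..."; the only thing that would definitely show that no preprocessing took place is the presence of values greater than 152.
So I get a maximum of 151.061 and a minimum of -123.68.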
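The range quoted in these comments follows from the arithmetic of the preprocessing step. As a rough sketch, assuming the ResNet50 preprocess_input uses the "caffe"-style scheme (swap RGB to BGR, then subtract a fixed per-channel mean, with no scaling), the bounds can be reproduced with plain NumPy. The function name caffe_style_preprocess is a hypothetical stand-in written for illustration, not the Keras implementation, and it assumes channels-last input.

```python
import numpy as np

# Per-channel means in BGR order; these match the offsets visible in the
# outputs posted in the question (103.939, 116.779, 123.68).
BGR_MEANS = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def caffe_style_preprocess(rgb_batch):
    """Hypothetical stand-in for preprocess_input on channels-last RGB data."""
    x = np.asarray(rgb_batch, dtype=np.float32)
    x = x[..., ::-1]       # reverse the channel axis: RGB -> BGR
    return x - BGR_MEANS   # zero-center each channel, no rescaling

black = np.zeros((1, 2, 2, 3))        # all pixels 0
white = np.full((1, 2, 2, 3), 255.0)  # all pixels 255

print(caffe_style_preprocess(black)[0, 0, 0])  # lowest value per channel
print(caffe_style_preprocess(white).max())     # highest value overall
```

Under that assumption, a preprocessed batch can never exceed 255 - 103.939 = 151.061 or fall below -123.68, which is exactly the maximum and minimum reported above. Any value above that maximum would suggest the preprocessing was not applied.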

Tags: python keras generator resnet image-preprocessing

【Solution 1】:

Pass it as an argument when creating the ImageDataGenerator:

train_datagen = ImageDataGenerator(preprocessing_function=preprocess_input)

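【Discussion】:

Did you create the flow_from_directory generator again from the new ImageDataGenerator? Can you share how you confirmed that this does not produce preprocessed output?
I have edited the question to show you the results I get.
Try more images; I think you were unlucky. Note shuffle=True.
When I use the preprocessing_function argument, the loss stays at NaN during training. Without preprocessing, the loss decreases significantly.
After this preprocessing, an all-black image is fed in as [-103.939, -116.779, -123.68]. So if all of your convolution weights are positive, the ReLU may output zero (hence no gradient, and possibly NaN). Weights are usually well distributed, though (but a high learning rate can quickly push everything to zero / no gradient).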
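The black-image argument in the last comment can be illustrated with a single ReLU unit. This is only a toy sketch: the weight values are made up for illustration, and it shows the zero activation and zero gradient the commenter describes, not the NaN loss itself.

```python
import numpy as np

# A black pixel after the mean subtraction described in the comment above.
black_pixel = np.array([-103.939, -116.779, -123.68])

# Hypothetical all-positive weights for one unit (illustrative values only).
weights = np.array([0.01, 0.02, 0.03])

pre_activation = float(black_pixel @ weights)   # negative: all inputs < 0, all weights > 0
activation = max(pre_activation, 0.0)           # ReLU clamps it to zero
relu_grad = 1.0 if pre_activation > 0 else 0.0  # derivative of ReLU at this point

print(pre_activation, activation, relu_grad)
```

With negative inputs and positive weights the unit is inactive and passes no gradient back, which is the "no gradient" situation mentioned in the comment. Whether this is what actually caused the NaN loss in the question is not established by the thread.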