[Question Title]: Tensorflow - Preprocessing image in model prediction
[Posted]: 2021-08-03 12:33:32
[Question Description]:

I trained a model with the functional API and two different pretrained backbones: EfficientNet B5 and MobileNet V2. After training, I am running an application that uses the saved model to make some predictions.

My question is: what is the correct way to pass an image as the argument to model.predict()?

The model:

    self.feature_extractor1 = EfficientNetB5(#weights='imagenet',
                                  input_shape=self.input_shape,
                                  include_top=False)

    self.feature_extractor2 = MobileNetV2(#weights='imagenet',
                                  input_shape=self.input_shape,
                                  include_top=False)


    for layer in self.feature_extractor1.layers:
        layer.trainable = False    

    for layer in self.feature_extractor2.layers:
        layer.trainable = False        
    

    input_ = Input(shape=self.input_shape)
    processed_input1 = b5_preprocess_input(input_)

    processed_input2 = mbv2_preprocess_input(input_)

    x1 = self.feature_extractor1(processed_input1)
    x1 = GlobalAveragePooling2D()(x1)
    x1 = Dropout(0.2)(x1)
    x1 = Flatten()(x1)

    x2 = self.feature_extractor2(processed_input2)
    x2 = GlobalAveragePooling2D()(x2)
    x2 = Dropout(0.2)(x2)
    x2 = Flatten()(x2)

    x = Concatenate()([x1, x2])

    x = Dense(512, activation='relu')(x) #,kernel_initializer=initializer,kernel_regularizer=regularizers.l2(0.001)) 
    x = Dense(1024, activation='relu')(x)

    output_shape = Dense(shape_categories, activation='softmax', name='shape')(x)

    model = Model(inputs=input_,
                  outputs=output_shape)
                  
    adam_kwargs = {'beta_1': 0.9, 'beta_2': 0.9, 'epsilon': 1e-7}
    sgd_kwargs = {'decay': 1e-6, 'momentum': 0.9, 'nesterov': True}
    optimizer = self.optimizers(kwargs=adam_kwargs)
    
    model.compile(loss='categorical_crossentropy',
                  optimizer=optimizer,
                  metrics=['accuracy'])

    model.summary()

    STEP_SIZE_TRAIN = self.phase_gen[0].n// self.phase_gen[0].batch_size
    STEP_SIZE_VALID = self.phase_gen[1].n// self.phase_gen[1].batch_size
    if self.phases == 3:
        STEP_SIZE_TEST = self.phase_gen[2].n// self.phase_gen[2].batch_size

    checkpoint = ModelCheckpoint(self.model_dir,
                                monitor='val_accuracy',
                                verbose=1,
                                save_best_only=True,
                                mode='max')
    tensorboard = TensorBoard(log_dir=self.model_dir + '/logs',
                            histogram_freq=5,
                            embeddings_freq=5)
                            #[EarlyStopping(monitor='val_loss', patience=8)]
    callbacks = [checkpoint, tensorboard]

    
    hist = model.fit_generator(generator=self.phase_gen[0],
                               steps_per_epoch=STEP_SIZE_TRAIN,
                               validation_data=self.phase_gen[1],
                               validation_steps=STEP_SIZE_VALID,
                               epochs=self.epochs,
                               callbacks=callbacks
                               )

In another script I have the prediction code:

import io
import numpy as np
from PIL import Image
from tensorflow.keras.preprocessing import image  # provides image.img_to_array
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input as mbv2_preprocess_input
from tensorflow.keras.applications.efficientnet import preprocess_input as b5_preprocess_input

def preprocess_image(img):
    img = Image.open(io.BytesIO(img))
    img = img.resize((224, 224), Image.ANTIALIAS)
    img = image.img_to_array(img)
    img = np.expand_dims(img, axis=0)
    #return [b5_preprocess_input(img),  mbv2_preprocess_input(img)]
    return [img, img]

modelSHP = get_modelSHP()

@app.route('/part_numbers', methods=['POST'])
def part_number():
    img = request.files.get('image').read()
    processed_image = preprocess_image(img)
    predict_shape = modelSHP.predict(processed_image)

My first thought was that I needed to pass the input image preprocessed by the matching functions, in the same order they were used during training. But when I did that, my prediction accuracy stayed around zero. Passing the image with no preprocessing at all worked better.

Is the way I am passing the image to model.predict (without preprocessing) correct? I am wondering whether, because I used the functional API and built the model this way, the preprocessing became a layer inside each branch of the model.
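
One way I could check this (a rough sketch, assuming modelSHP is the model returned by get_modelSHP() in the script above) would be to list the loaded model's layers and look for preprocessing operations traced into the graph:

    # Sketch: inspect the loaded model for preprocessing traced into the graph.
    # Assumes modelSHP is the model returned by get_modelSHP() above.
    for layer in modelSHP.layers:
        print(layer.name, type(layer).__name__)  # TFOpLambda entries would be in-graph preprocessing
    # modelSHP.summary() prints the same structure together with output shapes.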

[Question Comments]:

    Tags: python tensorflow computer-vision conv-neural-network image-preprocessing


    [Solution 1]:

    I copied your code and printed the model summary, which is shown below:

    Model: "functional_5"
    __________________________________________________________________________________________________
    Layer (type)                    Output Shape         Param #     Connected to                     
    ==================================================================================================
    input_23 (InputLayer)           [(None, 224, 224, 3) 0                                            
    __________________________________________________________________________________________________
    tf.math.truediv_5 (TFOpLambda)  (None, 224, 224, 3)  0           input_23[0][0]                   
    __________________________________________________________________________________________________
    tf.math.subtract_5 (TFOpLambda) (None, 224, 224, 3)  0           tf.math.truediv_5[0][0]          
    __________________________________________________________________________________________________
    efficientnetb5 (Functional)     (None, 7, 7, 2048)   28513527    input_23[0][0]                   
    __________________________________________________________________________________________________
    mobilenetv2_1.00_224 (Functiona (None, 7, 7, 1280)   2257984     tf.math.subtract_5[0][0]         
    __________________________________________________________________________________________________
    global_average_pooling2d_8 (Glo (None, 2048)         0           efficientnetb5[0][0]             
    __________________________________________________________________________________________________
    global_average_pooling2d_9 (Glo (None, 1280)         0           mobilenetv2_1.00_224[0][0]       
    __________________________________________________________________________________________________
    dropout_8 (Dropout)             (None, 2048)         0           global_average_pooling2d_8[0][0] 
    __________________________________________________________________________________________________
    dropout_9 (Dropout)             (None, 1280)         0           global_average_pooling2d_9[0][0] 
    __________________________________________________________________________________________________
    flatten_8 (Flatten)             (None, 2048)         0           dropout_8[0][0]                  
    __________________________________________________________________________________________________
    flatten_9 (Flatten)             (None, 1280)         0           dropout_9[0][0]                  
    __________________________________________________________________________________________________
    concatenate_3 (Concatenate)     (None, 3328)         0           flatten_8[0][0]                  
                                                                     flatten_9[0][0]                  
    __________________________________________________________________________________________________
    dense_6 (Dense)                 (None, 512)          1704448     concatenate_3[0][0]              
    __________________________________________________________________________________________________
    dense_7 (Dense)                 (None, 1024)         525312      dense_6[0][0]                    
    __________________________________________________________________________________________________
    shape (Dense)                   (None, 2)            2050        dense_7[0][0]                    
    ==================================================================================================
    Total params: 33,003,321
    Trainable params: 2,231,810
    Non-trainable params: 30,771,511
    

    As you suspected, the preprocessing became layers inside the model, so for prediction you do not have to preprocess the input yourself: it is built into the model. For EfficientNet, preprocess_input is just a pass-through, because EfficientNet expects input pixels in the 0 to 255 range; that is why the model summary shows the input (input_23) feeding directly into efficientnetb5. For MobileNet, preprocess_input scales the pixels to between -1 and +1 using pixel = pixel / 127.5 - 1, so the tf.math.truediv_5 layer divides input_23 by 127.5 and the tf.math.subtract_5 layer then subtracts 1.
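
    To make this concrete, here is a small standalone check (a sketch added for illustration, not code from your model) comparing the two preprocess_input functions with the formula the TFOpLambda layers apply:

        import numpy as np
        from tensorflow.keras.applications.efficientnet import preprocess_input as b5_preprocess_input
        from tensorflow.keras.applications.mobilenet_v2 import preprocess_input as mbv2_preprocess_input

        raw = np.array([[0.0, 127.5, 255.0]])    # raw pixel values in the 0-255 range

        print(b5_preprocess_input(raw.copy()))   # unchanged -> pass-through for EfficientNet
        print(mbv2_preprocess_input(raw.copy())) # [-1., 0., 1.] -> rescaled to [-1, 1]
        print(raw / 127.5 - 1.0)                 # same values as the truediv/subtract layers

    Because that rescaling is already part of the model graph, applying mbv2_preprocess_input again before model.predict() would scale the MobileNet branch's input twice, which is consistent with the near-zero accuracy you observed when preprocessing manually.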

    [Comments]:

    • Thanks! I had not checked the summary (unfortunately, that idea did not occur to me).