How to deactivate a dropout layer called with training=True in a Keras model?

【Question title】: How to deactivate a dropout layer called with training=True in a Keras model?
【Posted】: 2019-12-17 15:51:47
【Question description】:

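I would like to see the final output of a trained tf.keras model. In this case it would be an array of predictions from the softmax function, e.g. [0,0,0,1,0,1].

Other threads here suggest using model.predict(training_data), but that does not work in my case: I use dropout during both training and validation, so neurons are dropped at random and predicting again on the same data gives a different result.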

import tensorflow as tf

# input_dims and dropout_rate are assumed to be defined elsewhere
def get_model():
    inputs = tf.keras.layers.Input(shape=(input_dims,))
    x = tf.keras.layers.Dropout(rate=dropout_rate)(inputs, training=True)
    x = tf.keras.layers.Dense(units=29, activation='relu')(x)
    x = tf.keras.layers.Dropout(rate=dropout_rate)(x, training=True)  
    x = tf.keras.layers.Dense(units=15, activation='relu')(x)
    outputs = tf.keras.layers.Dense(2, activation='softmax')(x)
    model = tf.keras.Model(inputs=inputs, outputs=outputs)
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',      
                  metrics=['sparse_categorical_accuracy'])
    return model

myModel = get_model()
myModel.summary()
myModel.fit(X_train, y_train,
           batch_size = batch_size,
           epochs= epochs,
           verbose = 1,
           validation_data = (X_val, y_val))
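
To illustrate why repeated predictions differ when a dropout layer is forced into training mode, here is a minimal pure-Python sketch of inverted dropout. It is not Keras's implementation; the `dropout` helper, the `activations` list, and the seed are hypothetical and exist only for this illustration.

```python
import random

def dropout(values, rate, training, rng):
    """Inverted dropout: zero each value with probability `rate` and scale the
    survivors by 1 / (1 - rate); return the input unchanged when not training."""
    if not training or rate == 0.0:
        return list(values)
    keep = 1.0 - rate
    return [v / keep if rng.random() >= rate else 0.0 for v in values]

rng = random.Random(0)  # hypothetical seed, only to make the demo repeatable
activations = [0.2, 0.5, 0.1, 0.9, 0.4, 0.7, 0.3, 0.8]

# With training=True a fresh random mask is drawn on every call,
# so two passes over the same input generally disagree.
first_pass = dropout(activations, 0.5, True, rng)
second_pass = dropout(activations, 0.5, True, rng)

# With training=False the layer is the identity, so the output is repeatable.
inference_pass = dropout(activations, 0.5, False, rng)

print(first_pass)
print(second_pass)
print(inference_pass)
```

Hard-coding training=True in the layer call pins the model to the first behaviour, which is why model.predict() cannot switch it off.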

In TensorFlow, you can easily get the model's output after training. Here is an example from a GitHub repo:

input = tf.placeholder(tf.float32, shape=[None, INPUT_DIMS])
labels = tf.placeholder(tf.float32, shape=[None])

hidden = tf.nn.tanh(make_nn_layer(normalized, NUM_HIDDEN))
logits = make_nn_layer(hidden, NUM_CLASSES)
outputs = tf.argmax(logits, 1)

int_labels = tf.to_int64(labels)
cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits, int_labels, name='xentropy')
train_step = tf.train.AdamOptimizer().minimize(cross_entropy)

correct_prediction = tf.equal(outputs, int_labels)
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())

    validation_dict = {
        input: validation_data[:,0:7],
        labels: validation_data[:,7],}

    for i in range(NUM_BATCHES):
        batch = training_data[numpy.random.choice(training_size, BATCH_SIZE, False),:]
        train_step.run({input: batch[:,0:7], labels: batch[:,7]})

        if i % 100 == 0 or i == NUM_BATCHES - 1:
            print('Accuracy %.2f%% at step %d' % (accuracy.eval(validation_dict) * 100, i))

    output_data = outputs.eval({input: data_vector[:,0:7]})

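The only output I seem to be able to get from the trained model is a History object. There is also a myModel.output object, but it is a tensor that I cannot evaluate without feeding data into it. Any ideas?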

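【Question discussion】:

Are you asking how to get a visualization of the model? Or are you looking for output data, something like x = 0, y = 1?
@Cygnus Yes, I am looking for the output data (predictions), similar to how model.predict() works.

Tags: python tensorflow keras tf.keras dropout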

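【Solution 1】:

As far as I know, there is no way to turn off dropout after passing training=True when calling the layers (unless you transfer the weights to a new model with the same architecture). However, you can instead build and train your model normally (i.e. without using the training argument in the calls) and then selectively turn the dropout layers on and off in the test phase by defining a backend function (i.e. keras.backend.function()) and setting the learning phase (i.e. keras.backend.learning_phase()):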

# build your model normally (i.e. without using `training=True` argument)

# train your model...

from keras import backend as K

func = K.function(model.inputs + [K.learning_phase()], model.outputs)

# run the model with dropout layers being active, i.e. learning_phase == 1
preds = func(list_of_input_arrays + [1])

# run the model with dropout layers being inactive, i.e. learning_phase == 0
preds = func(list_of_input_arrays + [0])
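
In TensorFlow 2.x, an alternative way to get the same on/off switch for a normally built model is to call the model directly and pass the training flag at call time. The sketch below assumes TensorFlow 2.x with eager execution; the tiny architecture and the random input `X` are placeholders for illustration, not the asker's model or data.

```python
import numpy as np
import tensorflow as tf

# Build the model normally: no `training` argument is baked into the layer calls.
inputs = tf.keras.layers.Input(shape=(4,))
x = tf.keras.layers.Dropout(rate=0.5)(inputs)
outputs = tf.keras.layers.Dense(2, activation='softmax')(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs)

X = np.random.rand(8, 4).astype('float32')  # placeholder input data

# Dropout inactive: repeated calls give the same predictions.
preds_off_a = model(X, training=False).numpy()
preds_off_b = model(X, training=False).numpy()

# Dropout active: each call draws a new random mask.
preds_on_a = model(X, training=True).numpy()
preds_on_b = model(X, training=True).numpy()

print(np.allclose(preds_off_a, preds_off_b))
print(np.allclose(preds_on_a, preds_on_b))
```

This only works because the layers were not called with a hard-coded training=True; the call-time flag has nothing to override in that case.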

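Update: As I suggested above, another approach is to define a new model with the same architecture but without setting training=True, and then transfer the weights from the trained model to this new model. To do this, I just add a training argument to your get_model() function: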

def get_model(training=None):
    inputs = tf.keras.layers.Input(shape=(input_dims,))
    x = tf.keras.layers.Dropout(rate=dropout_rate)(inputs, training=training)
    x = tf.keras.layers.Dense(units=29, activation='relu')(x)
    x = tf.keras.layers.Dropout(rate=dropout_rate)(x, training=training)  
    x = tf.keras.layers.Dense(units=15, activation='relu')(x)
    outputs = tf.keras.layers.Dense(2, activation='softmax')(x)
    model = tf.keras.Model(inputs=inputs, outputs=outputs)
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',      
                  metrics=['sparse_categorical_accuracy'])
    return model

# build a model with dropout layers active in both training and test phases
myModel = get_model(training=True)
# train the model
myModel.fit(...)

# build a clone of the model with dropouts deactivated in test phase
myTestModel = get_model()  # note: the `training` is `None` by default
# transfer the weights from the trained model to this model
myTestModel.set_weights(myModel.get_weights())
# use the new model in test phase; the dropouts would not be active
myTestModel.predict(...)
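
A quick sanity check of the weight-transfer approach might look like the sketch below. It assumes TensorFlow 2.x; the `build` helper, the small architecture, and the random input `X` are hypothetical stand-ins for get_model() and the real data, and no training step is run because only the transfer itself is being checked.

```python
import numpy as np
import tensorflow as tf

def build(training=None):
    # Hypothetical, reduced version of get_model(); `training` is forwarded to Dropout.
    inputs = tf.keras.layers.Input(shape=(4,))
    x = tf.keras.layers.Dropout(rate=0.5)(inputs, training=training)
    outputs = tf.keras.layers.Dense(2, activation='softmax')(x)
    return tf.keras.Model(inputs=inputs, outputs=outputs)

train_model = build(training=True)  # dropout hard-coded as always active
test_model = build()                # dropout follows the usual train/test phase
test_model.set_weights(train_model.get_weights())

X = np.random.rand(8, 4).astype('float32')  # placeholder input data

# The clone should give repeatable predictions because dropout is off in predict().
clone_a = test_model.predict(X, verbose=0)
clone_b = test_model.predict(X, verbose=0)
print(np.allclose(clone_a, clone_b))

# Both models should now hold identical weights.
weights_match = all(
    np.array_equal(w_train, w_test)
    for w_train, w_test in zip(train_model.get_weights(), test_model.get_weights())
)
print(weights_match)
```

Dropout layers have no trainable weights, so set_weights() only needs the two architectures to match layer for layer.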

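【Discussion】:

Great, thanks. I just tested the second solution you provided using the training data, and the accuracy I get from the predictions is the same as the accuracy reported during training.
Actually, @today, the accuracy values in my test happened to agree only up to the 4th digit. When I shuffle the dataset the accuracy values are not the same, which means the output you get from myTestModel.predict() is not the same as the output myModel produced during training. You can see this with some toy data: from sklearn.datasets import make_circles # generate 2d classification dataset X, y = make_circles(n_samples=1000, noise=0.05)
@la_leche Sorry, I can't follow your point. What do you mean by shuffling the dataset affecting the accuracy? And how do you get the output of myModel during training? Don't forget that the accuracy reported in the progress bar is the average of the accuracy values over all previous batches, and that after each batch the model weights change due to backpropagation. Therefore you cannot compare the accuracy printed in the progress bar during training with the accuracy you get at prediction time. Also, I don't understand what you mean by "the values are coincidentally equal". Equal to what?
Oh, I see. Yes, I was comparing the accuracy of myTestModel with the progress bar. By "coincidentally" I meant that when I first ran the test, the progress bar reported acc = 0.9871 and sklearn acc = 0.98702 ... they look the same, but only after rounding. Testing with other data gives values like 0.9527 and 0.9243, which are somewhat different. Thanks for pointing out the progress bar issue; it seems your solution is what I wanted after all.
Thanks for this solution. I wonder how you would plot the predictions along with confidence intervals in this case? The model produces a 2d numpy array for the mean and the std, and I am not sure what these mean?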