【Question Title】: Why is tensorflow having a worse accuracy than keras in direct comparison?
【Posted】: 2020-04-01 04:40:15
【Question】:

I ran a direct comparison between TensorFlow and Keras with the same parameters and the same dataset (MNIST).

The strange thing is that Keras reaches 96% accuracy in 10 epochs, while TensorFlow reaches only about 70% in 10 epochs. I have run this code many times on the same instance, and this inconsistency always occurs.

Even with 50 epochs, TensorFlow only reaches a final accuracy of about 90%.

Keras code:

import keras
from keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()

# One hot encoding
from keras.utils import np_utils
y_train = np_utils.to_categorical(y_train) 
y_test = np_utils.to_categorical(y_test) 

# Changing the shape of input images and normalizing
x_train = x_train.reshape((60000, 784))
x_test = x_test.reshape((10000, 784))
x_train = x_train.astype('float32') / 255
x_test = x_test.astype('float32') / 255

from keras.models import Sequential
from keras.layers import Dense

# Creating the neural network
model = Sequential()
model.add(Dense(30, input_dim=784, kernel_initializer='normal', activation='relu'))
model.add(Dense(30, kernel_initializer='normal', activation='relu'))
model.add(Dense(10, kernel_initializer='normal', activation='softmax'))

# Optimizer
optimizer = keras.optimizers.Adam()

# Loss function
model.compile(loss='categorical_crossentropy', optimizer=optimizer, metrics=['acc'])

# Training
model.fit(x_train, y_train, epochs=10, batch_size=200, validation_data=(x_test, y_test), verbose=1)

# Checking the final accuracy (evaluate returns [loss, accuracy])
loss_final, accuracy_final = model.evaluate(x_test, y_test, verbose=0)
print('Model Accuracy: ', accuracy_final)

TensorFlow code (x_train, x_test, y_train, y_test are the same inputs as in the Keras code above):

import tensorflow as tf  # TF 1.x graph-mode API (placeholders + sessions)
# Epochs parameters
epochs = 10
batch_size = 200

# Neural network parameters
n_input = 784 
n_hidden_1 = 30 
n_hidden_2 = 30 
n_classes = 10 

# Placeholders x, y
x = tf.placeholder(tf.float32, [None, n_input])
y = tf.placeholder(tf.float32, [None, n_classes])

# Creating the first layer
w1 = tf.Variable(tf.random_normal([n_input, n_hidden_1]))
b1 = tf.Variable(tf.random_normal([n_hidden_1]))
layer_1 = tf.nn.relu(tf.add(tf.matmul(x,w1),b1)) 

# Creating the second layer 
w2 = tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2]))
b2 = tf.Variable(tf.random_normal([n_hidden_2]))
layer_2 = tf.nn.relu(tf.add(tf.matmul(layer_1,w2),b2)) 

# Creating the output layer 
w_out = tf.Variable(tf.random_normal([n_hidden_2, n_classes]))
bias_out = tf.Variable(tf.random_normal([n_classes]))
output = tf.matmul(layer_2, w_out) + bias_out

# Loss function
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits = output, labels = y))
# Optimizer
optimizer = tf.train.AdamOptimizer().minimize(cost)

# Making predictions
predictions = tf.equal(tf.argmax(output, 1), tf.argmax(y, 1))

# Accuracy
accuracy = tf.reduce_mean(tf.cast(predictions, tf.float32))

# Variables that will be used in the training cycle
train_size = x_train.shape[0]
total_batches = train_size // batch_size  # integer number of batches

# Initializing the variables
init = tf.global_variables_initializer()

# Opening the session
with tf.Session() as sess:
    sess.run(init)

    # Training cycle
    for epoch in range(epochs):

        # Loop through all batch iterations
        for i in range(0, train_size, batch_size): 
            batch_x = x_train[i:i + batch_size]
            batch_y = y_train[i:i + batch_size]

            # Fit training
            sess.run(optimizer, feed_dict={x: batch_x, y: batch_y})

        # Running accuracy (with test data) on each epoch    
        acc_val = sess.run(accuracy, feed_dict={x: x_test, y: y_test})
        # Showing results after each epoch
        print ("Epoch: ", "{}".format((epoch + 1)))
        print ("Accuracy_val = ", "{:.3f}".format(acc_val))

    print ("Training Completed!")

    # Checking the final accuracy (reusing the accuracy op defined above)
    print("Model Accuracy:", accuracy.eval({x: x_test, y: y_test}))

I am running everything on the same instance. Can anyone explain this inconsistency?

【Question Discussion】:

Tags: tensorflow machine-learning keras deep-learning image-recognition


【Solution 1】:

    I think the initialization is the culprit here. One real difference is that in TF you initialize the biases with random_normal, which is not best practice; Keras, in fact, initializes biases to zero by default, which is best practice. You do not override this, because you only set kernel_initializer in the Keras code, not bias_initializer.
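
    For reference, spelling out the Keras defaults makes the asymmetry explicit; the bias_initializer='zeros' argument below is simply the Keras default that the question's TF code never reproduces:

    model.add(Dense(30, input_dim=784,
                    kernel_initializer='normal',   # set explicitly in the question
                    bias_initializer='zeros',      # Keras default, not overridden
                    activation='relu'))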

    The situation is even worse for the weight initializers. In Keras you are using RandomNormal, whose defaults are:

    keras.initializers.RandomNormal(mean=0.0, stddev=0.05, seed=None)
    

    But in TF you are using tf.random.normal:

    tf.random.normal(shape, mean=0.0, stddev=1.0, dtype=tf.dtypes.float32, seed=None, name=None)
    

    I can tell you that initializing these layers with a standard deviation of 0.05 is reasonable, but a standard deviation of 1.0 is not.
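
    As a minimal sketch of that change (assuming the TF 1.x code from the question), the variables could be initialized to match the Keras defaults like this:

    w1 = tf.Variable(tf.random_normal([n_input, n_hidden_1], stddev=0.05))
    b1 = tf.Variable(tf.zeros([n_hidden_1]))   # zero biases, as Keras does by default
    w2 = tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2], stddev=0.05))
    b2 = tf.Variable(tf.zeros([n_hidden_2]))
    w_out = tf.Variable(tf.random_normal([n_hidden_2, n_classes], stddev=0.05))
    bias_out = tf.Variable(tf.zeros([n_classes]))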

    I suspect things will get better if you change these parameters. If they do not, I would suggest dumping the TensorFlow graph for both models and checking them by hand to see the differences. The graphs are small enough in this case to inspect carefully.
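
    In TF 1.x, one way to dump the graph for inspection (a sketch, assuming TensorBoard is available) is to write it from inside the session:

    writer = tf.summary.FileWriter('./tf_graph', sess.graph)
    writer.close()
    # then inspect with: tensorboard --logdir ./tf_graph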

    To some extent this highlights the difference in philosophy between Keras and TensorFlow. Keras tries hard to set good defaults for NN training, corresponding to what is known to work. TensorFlow, by contrast, is completely agnostic: you have to know those practices and code them explicitly. The standard deviation is a great example: of course it should default to 1 in a mathematical function, but 0.05 is a good value if you know the tensor will be used to initialize an NN layer.

    Answer originally provided by Dmitriy Genzel on Quora.

    【Discussion】:
