TensorFlow - MNIST 数据中的训练精度没有提高答案

【问题标题】：TensorFlow - Training accuracy not improving in MNIST dataTensorFlow - MNIST 数据中的训练精度没有提高
【发布时间】：2017-04-26 13:56:58
【问题描述】：

我用tensorflow写了一个程序来处理Kaggle的digit-recognizer问题，程序可以正常运行，但是训练的准确率总是很低，大概在10%左右，如下：

step 0, training accuracy 0.11
step 100, training accuracy 0.13
step 200, training accuracy 0.21
step 300, training accuracy 0.12
step 400, training accuracy 0.07
step 500, training accuracy 0.08
step 600, training accuracy 0.15
step 700, training accuracy 0.05
step 800, training accuracy 0.08
step 900, training accuracy 0.12
step 1000, training accuracy 0.05
step 1100, training accuracy 0.09
step 1200, training accuracy 0.12
step 1300, training accuracy 0.1
step 1400, training accuracy 0.08
step 1500, training accuracy 0.11
step 1600, training accuracy 0.17
step 1700, training accuracy 0.13
step 1800, training accuracy 0.11
step 1900, training accuracy 0.13
step 2000, training accuracy 0.07
……

以下是我的代码：

def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

def conv2d(x, w):
    return tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
    # ksize = [batch, heigh, width, channels], strides=[batch, stride, stride, channels]
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

x = tf.placeholder(tf.float32, [None, 784])      
y_ = tf.placeholder(tf.float32, [None, 10])      
keep_prob = tf.placeholder(tf.float32)

x_image = tf.placeholder(tf.float32, [None, 28, 28, 1])

w_conv1 = weight_variable([5, 5, 1, 32])    
b_conv1 = bias_variable([32])
h_conv1 = tf.nn.relu(conv2d(x_image, w_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)

w_conv2 = weight_variable([5, 5, 32, 64])    
b_conv2 = bias_variable([64])
h_conv2 = tf.nn.relu(conv2d(h_pool1, w_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)

w_fc1 = weight_variable([7 * 7 * 64, 1024])   
b_fc1 = bias_variable([1024])
h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64])     
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, w_fc1) + b_fc1)
# dropout
keep_prob = tf.placeholder(tf.float32)      
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
# softmax
w_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])
y_conv = tf.nn.softmax(tf.matmul(h_fc1_drop, w_fc2) + b_fc2)

cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y_conv), reduction_indices=[1]))
train_step = tf.train.AdamOptimizer(10e-4).minimize(cross_entropy)

correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

def get_batch(i, size, train, label):
    startIndex = (i * size) % 42000
    endIndex = startIndex + size
    batch_X = train[startIndex : endIndex]
    batch_Y = label[startIndex : endIndex]
    return batch_X, batch_Y


data = pd.read_csv('train.csv')
train_data = data.drop(['label'], axis=1)
train_data = train_data.values.astype(dtype=np.float32)
train_data = train_data.reshape(42000, 28, 28, 1)

label_data = data['label'].tolist()
label_data = tf.one_hot(label_data, depth=10)
label_data = tf.Session().run(label_data).astype(dtype=np.float64)


batch_size = 100                             
tf.global_variables_initializer().run()

for i in range(20000):   
    batch_x, batch_y = get_batch(i, batch_size, train_data, label_data)
    if i % 100 == 0:
        train_accuracy = accuracy.eval(feed_dict={x_image: batch_x, y_: batch_y, keep_prob: 1.0})
        print("step %d, training accuracy %g" % (i, train_accuracy))
    train_step.run(feed_dict={x_image: batch_x, y_: batch_y, keep_prob: 0.9})

我不知道我的程序出了什么问题。

【问题讨论】：

如果答案解决了您的问题，请接受 - 请参阅 What should I do when someone answers my question? - 谢谢

标签： tensorflow machine-learning mnist kaggle

【解决方案1】：

我建议您更改 bias_variable 函数 - 不确定 tf.Variable(tf.constant) 的行为方式，另外我们通常将偏差初始化为零，而不是 0.1：

def bias_variable(shape):
    return tf.zeros((shape), dtype = tf.float32)

如果这没有帮助，请尝试使用 stddev=0.01 初始化您的权重

【讨论】：

非常感谢您的回答！我把weights和bias的stddev改成0.01，程序在3000步之前正常运行，准确率在0.99左右。但是在3000步之后，准确率突然变成0.1左右。很奇怪。
@Lixudong 这是另一个不同的问题 - 请接受这个答案并打开一个新帖子