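Tensorflow: weights aren't changing and cost set to 1.0

【Question title】: Tensorflow: weights aren't changing and cost set to 1.0
【Posted】: 2017-12-23 14:58:08
【Question description】:

I'm trying to build a convolutional neural network, but I've stumbled upon some really strange problems.

First things first, here's my code: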

import tensorflow as tf
import numpy as np
import matplotlib.image as mpimg
import glob

x = []
y = 1

for filename in glob.glob('trainig_data/*.jpg'):
    im = mpimg.imread(filename)
    x.append(im)
    if len(x) == 10:
        break
epochs = 5

weights = [tf.Variable(tf.random_normal([5,5,3,32],0.1)),
           tf.Variable(tf.random_normal([5,5,32,64],0.1)),
           tf.Variable(tf.random_normal([5,5,64,128],0.1)),
           tf.Variable(tf.random_normal([75*75*128,1064],0.1)),
           tf.Variable(tf.random_normal([1064,1],0.1))]

def CNN(x, weights):
    output = tf.nn.conv2d([x], weights[0], [1,1,1,1], 'SAME')
    output = tf.nn.relu(output)
    output = tf.nn.conv2d(output, weights[1], [1,2,2,1], 'SAME')
    output = tf.nn.relu(output)
    output = tf.nn.conv2d(output, weights[2], [1,2,2,1], 'SAME')
    output = tf.nn.relu(output)
    output = tf.reshape(output, [-1,75*75*128])
    output = tf.matmul(output, weights[3])
    output = tf.nn.relu(output)
    output = tf.matmul(output, weights[4])
    output = tf.reduce_sum(output)
    return output


sess = tf.Session()
prediction = CNN(tf.cast(x[0],tf.float32), weights)
cost = tf.reduce_mean(tf.square(prediction-y))
train = tf.train.GradientDescentOptimizer(0.01).minimize(cost)
init = tf.global_variables_initializer()

sess.run(init)
for e in range(epochs):
    print('epoch:',e+1)
    for x_i in x:
        prediction = CNN(tf.cast(x_i,tf.float32), weights)
        sess.run([cost, train])
        print(sess.run(cost))
print('optimization finished!')
print(sess.run(prediction))

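Now here are my problems:

The values of the weights and filters are not changing
The variable 'cost' is always 1.0
The prediction always outputs 0

After doing some debugging I found out that the problem must come from the optimizer, because before I put my weights through the optimizer, the cost and the prediction were not 1.0 and 0.

I hope this is enough information for you to help me with my problem.

【Discussion】:

Tags: python tensorflow neural-network conv-neural-network gradient-descent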

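【Solution 1】:

Try changing the way you initialize the weights: use tf.truncated_normal to initialize them. See this answer, which explains the difference between tf.truncated_normal and tf.random_normal.

tf.truncated_normal: Outputs random values from a truncated normal distribution. The generated values follow a normal distribution with the specified mean and standard deviation, except that values whose magnitude is more than 2 standard deviations from the mean are dropped and re-picked.

tf.random_normal: Outputs random values from a normal distribution.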
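
To make the difference concrete, here is a minimal NumPy sketch of the "drop and re-pick" rule described above. The helper name truncated_normal and the seed are illustrative assumptions, not TensorFlow's implementation. Note also that in the TF 1.x signature tf.random_normal(shape, mean=0.0, stddev=1.0, ...), the second positional argument is the mean, so the 0.1 passed in the question's code appears to set the mean and leave the standard deviation at 1.0. The sketch passes stddev by keyword to avoid that ambiguity.

```python
import numpy as np

def truncated_normal(shape, mean=0.0, stddev=1.0, rng=None):
    """Hypothetical helper: sample a normal distribution and redraw every
    value that lies more than 2 standard deviations from the mean."""
    rng = np.random.default_rng() if rng is None else rng
    samples = rng.normal(mean, stddev, size=shape)
    out_of_range = np.abs(samples - mean) > 2 * stddev
    while out_of_range.any():
        # Redraw only the rejected entries, then check them again.
        samples[out_of_range] = rng.normal(mean, stddev, size=int(out_of_range.sum()))
        out_of_range = np.abs(samples - mean) > 2 * stddev
    return samples

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
shape = (5, 5, 3, 32)           # same shape as the first filter in the question

plain = rng.normal(0.0, 0.1, size=shape)
truncated = truncated_normal(shape, mean=0.0, stddev=0.1, rng=rng)

# The truncated samples never exceed 2 * stddev in magnitude.
print("largest |value|, plain:    ", np.abs(plain).max())
print("largest |value|, truncated:", np.abs(truncated).max())
```

In TF 1.x the analogous call would presumably be tf.truncated_normal(shape, stddev=0.1), with stddev given by keyword.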

【Discussion】:

Nothing changed.
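
One possible explanation for the symptoms in the question, offered as an assumption rather than a confirmed diagnosis: with unscaled pixel inputs and large weights, the first gradient step can be so big that every ReLU unit ends up with a negative pre-activation. From then on the prediction is 0, the cost is (0 - 1)^2 = 1.0, and all gradients are zero, so the weights stop changing. The toy network and the numbers below are made up for illustration and are not the asker's model.

```python
import numpy as np

def sgd_step(W1, w2, x, y, lr):
    """One gradient-descent step on cost = (pred - y)**2 for a tiny
    one-hidden-layer ReLU network. Returns the updated weights plus the
    prediction and cost computed before the update."""
    h_pre = x @ W1                  # hidden pre-activations
    h = np.maximum(h_pre, 0.0)      # ReLU
    pred = h @ w2                   # scalar prediction
    cost = (pred - y) ** 2
    d_pred = 2.0 * (pred - y)       # d(cost)/d(pred)
    grad_w2 = d_pred * h
    # The gradient only flows through units whose pre-activation is positive.
    grad_W1 = np.outer(x, d_pred * w2 * (h_pre > 0))
    return W1 - lr * grad_W1, w2 - lr * grad_w2, pred, cost

x = np.array([100.0, 200.0])        # unscaled, pixel-like inputs (made up)
y = 1.0
W1 = np.array([[0.5, 0.3], [0.2, 0.4]])
w2 = np.array([0.5, 0.5])

history = []
for step in range(3):
    W1, w2, pred, cost = sgd_step(W1, w2, x, y, lr=0.01)
    history.append((pred, cost))
    print("step", step + 1, "prediction:", pred, "cost:", cost)
```

If this is what happens in the real network, scaling the inputs to a small range, using a smaller standard deviation for the initial weights, or lowering the learning rate would be the things to try first.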

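【Solution 2】:

The code seems odd. In the last line of the CNN function you use tf.reduce_sum to collapse the output into a single value. That value will most likely be a positive number in (0, inf), probably greater than 1, because the ReLU activation only produces positive outputs, and only for inputs on the positive x-axis. So I think you should use a cross-entropy loss such as tf.nn.softmax_cross_entropy_with_logits() instead of tf.reduce_sum. Also try a sigmoid activation function.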
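
As a rough illustration of the sigmoid suggestion, the NumPy sketch below squashes a raw network output (a logit) into (0, 1) and scores it with a sigmoid cross-entropy loss against the label y = 1 from the question. Using a sigmoid rather than a softmax here is an assumption on my part: a softmax over a single output unit always yields 1, so for one output the sigmoid form is the usual counterpart. The function names are hypothetical; in TF 1.x the built-in equivalent would be tf.nn.sigmoid_cross_entropy_with_logits.

```python
import numpy as np

def sigmoid(logit):
    """Map any real-valued logit into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-logit))

def sigmoid_cross_entropy(logit, label):
    """Cross-entropy between label and sigmoid(logit), written in the
    numerically stable form max(x, 0) - x*z + log(1 + exp(-|x|))."""
    return np.maximum(logit, 0.0) - logit * label + np.log1p(np.exp(-np.abs(logit)))

logit = np.array([-3.0, 0.0, 3.0])  # example raw outputs (made up)
label = 1.0                         # target value used in the question

print("probabilities:", sigmoid(logit))
print("losses:       ", sigmoid_cross_entropy(logit, label))
```

With this loss the prediction stays bounded and the loss shrinks smoothly as the logit moves toward the correct side, instead of growing quadratically with an unbounded output.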

【Discussion】: