Binary classifier always returns 0.5

【Question title】: Binary classifier always returns 0.5
【Posted】: 2018-10-05 14:41:23
【Question description】:

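I am training a classifier that takes an RGB input (three values from 0 to 255) and returns whether a black or a white font (0 or 1) suits that color best. After training, my classifier always returns 0.5 (or thereabouts) and never gets any more accurate than that.

The code is below: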

import tensorflow as tf
import numpy as np
from tqdm import tqdm

print('Creating Datasets:')

x_train = []
y_train = []

for i in tqdm(range(10000)):
    x_train.append([np.random.uniform(0, 255), np.random.uniform(0, 255), np.random.uniform(0, 255)])

for elem in tqdm(x_train):
    if (((elem[0] + elem[1] + elem[2]) / 3) / 255) > 0.5:
        y_train.append(0)
    else:
        y_train.append(1)

x_train = np.array(x_train)
y_train = np.array(y_train)

graph = tf.Graph()

with graph.as_default():

    x = tf.placeholder(tf.float32)
    y = tf.placeholder(tf.float32)

    w_1 = tf.Variable(tf.random_normal([3, 10], stddev=1.0), tf.float32)
    b_1 = tf.Variable(tf.random_normal([10]), tf.float32)
    l_1 = tf.sigmoid(tf.matmul(x, w_1) + b_1)

    w_2 = tf.Variable(tf.random_normal([10, 10], stddev=1.0), tf.float32)
    b_2 = tf.Variable(tf.random_normal([10]), tf.float32)
    l_2 = tf.sigmoid(tf.matmul(l_1, w_2) + b_2)

    w_3 = tf.Variable(tf.random_normal([10, 5], stddev=1.0), tf.float32)
    b_3 = tf.Variable(tf.random_normal([5]), tf.float32)
    l_3 = tf.sigmoid(tf.matmul(l_2, w_3) + b_3)

    w_4 = tf.Variable(tf.random_normal([5, 1], stddev=1.0), tf.float32)
    b_4 = tf.Variable(tf.random_normal([1]), tf.float32)
    y_ = tf.sigmoid(tf.matmul(l_3, w_4) + b_4)

    loss = tf.reduce_mean(tf.squared_difference(y, y_))

    optimizer = tf.train.AdadeltaOptimizer().minimize(loss)

    with tf.Session() as sess:

        sess.run(tf.global_variables_initializer())

        print('Training:')

        for step in tqdm(range(5000)):
            index = np.random.randint(0, len(x_train) - 129)
            feed_dict = {x : x_train[index:index+128], y : y_train[index:index+128]}
            sess.run(optimizer, feed_dict=feed_dict)
            if step % 1000 == 0:
                print(sess.run([loss], feed_dict=feed_dict))

        while True:
            inp1 = int(input(''))
            inp2 = int(input(''))
            inp3 = int(input(''))
            print(sess.run(y_, feed_dict={x : [[inp1, inp2, inp3]]}))

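As you can see, I first import the modules I will be using. Next I generate the input dataset x and the desired output dataset y. x_train consists of 10000 random RGB values, while y_train consists of 0s and 1s, where 1 corresponds to an RGB value with a mean below 128 and 0 to an RGB value with a mean above 128 (this ensures that bright backgrounds get a dark font and vice versa).

Admittedly, my neural network is overly complex (or so I assume), but as far as I know it is a fairly standard feed-forward network with an Adadelta optimizer and the default learning rate.

As far as my limited knowledge goes, the network trains normally, yet the model always spits out 0.5.

The last block of code lets the user enter values and see what they become when passed through the neural network.

I have tinkered with different activation functions, losses, ways of initializing the biases, and so on, but to no avail. Sometimes when I modify the code the model always returns 1 or always returns 0, but that is just as inaccurate as being indecisive and returning 0.5 over and over. I could not find a suitable solution to my problem online. Any comments or suggestions are welcome.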

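Edit:

The loss, weights, biases, and output change very little during training (the weights and biases change by only hundredths and thousandths every 1000 iterations, and the loss hovers around 0.3). Also, the output sometimes varies with the input (as you would expect), but at other times it is constant. One run of the program produced a constant 0.7 as output, while another always returned 0.5 except very close to zero, where it returned values like 0.3 or 0.4. Neither of these is the desired output. What should happen is that (255, 255, 255) maps to 0, (0, 0, 0) maps to 1, and (128, 128, 128) maps to either 1 or 0, since the font color does not matter in the middle.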

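【Question discussion】:

Do your weights change? How does your loss behave? Are your gradients diverging? Please provide more information so that we can help you.
Does the loss converge? What happens when you print the classifications for the training data?
See my answer here, which provides a debug_minimize function for checking whether your weights are changing.

Tags: python tensorflow machine-learning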

【Solution 1】:

Looking at your network, two things stand out:

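1. Sigmoid activations in hidden layers are usually a poor choice. The sigmoid function saturates for large (positive or negative) inputs, which makes the gradients smaller and smaller as they are backpropagated through the network. This is commonly called the "vanishing gradient" problem. It may be that the gradients of the variables near the output are "healthy", so the upper layers are learning, but if the lower layers receive no gradient they will simply keep returning random values that the upper layers cannot use. You could try replacing the sigmoid activations with, e.g., tf.nn.relu. A sigmoid in the output layer is fine (and somewhat necessary if you want the output to be 0/1), but consider using cross-entropy instead of squared error as the loss function.
2. Your weight initialization probably produces weights that are too large. A standard deviation of 1.0 is too high. This can cause numerical problems and also saturates the activations even more (since the large weights mean you can expect large activation values from the start). Try a standard deviation of something like 0.1, and consider using truncated_normal to prevent outliers (or use a uniform random initialization).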

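It is hard to say whether this will fix your problem, but I believe these are two aspects of your network, as it stands, that you should definitely change.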
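
To make the saturation argument concrete, here is a minimal sketch in plain Python (no TensorFlow). It estimates how often a single first-layer pre-activation lands in the flat part of the sigmoid, given random RGB inputs and normally distributed weights. The helper names, the cutoff |z| > 5, and the comparison against inputs rescaled to [-1, 1] are assumptions chosen for illustration, not part of the original network.

```python
import math
import random

def sigmoid(z):
    # Logistic function written so that math.exp never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def sigmoid_grad(z):
    # Derivative of the sigmoid: s * (1 - s), at most 0.25 (reached at z = 0).
    s = sigmoid(z)
    return s * (1.0 - s)

def saturated_fraction(weight_std, scale_inputs, n=2000, seed=0):
    # Fraction of random (input, weight) draws whose pre-activation
    # z = w . x has |z| > 5, where the sigmoid gradient is below ~0.007.
    # The cutoff of 5 is an arbitrary illustrative threshold.
    rng = random.Random(seed)
    saturated = 0
    for _ in range(n):
        rgb = [rng.uniform(0, 255) for _ in range(3)]
        if scale_inputs:
            # Assumed preprocessing: map [0, 255] to [-1, 1].
            rgb = [2.0 * (c / 255.0) - 1.0 for c in rgb]
        z = sum(rng.gauss(0.0, weight_std) * c for c in rgb)
        if abs(z) > 5.0:
            saturated += 1
    return saturated / n

raw = saturated_fraction(weight_std=1.0, scale_inputs=False)
scaled = saturated_fraction(weight_std=0.1, scale_inputs=True)
print('gradient at z=0: ', sigmoid_grad(0.0))
print('gradient at z=10:', sigmoid_grad(10.0))
print('saturated, raw inputs and stddev 1.0:   ', raw)
print('saturated, scaled inputs and stddev 0.1:', scaled)
```

With raw 0 to 255 inputs and a standard deviation of 1.0, almost every draw falls in the saturated region, whereas small weights on rescaled inputs keep the pre-activations near zero, where the gradient is largest.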

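【Discussion】:

I have implemented the suggested changes, but the network still fails to learn. Thanks anyway, since I am no longer puzzled about why so many people seem to avoid sigmoid.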

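【Solution 2】:

The biggest problem is that you are using mean squared error as the loss function for a classification problem. A cross-entropy loss is a much better fit for this kind of problem.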

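Below is a visualization of the difference between the cross-entropy loss and the mean squared error loss:

[Figure: cross-entropy loss vs. mean squared error loss. Source: Wolfram Alpha]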

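Notice how the cross-entropy loss grows without bound as the model's prediction moves away from the correct one (1 in this case). This curvature provides a much stronger gradient signal during backpropagation, while also satisfying many important theoretical properties of distances (divergences) between probability distributions. By minimizing the cross-entropy loss, you are in fact also minimizing the KL divergence between the model's predicted distribution and the label distribution of the training data. You can read more about the cross-entropy loss function here: http://colah.github.io/posts/2015-09-Visual-Information/

I also adjusted a few other things to make the code better and the model easier to modify. This should solve all of your problems: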
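
The difference in gradient signal can be checked numerically. The sketch below assumes a single sigmoid output unit with logit z and target y, which matches the output layer in the question rather than the two-logit softmax used in the code further down. It compares the gradient of each loss with respect to the logit; the function names are made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mse_grad_wrt_logit(z, y):
    # L = (y - p)^2 with p = sigmoid(z)
    # dL/dz = -2 * (y - p) * p * (1 - p)
    p = sigmoid(z)
    return -2.0 * (y - p) * p * (1.0 - p)

def ce_grad_wrt_logit(z, y):
    # L = -[y * log(p) + (1 - y) * log(1 - p)] with p = sigmoid(z)
    # dL/dz = p - y
    return sigmoid(z) - y

target = 1.0
for z in (-10.0, -2.0, 0.0, 2.0):
    print('z={:6.1f}  p={:.5f}  mse_grad={:+.6f}  ce_grad={:+.6f}'.format(
        z, sigmoid(z),
        mse_grad_wrt_logit(z, target),
        ce_grad_wrt_logit(z, target)))
```

For a confidently wrong prediction (z = -10 with target 1), the squared-error gradient is nearly zero because of the extra p * (1 - p) factor, while the cross-entropy gradient stays close to -1.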

import tensorflow as tf
import numpy as np
from tqdm import tqdm

# define a random seed for (somewhat) reproducible results:
seed = 0
np.random.seed(seed)
print('Creating Datasets:')

# much faster dataset creation
x_train = np.random.uniform(low=0, high=255, size=[10000, 3])
# easier label creation
# if the average color is greater than half the color space then use black, otherwise use white
# classes:
# white = 0
# black = 1
y_train = ((np.mean(x_train, axis=1) / 255.0) > 0.5).astype(int)

# now transform dataset to be within range [-1, 1] instead of [0, 255] 
# for numeric stability and quicker model training
x_train = (2 * (x_train / 255)) - 1

graph = tf.Graph()

with graph.as_default():
    # must do this within graph scope
    tf.set_random_seed(seed)
    # specify input dims for clarity
    x = tf.placeholder(tf.float32, shape=[None, 3])
    # y is now integer label [0 or 1]
    y = tf.placeholder(tf.int32, shape=[None])
    # use relu, usually better than sigmoid 
    activation_fn = tf.nn.relu
    # from https://arxiv.org/abs/1502.01852v1
    initializer = tf.initializers.variance_scaling(
        scale=2.0, 
        mode='fan_in',
        distribution='truncated_normal')
    # better api to reduce clutter
    l_1 = tf.layers.dense(
        x,
        10,
        activation=activation_fn,
        kernel_initializer=initializer)
    l_2 = tf.layers.dense(
        l_1,
        10,
        activation=activation_fn,
        kernel_initializer=initializer)
    l_3 = tf.layers.dense(
        l_2,
        5,
        activation=activation_fn,
        kernel_initializer=initializer)
    y_logits = tf.layers.dense(
        l_3,
        2,
        activation=None,
        kernel_initializer=initializer)

    y_ = tf.nn.softmax(y_logits)
    # much better loss function for classification
    loss = tf.reduce_mean(
        tf.losses.sparse_softmax_cross_entropy(
            labels=y, 
            logits=y_logits))
    # much better default optimizer for new problems
    # good learning rate, but probably can tune
    optimizer = tf.train.AdamOptimizer(
        learning_rate=0.01)
    # separate train op for easier calling
    train_op = optimizer.minimize(loss)

    # tell tensorflow not to allocate all gpu memory at start
    config = tf.ConfigProto()
    config.gpu_options.allow_growth=True
    with tf.Session(config=config) as sess:

        sess.run(tf.global_variables_initializer())

        print('Training:')

        for step in tqdm(range(5000)):
            index = np.random.randint(0, len(x_train) - 129)
            feed_dict = {x : x_train[index:index+128], 
                         y : y_train[index:index+128]}
            # can train and get loss in single run, much more efficient
            _, b_loss = sess.run([train_op, loss], feed_dict=feed_dict)
            if step % 1000 == 0:
                print(b_loss)

        while True:
            inp1 = int(input('Enter R pixel color: '))
            inp2 = int(input('Enter G pixel color: '))
            inp3 = int(input('Enter B pixel color: '))
            # scale to model train range [-1, 1]
            model_input = (2 * (np.array([inp1, inp2, inp3], dtype=float) / 255.0)) - 1
            if (model_input >= -1).all() and (model_input <= 1).all():
                # y_ is now two probabilities (white_prob, black_prob) but they will sum to 1.
                white_prob, black_prob = sess.run(y_, feed_dict={x : [model_input]})[0]
                print('White prob: {:.2f} Black prob: {:.2f}'.format(white_prob, black_prob))
            else:
                print('Values not within [0, 255]!')

I documented my changes with comments, but let me know if you have any questions! I ended up running it, and it works well:

Creating Datasets:
2018-10-05 00:50:59.156822: I T:\src\github\tensorflow\tensorflow\core\platform\cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2018-10-05 00:50:59.411003: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1405] Found device 0 with properties:
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.7335
pciBusID: 0000:03:00.0
totalMemory: 8.00GiB freeMemory: 6.60GiB
2018-10-05 00:50:59.417736: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1484] Adding visible gpu devices: 0
2018-10-05 00:51:00.109351: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-10-05 00:51:00.113660: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:971]      0
2018-10-05 00:51:00.118545: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:984] 0:   N
2018-10-05 00:51:00.121605: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6370 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080, pci bus id: 0000:03:00.0, compute capability: 6.1)
Training:
  0%|                                                                                         | 0/5000 [00:00<?, ?it/s]0.6222609
 19%|██████████████▋                                                               | 940/5000 [00:01<00:14, 275.57it/s]0.013466636
 39%|██████████████████████████████                                               | 1951/5000 [00:02<00:04, 708.07it/s]0.0067519126
 59%|█████████████████████████████████████████████▊                               | 2971/5000 [00:04<00:02, 733.24it/s]0.0028143923
 79%|████████████████████████████████████████████████████████████▌                | 3935/5000 [00:05<00:01, 726.36it/s]0.0073514087
100%|█████████████████████████████████████████████████████████████████████████████| 5000/5000 [00:07<00:00, 698.32it/s]
Enter R pixel color: 1
Enter G pixel color: 1
Enter B pixel color: 1
White prob: 1.00 Black prob: 0.00
Enter R pixel color: 255
Enter G pixel color: 255
Enter B pixel color: 255
White prob: 0.00 Black prob: 1.00
Enter R pixel color: 128
Enter G pixel color: 128
Enter B pixel color: 128
White prob: 0.08 Black prob: 0.92
Enter R pixel color: 126
Enter G pixel color: 126
Enter B pixel color: 126
White prob: 0.99 Black prob: 0.01
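
Two steps in the code above are easy to check in isolation: the input scaling from [0, 255] to [-1, 1], and the softmax that turns two logits into probabilities summing to 1. Here is a minimal plain-Python sketch of both; the helper names and the example logits are hypothetical and are not taken from the trained model.

```python
import math

def scale_rgb(rgb):
    # Map each channel from [0, 255] to [-1, 1], as done for the training data.
    return [2.0 * (c / 255.0) - 1.0 for c in rgb]

def softmax(logits):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

scaled = scale_rgb([0, 255, 128])
probs = softmax([2.0, -1.0])  # hypothetical logits for (white, black)
print('scaled input:', scaled)
print('White prob: {:.2f} Black prob: {:.2f}'.format(probs[0], probs[1]))
```

Any input fed to the trained network has to go through the same scaling as the training data, which is why the interactive loop above rescales the entered values before calling sess.run.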
