Binary classification using Keras always gives wrong predictions: acc is always 0.5

【Question title】: Binary classification using Keras always gives wrong predictions: acc is always 0.5
【Posted】: 2020-05-07 01:40:44
【Question】:

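Hi! I am doing a simple binary classification with Keras, using TF as the backend.

I have already checked:

Data shuffling: I pass shuffle=True to model.fit().
Network structure: the NN takes a vector of 1024 elements and predicts 0 or 1.

ENV: TensorFlow 1.13.2, Ubuntu 16.04, Python 3

But the output is still wrong. acc is always 0.5.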

import tensorflow as tf
from tensorflow.keras.layers import Input, Flatten, Dense, Lambda, Conv2D, Reshape, MaxPool2D, Average, Dropout, Concatenate, \
    Add, Maximum, Layer, Activation, Conv1D, TimeDistributed, GlobalAvgPool2D
import numpy as np


class Test(tf.keras.Model):
    def __init__(self,attention_sz,dropout_rt, name=None):
        super(Test, self).__init__(name=name)
        # here we define the layer:
        self.fc = Dense(attention_sz,input_dim = attention_sz ,activation='relu')
        self.fc2 = Dense(attention_sz, activation='relu')
        self.fc3 = Dense(1, activation='sigmoid')

        self.dp = Dropout(dropout_rt,input_shape=(attention_sz,))
        self.dp2 = Dropout(dropout_rt,input_shape=(attention_sz,))


    def call(self, inp):
        # here we get the segmentation and pose
        with tf.device('/gpu:0'):
            print("~~~~~~~~~~~")
            x = self.fc(inp)
            print(x.shape)
            z = self.dp(x)
            print(z.shape)
            x = self.fc2(z)
            print(x.shape)
            z = self.dp2(x)
            print(z.shape)
            y = self.fc3(z)
            print(y.shape)
        return y 

if __name__ == '__main__':
    model  = Test(1024, 0.05)
    model.compile(optimizer='rmsprop',
                  loss='binary_crossentropy',
                  metrics=['accuracy'])
    x = np.round(np.random.normal(1.75, 0.2, size=(10000, 1024)), 2)
    x2 = np.round(np.random.normal(100.75, 0.2, size=(10000, 1024)), 2)
    labels = np.zeros((10000, 1))
    labels2 = np.ones((10000, 1))

    x_t = np.row_stack((x, x2))
    labels = np.row_stack((labels,labels2))
    print(x_t.shape)
    print(labels.shape)
    model.fit(x_t, labels, shuffle=True, epochs=10, batch_size=32)
    x = np.round(np.random.normal(1.75, 0.2, size=(1, 1024)), 2)
    y = np.round(np.random.normal(100.75, 0.2, size=(1, 1024)), 2)
    res = model.predict(x)
    print(res)
    print(res.shape)
    res = model.predict(y)
    print(res)
    print(res.shape)

Output:

WARNING:tensorflow:From /home/frank/Desktop/mesh-py3/my_venv/lib/python3.5/site-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
2020-05-06 19:00:58.440615: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-05-06 19:00:58.616327: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-06 19:00:58.617158: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x55201b0 executing computations on platform CUDA. Devices:
2020-05-06 19:00:58.617175: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): GeForce RTX 2080, Compute Capability 7.5
2020-05-06 19:00:58.636996: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2592000000 Hz
2020-05-06 19:00:58.637508: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x558add0 executing computations on platform Host. Devices:
2020-05-06 19:00:58.637523: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2020-05-06 19:00:58.637876: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: 
name: GeForce RTX 2080 major: 7 minor: 5 memoryClockRate(GHz): 1.095
pciBusID: 0000:01:00.0
totalMemory: 7.77GiB freeMemory: 7.06GiB
2020-05-06 19:00:58.637892: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2020-05-06 19:00:58.639694: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-05-06 19:00:58.639708: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 
2020-05-06 19:00:58.639713: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N 
2020-05-06 19:00:58.639923: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6868 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080, pci bus id: 0000:01:00.0, compute capability: 7.5)
Epoch 1/10
2020-05-06 19:00:59.495123: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
20000/20000 [==============================] - 3s 148us/sample - loss: 8.0497 - acc: 0.4997
Epoch 2/10
20000/20000 [==============================] - 2s 98us/sample - loss: 8.0590 - acc: 0.5000
Epoch 3/10
20000/20000 [==============================] - 2s 99us/sample - loss: 8.0590 - acc: 0.5000
Epoch 4/10
20000/20000 [==============================] - 2s 80us/sample - loss: 8.0590 - acc: 0.5000
Epoch 5/10
20000/20000 [==============================] - 2s 81us/sample - loss: 8.0590 - acc: 0.5000
Epoch 6/10
20000/20000 [==============================] - 2s 80us/sample - loss: 8.0590 - acc: 0.5000
Epoch 7/10
20000/20000 [==============================] - 2s 89us/sample - loss: 8.0590 - acc: 0.5000
Epoch 8/10
20000/20000 [==============================] - 2s 83us/sample - loss: 8.0590 - acc: 0.5000
Epoch 9/10
20000/20000 [==============================] - 2s 78us/sample - loss: 8.0590 - acc: 0.5000
Epoch 10/10
20000/20000 [==============================] - 2s 79us/sample - loss: 8.0590 - acc: 0.5000
[[0.]]
(1, 1)
[[0.]]
(1, 1)

Process finished with exit code 0

Thanks in advance!

【Question comments】:

The same code gives me 0.9857 after 1 epoch.
Wow! Why? That is strange.
snipboard.io/qOABle.jpg
I am using tensorflow-cpu==1.15.0 and Ubuntu 18.04.
TensorFlow 1.13.2, Ubuntu 16.04, Keras 2.2.4

Tags: python tensorflow keras deep-learning

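【Solution 1】:

The root cause is related to numerical instability of the sigmoid activation in the last layer of your model when you use the tensorflow-cpu version. I changed two lines in your code (the last layer now outputs raw logits, and the loss is told to expect logits) and got results similar to TF1.15. Please see the gist here.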

self.fc3 = Dense(1) #, activation='sigmoid'

loss = tf.keras.losses.BinaryCrossentropy(from_logits=True)
model.compile(optimizer='rmsprop',
                  loss=loss, #'binary_crossentropy'
                  metrics=['accuracy'])

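When I ran your code with the TensorFlow-gpu build of TF1.13.2, I saw results similar to those seen with TF1.15. Note that the CPU and GPU builds use different libraries to optimize computation time. Here is the gist for the TF1.13.2-gpu version. Hope that is clear.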
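As a rough illustration of why the logits-based loss helps, the sketch below reproduces the stalled loss value from the question using only the standard library. It assumes Keras clips predicted probabilities with its default epsilon of 1e-7 before taking the log; the function names are hypothetical and are not part of Keras. If the sigmoid output saturates at 0 for every sample (the final predictions in the question are both `[[0.]]`), half of the samples are confidently wrong, and the mean loss works out to about 8.059, which is consistent with the logged `loss: 8.0590`. Computing the loss from the raw logit instead avoids the clipping.

```python
import math

EPSILON = 1e-7  # assumed Keras default clipping value for probabilities


def bce_from_probability(y_true, p, eps=EPSILON):
    """Binary cross-entropy on a probability, clipped the way Keras is assumed to clip."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y_true * math.log(p) + (1.0 - y_true) * math.log(1.0 - p))


def bce_from_logit(y_true, z):
    """Numerically stable binary cross-entropy computed directly from the logit z."""
    return max(z, 0.0) - z * y_true + math.log1p(math.exp(-abs(z)))


# Sigmoid saturated at 0 for every sample: label 1 is wrong, label 0 is right.
loss_when_wrong = bce_from_probability(1.0, 0.0)  # -log(1e-7)
loss_when_right = bce_from_probability(0.0, 0.0)  # -log(1 - 1e-7)
mean_loss = 0.5 * (loss_when_wrong + loss_when_right)
print(round(mean_loss, 4))

# With logits, a badly wrong prediction still yields a finite, unclipped loss.
print(bce_from_logit(1.0, -50.0))
```

Once the probability has been clipped, the loss no longer depends on how far the pre-activation has drifted, so the updates carry no useful signal. The logit form keeps growing linearly with the error instead.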

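【Comments】:

【Solution 2】:

This means your network is not learning. In that case, you can try the following:

Change the learning rate. Make it smaller step by step until you reach 1e-6. If it still does not learn, the problem is somewhere else.
Try a different optimizer. I have often had to change the optimizer to get convergence for the same network on different data.
Check the API documentation for the expected type of your labels. It happens quite often that changing the data type affects learning.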
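The learning-rate and label-type suggestions above could be set up along these lines. This is only a sketch: the helper names `candidate_learning_rates` and `prepare_labels` are made up for illustration, the one-step-per-decade schedule is an assumption, and float32 labels of shape (n_samples, 1) are assumed to be what a single sigmoid output expects.

```python
import numpy as np


def candidate_learning_rates(start=1e-3, stop=1e-6):
    """Return learning rates from start down to stop, one per decade."""
    n_decades = int(round(np.log10(start) - np.log10(stop)))
    return [float(start * 10.0 ** -k) for k in range(n_decades + 1)]


def prepare_labels(labels):
    """Cast binary labels to float32 with shape (n_samples, 1)."""
    labels = np.asarray(labels, dtype=np.float32).reshape(-1, 1)
    if not np.isin(labels, [0.0, 1.0]).all():
        raise ValueError("labels must contain only 0 and 1")
    return labels


print(candidate_learning_rates())
print(prepare_labels([0, 1, 1, 0]).dtype)
```

Each rate would then be tried on a freshly built model, for example by passing it as the first argument of `tf.keras.optimizers.RMSprop` when compiling, so that weights from an earlier, diverged run are not reused.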

【Comments】: