【Question Title】: Loss not reducing in TensorFlow classification attempt
【Posted】: 2020-10-02 14:26:51
【Question Description】:

I want to model classifying whether a student passes a course based on a single input feature in the training data: the student's exam score.

I first created a dataset of exam scores for 1,000 students, normally distributed with a mean of 80. Then I assigned a classification of "1" (pass) to the top 300 students, which, with a seed of 0, corresponds to a test-score threshold of 80.87808591534409.

(Obviously we don't really need machine learning for this, since it just means anyone with a test score above 80.87808591534409 passes the course. But I want to build a model that accurately predicts this so I can start adding new input features and expand my classifications beyond simple pass/fail.)

Next, I created a test set in the same way and classified those students using the threshold (80.87808591534409) previously computed for the training set.

Then, as you can see below or in the linked Jupyter notebook, I created a model that takes one input feature and returns two outputs: the probability of the zero-index classification (fail) and the probability of the one-index classification (pass).

I then trained it on the training dataset. But as you can see, the loss never really improves from one iteration to the next; it just hovers around 0.6.

Finally, I ran the trained model on the test dataset and generated predictions.

I plotted the results as follows:

The green line represents the actual (not predicted) classifications of the test set. The blue line represents the probability of the 0-index outcome (fail), and the orange line represents the probability of the 1-index outcome (pass).

As you can see, they stay flat. If my model were working correctly, I would expect these lines to swap positions at the threshold where the actual data switches from fail to pass.

I suspect I'm doing quite a few things wrong, but if anyone has time to look over the code below and offer some advice, I would be very grateful.

I have created a publicly available working example of my attempt here. The current code is included below.

The problem I'm running into is that training seems to get stuck when computing the loss, so the model reports that every student in my test set fails (all 1,000 of them) regardless of their test score, which is obviously wrong.

import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_datasets as tfds
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

print("Version: ", tf.__version__)
print("Eager mode: ", tf.executing_eagerly())
print("Hub version: ", hub.__version__)
print("GPU is", "available" if tf.config.experimental.list_physical_devices("GPU") else "NOT AVAILABLE")

## Create data
# Set Seed
np.random.seed(0)
# Create 1000 test scores normally distributed with a standard deviation of 2 and a mean of 80
train_exam_scores = np.sort(np.random.normal(80,2,1000))
# Create classification; top 300 pass the class (classification of 1), bottom 700 do not pass (classification of 0)
train_labels = np.array([0. for i in range(700)])
train_labels = np.append(train_labels, [1. for i in range(300)])

print("Point at which test scores correlate with passing class: {}".format(train_exam_scores[701]))
print("computed point with seed of 0 should be: 80.87808591534409")
print("Plot point at which test scores correlate with passing class")
## Plot view
plt.plot(train_exam_scores)
plt.plot(train_labels)
plt.show()

#create another set of 1000 test scores with different seed (10)
np.random.seed(10)
test_exam_scores = np.sort(np.random.normal(80,2,1000))
# create classification labels for the new test set based on passing rate of 80.87808591534409 determined above
test_labels = np.array([])
for index, i in enumerate(test_exam_scores):
    if (i >= 80.87808591534409):
        test_labels = np.append(test_labels, 1)
    else:
        test_labels = np.append(test_labels, 0)
plt.plot(test_exam_scores)
plt.plot(test_labels)
plt.show()

print(tf.shape(train_exam_scores))
print(tf.shape(train_labels))
print(tf.shape(test_exam_scores))
print(tf.shape(test_labels))
train_dataset = tf.data.Dataset.from_tensor_slices((train_exam_scores, train_labels))
test_dataset = tf.data.Dataset.from_tensor_slices((test_exam_scores, test_labels))
BATCH_SIZE = 5
SHUFFLE_BUFFER_SIZE = 1000

train_dataset = train_dataset.shuffle(SHUFFLE_BUFFER_SIZE).batch(BATCH_SIZE)
test_dataset = test_dataset.batch(BATCH_SIZE)

# view example of feature to label correlation, values above 80.87808591534409 are classified as 1, those below are classified as 0
features, labels = next(iter(train_dataset))
print(features)
print(labels)

# create model with first layer to take 1 input feature per student; and output layer of two values (percentage of 0 or 1 classification)
model = tf.keras.Sequential([
  tf.keras.layers.Dense(10, activation=tf.nn.relu, input_shape=(1,)),  # input shape required
  tf.keras.layers.Dense(10, activation=tf.nn.relu),
  tf.keras.layers.Dense(2)
])

# Test untrained model on training features; should produce nonsense results
predictions = model(features)
print(tf.nn.softmax(predictions[:5]))
print("Prediction: {}".format(tf.argmax(predictions, axis=1)))

loss_object = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)

model.compile(optimizer=optimizer,
              loss=loss_object,
              metrics=['categorical_accuracy'])


#train model 

model.fit(train_dataset,
                epochs=20,
                validation_data=test_dataset,
                verbose=1)

#make predictions on test scores from test_dataset
predictions = model.predict(test_dataset)

tf.nn.softmax(predictions[:1000])

tf.argmax(predictions, axis=1)

# I anticipate that the predictions would show a higher probability for index position [0] (classification 0, "did not pass") 
#until it reaches a value greater than 80.87808591534409 
# which in the test data with a seed of 10 should be the value at the 683 index position
# but at this point I would expect there to be a higher probability for index position [1] (classification 1), "did pass" 
# because it is obvious from the data that anyone who scores higher than 80.87808591534409 should pass.
# Thus in the chart below I would expect the lines charting the probability to switch precisely at the point where the test classifications shift.
# However this is not the case. All predictions are the same for all 1000 values.
plt.plot(tf.nn.softmax(predictions[:1000]))
plt.plot(test_labels)
plt.show()

【Question Comments】:

Tags: python tensorflow machine-learning keras


【Solution 1】:

The main problem here: use a softmax activation in the last layer, not separately outside the model. Change the last layer to:

    tf.keras.layers.Dense(2, activation="softmax")
    

Second, with two relu hidden layers, 0.1 is probably too high a learning rate. Try a lower rate such as 0.01 or 0.001.

Another thing to try is dividing the inputs by 100 to bring them into the range [0, 1]. This makes training easier, because the update steps no longer modify the weights so drastically.

【Comments】:

• Thank you very much. I tried these changes. I also thought it might be a problem that my training labels were a 1-D array like [0, 1, 0, 1, 0, 0] etc. when I was asking for 2 outputs, so I changed the training label array to 2-D: [[0, 1], [0, 1], [1, 0], [1, 0]]. [0, 1] means "fail" is false and "pass" is true; [1, 0] means "fail" is true and "pass" is false. Unfortunately it didn't seem to help. I guess I still have a lot to learn. If anyone is interested, my changes can be seen here: colab.research.google.com/drive/…
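For what it's worth, the one-hot conversion described in the comment shouldn't be necessary: `SparseCategoricalCrossentropy` is designed to consume integer class indices directly against a 2-unit output, while one-hot rows belong with `CategoricalCrossentropy`. Mixing the two formats is a common cause of training plateaus. A small sketch (the data here is made up purely for illustration) showing the two losses agree when each gets its matching label format:

```python
import numpy as np
import tensorflow as tf

# Sparse integer labels: one class index per example
sparse_labels = np.array([0, 1, 0, 1])
logits = tf.constant([[2.0, 0.5], [0.2, 1.8], [1.5, 0.1], [0.3, 2.2]])

# SparseCategoricalCrossentropy consumes the class indices directly...
sparse_loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)(
    sparse_labels, logits
)

# ...while CategoricalCrossentropy expects one-hot rows like [1, 0] / [0, 1]
one_hot_labels = tf.one_hot(sparse_labels, depth=2)
dense_loss = tf.keras.losses.CategoricalCrossentropy(from_logits=True)(
    one_hot_labels, logits
)

# Both formulations yield the same loss value
print(float(sparse_loss), float(dense_loss))
```

So with integer 0/1 labels, keeping the sparse loss (and the matching `sparse_categorical_accuracy` metric) is the simpler path.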