ValueError: logits and labels must have the same shape ((None, 10) vs (None, 1))

【Question Title】: ValueError: logits and labels must have the same shape ((None, 10) vs (None, 1))
【Posted】: 2021-08-18 17:44:14
【Question Description】:

I'm new to TensorFlow, and I'm trying to build a simple model that outputs the probability of an install (the `install` column).

Here is a subset of the dataset:

{'A': {0: 12, 2: 28, 3: 26, 4: 9, 5: 36},
 'B': {0: 10, 2: 17, 3: 22, 4: 2, 5: 31},
 'C': {0: 1, 2: 0, 3: 5, 4: 0, 5: 1},
 'D': {0: 5, 2: 0, 3: 0, 4: 0, 5: 0},
 'E': {0: 12, 2: 1, 3: 4, 4: 3, 5: 1},
 'F': {0: 12, 2: 2, 3: 14, 4: 9, 5: 11},
 'install': {0: 0, 2: 0, 3: 1, 4: 0, 5: 0},
 'G': {0: 21, 2: 12, 3: 8, 4: 13, 5: 19},
 'H': {0: 0, 2: 5, 3: 1, 4: 6, 5: 5},
 'I': {0: 21, 2: 22, 3: 5, 4: 10, 5: 20},
 'J': {0: 0.0, 2: 136.5, 3: 0.0, 4: 0.1, 5: 29.5},
 'K': {0: 0.15220949263502456,
  2: 0.08139534883720931,
  3: 0.15625,
  4: 0.15384584755440725,
  5: 0.04188829787234043},
 'L': {0: 649, 2: 379, 3: 531, 4: 660, 5: 242},
 'M': {0: 0, 2: 0, 3: 0, 4: 1, 5: 1},
 'N': {0: 1, 2: 1, 3: 1, 4: 0, 5: 0},
 'O': {0: 0, 2: 1, 3: 0, 4: 1, 5: 0},
 'P': {0: 0, 2: 0, 3: 0, 4: 0, 5: 0},
 'Q': {0: 1, 2: 0, 3: 1, 4: 0, 5: 1}}

Here is the code I'm working with:

from sklearn.model_selection import train_test_split
from tensorflow import keras

# df: a pandas DataFrame holding the data shown above
X = df.drop('install', axis=1)  # features
y = df['install']  # target (0 or 1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42, test_size=0.3)

# ss: a scaler whose class isn't shown here
X_train = ss.fit_transform(X_train)
X_test = ss.transform(X_test)  # reuse the statistics fitted on X_train

model = keras.models.Sequential([
  keras.layers.Flatten(),
  keras.layers.Dense(128, activation='softmax'),
  keras.layers.Dropout(0.2),
  keras.layers.Dense(10)  # 10 outputs per sample, but y has a single 0/1 label
])

loss = keras.losses.BinaryCrossentropy(from_logits=True)
optim = keras.optimizers.Adam(learning_rate=0.001)
metrics = ["accuracy"]

model.compile(loss=loss, optimizer=optim, metrics=metrics)

batch_size = 32
epoch = 5
model.fit(X_train, y_train, batch_size=batch_size, epochs=epoch, shuffle=True, verbose=1)

Can you help me understand the error? I know the problem has to do with the sizes of my X and y.
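To see why the shapes have to match, here is a minimal NumPy sketch (not the Keras implementation) of binary cross-entropy computed from logits. The loss is computed element by element, so `logits` and `labels` must have identical shapes. Plain NumPy would silently broadcast (3, 10) against (3, 1), so the sketch checks the shapes explicitly, much like Keras does. The helper name `bce_from_logits` and the toy arrays are hypothetical.

```python
import numpy as np

def bce_from_logits(labels, logits):
    """Mean binary cross-entropy from raw logits (hypothetical helper)."""
    labels = np.asarray(labels, dtype=np.float64)
    logits = np.asarray(logits, dtype=np.float64)
    # Without this check NumPy would broadcast (3, 1) against (3, 10)
    # and quietly return a meaningless number.
    if labels.shape != logits.shape:
        raise ValueError(
            f"logits and labels must have the same shape ({logits.shape} vs {labels.shape})"
        )
    # Numerically stable form of -[y*log(sigmoid(z)) + (1-y)*log(1-sigmoid(z))]
    per_element = np.maximum(logits, 0) - logits * labels + np.log1p(np.exp(-np.abs(logits)))
    return per_element.mean()

y = np.array([0, 1, 0]).reshape(-1, 1)  # one 0/1 label per sample: shape (3, 1)
logits_10 = np.zeros((3, 10))           # what Dense(10) produces: shape (3, 10)
logits_1 = np.zeros((3, 1))             # what Dense(1) produces: shape (3, 1)

try:
    bce_from_logits(y, logits_10)
except ValueError as err:
    print(err)  # logits and labels must have the same shape ((3, 10) vs (3, 1))

print(bce_from_logits(y, logits_1))  # log(2) ≈ 0.6931 when every logit is 0
```

So the fix is on the model side: the last layer must produce one value per label column.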

【Question Discussion】:

Tags: python tensorflow keras deep-learning

【Solution 1】:

Note: you haven't specified which class the `ss` object belongs to, so I'll leave it out of the discussion.
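Assuming `ss` is a standardizer such as scikit-learn's `StandardScaler` (the question doesn't say), the usual rule is to fit it on the training data only and reuse those statistics on the test data; calling `fit_transform` on `X_test` would refit it on test statistics instead. A NumPy sketch of what fit-then-transform does; the numbers are taken from columns A and L of the posted subset, split by hand for illustration.

```python
import numpy as np

# Columns A and L of the posted rows, split by hand (not the actual random split)
X_train = np.array([[12.0, 649.0], [28.0, 379.0], [26.0, 531.0]])
X_test = np.array([[9.0, 660.0], [36.0, 242.0]])

# "fit": learn per-feature mean and standard deviation from the training set only
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)  # population std (ddof=0), as StandardScaler uses

# "transform": apply the *training* statistics to both splits
X_train_scaled = (X_train - mean) / std
X_test_scaled = (X_test - mean) / std

print(X_train_scaled.mean(axis=0))  # ~[0, 0]
print(X_test_scaled)
```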

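First, let's look at your target, the `install` column. Based on its values, I assume your problem is binary classification, i.e. predicting 0 or 1, and that you want the probability of each.

To do that, you can define your model as follows.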

model = keras.models.Sequential([
  keras.layers.Flatten(),
  keras.layers.Dense(128, activation='relu'),
  keras.layers.Dropout(0.2),
  keras.layers.Dense(2, activation='softmax')
])

'''
Note: I have changed the activation of the first `Dense` layer from
`softmax` to `relu`, because `softmax` is not suited to hidden layers: it
forces all 128 activations to be positive and sum to 1, so every unit's
output depends on all the others and little independent information is
left. Using `softmax` there will not raise an error, but it is
methodologically wrong.

The next major change is the number of units in the last `Dense` layer,
from 10 to 2. What you want is the probability of the label being either 0
or 1. If the model outputs `[a, b]`, where a is a score for class 0 and b a
score for class 1, the 'softmax' activation turns them into probabilities.
Without an activation, these raw scores are called 'logits'.
'''
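A small NumPy sketch of what the final `softmax` does to two logits, and of why it is a poor fit for a 128-unit hidden layer: it rescales the whole layer so the activations sum to 1. The logit values and the random hidden activations are made up for illustration.

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability; result sums to 1 along the last axis
    z = np.asarray(z, dtype=np.float64)
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Output layer: two logits [a, b] for classes 0 and 1 -> two probabilities
out = softmax([2.0, -1.0])
print(out, out.sum())  # ≈ [0.953, 0.047], sums to 1

# Hidden layer: 128 activations forced to sum to 1, so each is tiny on average
rng = np.random.default_rng(0)
hidden = softmax(rng.normal(size=128))
print(hidden.sum(), hidden.mean())  # 1.0 and 1/128 ≈ 0.0078
```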

# Now change the loss function as below (no from_logits, since the model now outputs probabilities)
loss = tf.keras.losses.SparseCategoricalCrossentropy()

# The rest stays the same. After training the model with your code, we run a quick check on the test data.

preds = model.predict(X_test)
preds
'''
This gives the results:
array([[9.9999726e-01, 2.7777487e-06],
       [9.5156413e-01, 4.8435837e-02]], dtype=float32)

This says the probability of sample 1 being 0 is '9.9999726e-01', i.e.
'0.999...', and of it being 1 is '2.7777487e-06', i.e. '0.00000277...', and
the two sum to 1. The same holds for sample 2.
'''
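Using the two rows printed above, a NumPy sketch of how to turn the probabilities into class predictions (`argmax`), and of what `SparseCategoricalCrossentropy` computes: the mean of `-log(p[true_label])`, with integer labels rather than one-hot vectors. The labels `[0, 0]` are hypothetical, since the question doesn't show `y_test`.

```python
import numpy as np

preds = np.array([[9.9999726e-01, 2.7777487e-06],
                  [9.5156413e-01, 4.8435837e-02]])
y_true = np.array([0, 0])  # hypothetical integer labels

pred_classes = preds.argmax(axis=1)  # index of the most probable class
print(pred_classes)  # [0 0]

# Sparse categorical cross-entropy: pick each row's probability for its true class
picked = preds[np.arange(len(y_true)), y_true]
loss = -np.log(picked).mean()
print(loss)  # ≈ 0.0248
```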

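There is another way to do this. Since there is only one label column with two possible values, once you have the probability of one class you can get the probability of the other by subtracting it from 1. You can implement it as follows: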

model = keras.models.Sequential([
  keras.layers.Flatten(),
  keras.layers.Dense(128, activation='relu'),
  keras.layers.Dropout(0.2),
  keras.layers.Dense(1, activation='sigmoid')
])

'''
The difference between 'softmax' and 'sigmoid' is that 'softmax' is applied
across all the units jointly (their outputs sum to 1), whereas 'sigmoid' is
applied to each unit independently. So you can say that 'softmax' is applied
to the 'layer' and 'sigmoid' to the 'units'.

The output of the 'sigmoid' is the probability of the result being 1. The
predicted class is then 0 or 1 depending on whether that probability exceeds
some threshold, and since the labels are binary (either 0 or 1) we now use a
different loss, BinaryCrossentropy.
'''
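The two designs are closely related: a 2-unit softmax over logits `[a, b]` gives the same probability for class 1 as a sigmoid applied to the single logit `b - a`. A quick NumPy check with arbitrary values, which also shows that element-wise sigmoids do not sum to 1:

```python
import numpy as np

def sigmoid(z):
    # Applied element-wise: each unit is squashed to (0, 1) independently
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Applied across the whole layer: outputs are coupled and sum to 1
    e = np.exp(z - np.max(z))
    return e / e.sum()

a, b = 0.3, 1.7  # arbitrary logits for class 0 and class 1
p_softmax = softmax(np.array([a, b]))[1]
p_sigmoid = sigmoid(b - a)
print(p_softmax, p_sigmoid)  # identical up to floating-point rounding

# Sigmoid on several units does not make them sum to 1
print(sigmoid(np.array([2.0, 2.0, 2.0])).sum())  # ≈ 2.64, not 1
```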

loss = keras.losses.BinaryCrossentropy()  # again from_logits=False (the default): the model outputs probabilities

# Train the model again using the rest of the code, then analyze the outputs.

preds = model.predict(X_test)
preds
'''
This gives the results:
array([[1.6424768e-13],
       [2.0349980e-06]], dtype=float32)

So for sample 1 the probability of it being '1' is '1.6424768e-13', and
since the only classes are '1' and '0', the probability of it being '0' is
'1 - 1.6424768e-13'. The same holds for sample 2.
'''
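Continuing with the two sigmoid outputs printed above, a NumPy sketch of recovering P(0) as `1 - p` and turning the probabilities into 0/1 predictions. The 0.5 threshold is a common default, not something the answer specifies.

```python
import numpy as np

p1 = np.array([[1.6424768e-13],
               [2.0349980e-06]])  # P(install = 1) from the sigmoid model
p0 = 1.0 - p1                      # P(install = 0)
two_column = np.hstack([p0, p1])   # same layout as the softmax model's output

threshold = 0.5  # assumed decision threshold
labels = (p1 >= threshold).astype(int).ravel()
print(two_column)
print(labels)  # [0 0]
```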

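Now about the answer from @Mattpats. That answer also works, but with it you won't get probabilities as output. Instead you'll get logits, because the last layer has no activation and the loss is computed with `from_logits=True`. To get probabilities, you have to apply the sigmoid yourself, like this: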

preds = model.predict(X_test)
sigmoid_preds = tf.math.sigmoid(preds).numpy()
preds, sigmoid_preds
'''
This gives the following results:
preds = array([[-51.056973],
              [-32.444508]], dtype=float32)

sigmoid_preds = array([[6.702527e-23],
                      [8.119502e-15]], dtype=float32)
'''
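Why `from_logits=True` matters: it tells the loss to apply the sigmoid itself, using a numerically stable formula (the same one documented for `tf.nn.sigmoid_cross_entropy_with_logits`). The NumPy sketch below is a re-derivation, not TensorFlow's source. It reproduces the sigmoid values printed above and checks that the logits formula agrees with the naive probability-based cross-entropy. The labels and the second set of logits are hypothetical.

```python
import numpy as np

def stable_sigmoid(z):
    # Branch on the sign so np.exp never overflows
    z = np.asarray(z, dtype=np.float64)
    out = np.empty_like(z)
    pos = z >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))
    ez = np.exp(z[~pos])
    out[~pos] = ez / (1.0 + ez)
    return out

def bce_with_logits(y, z):
    # max(z, 0) - z*y + log(1 + exp(-|z|)): the stable from-logits form
    return np.maximum(z, 0) - z * y + np.log1p(np.exp(-np.abs(z)))

def bce_with_probs(y, p):
    # Naive form; breaks (log(0)) once p rounds to exactly 0 or 1
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

logits = np.array([-51.056973, -32.444508])
print(stable_sigmoid(logits))  # ≈ [6.7025e-23, 8.1195e-15], matching sigmoid_preds

y = np.array([0.0, 1.0])   # hypothetical labels
z = np.array([-2.0, 0.5])  # moderate logits where the naive form is still safe
print(bce_with_logits(y, z), bce_with_probs(y, stable_sigmoid(z)))  # equal
```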

【Discussion】:

【Solution 2】:

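As currently written, you create training labels y_train of shape (3,), where each label is just 0 or 1. The network, however, is set up to output 10 classes. That is what this line in the model definition does: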

keras.layers.Dense(10)
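Where the `(3,)` comes from: the posted subset has 5 rows, and as far as I know scikit-learn's `train_test_split` rounds the test count up when `test_size` is a float, giving the remaining rows to training. A plain-Python sketch of that arithmetic (mirroring, not calling, scikit-learn):

```python
import math

n_samples = 5  # rows in the posted subset (indices 0, 2, 3, 4, 5)
test_size = 0.3

n_test = math.ceil(test_size * n_samples)  # ceil(1.5) -> 2
n_train = n_samples - n_test               # 3
print(n_train, n_test)  # 3 2 -> y_train has shape (3,), one 0/1 label per row
```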

To switch to binary classification, I suggest changing this last layer to

keras.layers.Dense(1, activation='sigmoid')

You also need to change the loss to:

loss = keras.losses.BinaryCrossentropy()

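If you instead want multi-class classification with 10 classes, you need to turn y_train into an array with 10 columns (for example one-hot encoded, paired with a categorical loss).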
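For the 10-class case, a common way to get a 10-column `y_train` is one-hot encoding. A NumPy sketch with hypothetical integer labels in the range 0 to 9 (Keras also provides `keras.utils.to_categorical` for this):

```python
import numpy as np

y_train = np.array([3, 0, 9, 1])         # hypothetical class indices, 0-9
num_classes = 10
y_onehot = np.eye(num_classes)[y_train]  # row i has a single 1 at column y_train[i]
print(y_onehot.shape)  # (4, 10): now matches a Dense(10) output of shape (None, 10)
```

Alternatively, integer labels can be kept as they are and paired with `SparseCategoricalCrossentropy`, as in Solution 1.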

【Discussion】:

【Solution 3】:

I believe the last layer in your network outputs 10 values, when it should output 1.

model = keras.models.Sequential([
  keras.layers.Flatten(),
  keras.layers.Dense(128, activation='softmax'),
  keras.layers.Dropout(0.2),
  keras.layers.Dense(1) # needs to be 1
])
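To make the shape argument concrete, a NumPy forward pass through the same layer sizes with random placeholder weights (biases and dropout omitted, ReLU used in the hidden layer for simplicity; the activation doesn't affect shapes). The 17 input features correspond to columns A to Q in the posted data.

```python
import numpy as np

rng = np.random.default_rng(42)
batch, n_features = 3, 17  # 17 feature columns (A-Q) after dropping 'install'
X = rng.normal(size=(batch, n_features))

W1 = rng.normal(size=(n_features, 128))  # Dense(128)
hidden = np.maximum(X @ W1, 0)           # ReLU; shape (3, 128)

W_10 = rng.normal(size=(128, 10))  # Dense(10): one column per output unit
W_1 = rng.normal(size=(128, 1))    # Dense(1)

print((hidden @ W_10).shape)  # (3, 10) -> mismatches labels of shape (3, 1)
print((hidden @ W_1).shape)   # (3, 1)  -> matches
```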

【Discussion】:

Running this on my system, it is almost correct; I also had to change the loss function.