Changing thresholds in the Sigmoid Activation in Neural Networks: answers

【Question title】: Changing thresholds in the Sigmoid Activation in Neural Networks
【Posted】: 2020-04-23 06:55:51
【Question】:

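Hi, I am new to machine learning and have a question about changing the threshold of the sigmoid function.

I know the sigmoid function takes values in the range [0;1] and that 0.5 is used as the threshold: if h(theta) >= 0.5 the prediction is 1, otherwise it is 0.

Thresholds are only used on the output layer of the network, and only for classification. So if you are classifying between 3 classes, can you give each class a different threshold (0.2, 0.4, 0.4, one per class)? Or can you specify a different overall threshold, such as 0.8? I am not sure how to define this in the code below. Any guidance is appreciated.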

import torch

# Hyper Parameters
input_size = 14
hidden_size = 40
hidden_size2 = 30
num_classes = 3
num_epochs = 600
batch_size = 34
learning_rate = 0.01


class Net(torch.nn.Module):
    def __init__(self, n_input, n_hidden, n_hidden2, n_output):
        super().__init__()
        # define the two linear hidden layers
        self.hidden = torch.nn.Linear(n_input, n_hidden)
        self.hidden2 = torch.nn.Linear(n_hidden, n_hidden2)
        # define the linear output layer; it consumes the second hidden layer
        self.out = torch.nn.Linear(n_hidden2, n_output)

    def forward(self, x):
        """
            In the forward function we define the forward pass:
            accept a tensor of input data, x, and return a tensor
            of unnormalized class scores (logits).
        """
        # first hidden layer followed by its sigmoid activation
        h_input1 = self.hidden(x)
        h_output1 = torch.sigmoid(h_input1)

        # second hidden layer followed by its sigmoid activation
        h_input2 = self.hidden2(h_output1)
        h_output2 = torch.sigmoid(h_input2)

        # output layer: raw logits, no activation applied here
        out = self.out(h_output2)

        return out


net = Net(input_size, hidden_size, hidden_size2, num_classes)

# CrossEntropyLoss expects raw logits and applies log-softmax internally
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters(), lr=learning_rate)

all_losses = []

# train_loader is not shown in the question; it is assumed to be a
# torch.utils.data.DataLoader yielding (features, labels) batches.
for epoch in range(num_epochs):
    total = 0
    correct = 0
    total_loss = 0.0
    for step, (batch_x, batch_y) in enumerate(train_loader):
        X = batch_x
        Y = batch_y.long()

        # Forward + Backward + Optimize
        optimizer.zero_grad()  # zero the gradient buffer
        outputs = net(X)
        loss = criterion(outputs, Y)
        all_losses.append(loss.item())
        loss.backward()
        optimizer.step()

        if epoch % 50 == 0:
            _, predicted = torch.max(outputs, 1)
            # accumulate accuracy and loss statistics for this epoch
            total += predicted.size(0)
            correct += (predicted == Y).sum().item()
            total_loss += loss.item()
    if epoch % 50 == 0:
        print(
            "Epoch [%d/%d], Loss: %.4f, Accuracy: %.2f %%"
            % (epoch + 1, num_epochs, total_loss, 100 * correct / total)
        )

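# train_data is not shown in the question; it is assumed to be a pandas
# DataFrame with the features in the first input_size columns and the
# class label in the column after them.
train_input = train_data.iloc[:, :input_size]
train_target = train_data.iloc[:, input_size]

inputs = torch.Tensor(train_input.values).float()
# the "- 1" presumably shifts 1-based labels to the 0-based labels PyTorch expects
targets = torch.Tensor(train_target.values - 1).long()

outputs = net(inputs)
_, predicted = torch.max(outputs, 1)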

【Question comments】:

Tags: python machine-learning deep-learning neural-network pytorch

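【Answer 1】:

You can use any threshold you find suitable.

Neural networks are known to be over-confident in many cases (e.g. assigning 0.95 to one of 50 classes), so using a different threshold may be beneficial in your case.

Your training is fine, but you should change the prediction step (the last two lines) and use torch.nn.functional.softmax like this: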

outputs = net(inputs) 
probabilities = torch.nn.functional.softmax(outputs, 1)

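As mentioned in the other answer, each row will now contain probabilities that sum to 1 (previously you had unnormalized probabilities, i.e. logits).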
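To make the normalization concrete, here is a minimal sketch. The logits are made-up numbers standing in for `net(inputs)`; they are not from the question.

```python
import torch

# Hypothetical logits for 2 samples (rows) and 3 classes (columns).
logits = torch.tensor([[2.0, 1.0, 0.1],
                       [0.5, 0.5, 0.5]])

# dim=1 normalizes across the class dimension, giving one distribution per row.
probabilities = torch.nn.functional.softmax(logits, dim=1)

# Each row sums to 1 up to floating-point error.
row_sums = probabilities.sum(dim=1)

print(probabilities)
print(row_sums)
```

Equal logits, as in the second row, yield a uniform distribution over the classes.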

Now just apply your desired threshold to those probabilities:

predictions = probabilities > 0.8

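Note that in some cases you may get only zeros (e.g. for [0.2, 0.3, 0.5]).

This means the neural network is not confident enough by your standards, which would probably reduce the number of incorrect positive predictions (an abstract example: say you are predicting whether or not a patient has one of 3 mutually exclusive diseases. It is better to say so only when you are really sure).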
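One possible way to turn the boolean mask into class labels while keeping the "not confident enough" case explicit is sketched below. The probabilities are made up, and the `-1` "abstain" marker is an illustrative convention, not something PyTorch or the answer defines.

```python
import torch

# Hypothetical softmax outputs for 3 samples (rows) and 3 classes (columns).
probabilities = torch.tensor([[0.05, 0.05, 0.90],
                              [0.20, 0.30, 0.50],
                              [0.85, 0.10, 0.05]])

threshold = 0.8
predictions = probabilities > threshold  # boolean mask, same shape as the input

# With a threshold above 0.5 at most one class per row can pass,
# so checking the largest probability in each row is enough.
top_prob, top_class = probabilities.max(dim=1)

# Keep the top class when it clears the threshold, otherwise mark the row
# with -1 (an assumed "abstain" marker chosen for this sketch).
labels = torch.where(top_prob > threshold,
                     top_class,
                     torch.full_like(top_class, -1))

print(predictions)
print(labels)
```

The second row is the [0.2, 0.3, 0.5] case from the text: no class clears 0.8, so its mask row is all False and the sample is left unlabeled.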

Different thresholds for each class

This can be done as well, like this:

thresholds = torch.tensor([0.1, 0.1, 0.8]).unsqueeze(0)
predictions = probabilities > thresholds
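With per-class thresholds, more than one class can pass in the same row, so you may need a rule for picking one. The sketch below uses made-up probabilities and resolves such rows by taking the class with the largest margin over its own threshold. That tie-break is an assumption of this example, not part of the answer.

```python
import torch

# Hypothetical softmax outputs for 3 samples (rows) and 3 classes (columns).
probabilities = torch.tensor([[0.35, 0.25, 0.40],
                              [0.05, 0.05, 0.90],
                              [0.09, 0.21, 0.70]])

# Shape (1, 3) broadcasts against (3, 3): column j is compared with threshold j.
thresholds = torch.tensor([0.1, 0.1, 0.8]).unsqueeze(0)
predictions = probabilities > thresholds

# How many classes cleared their own threshold in each row.
passed_per_row = predictions.sum(dim=1)

# Assumed tie-break: among the classes that passed, pick the largest margin.
margins = (probabilities - thresholds).masked_fill(~predictions, float("-inf"))
labels = margins.argmax(dim=1)
labels[~predictions.any(dim=1)] = -1  # -1 marks rows where nothing passed

print(passed_per_row)
print(labels)
```

In the first and third rows a plain argmax would pick class 2, yet class 2 does not clear its 0.8 threshold, so the thresholded label differs from the usual prediction.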

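Final remarks

Note that with softmax only one class should be the answer (as pointed out in the other answer), and this approach (along with the mention of sigmoid) may indicate that you are after multi-label classification.

If you want to train your network so that it can predict several classes simultaneously, you should use sigmoid and change your loss to torch.nn.BCEWithLogitsLoss.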
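A rough sketch of that multi-label setup follows. The random logits stand in for `net(inputs)` and the multi-hot targets are invented for illustration. The per-class thresholds reuse the 0.2, 0.4, 0.4 values from the question.

```python
import torch

torch.manual_seed(0)

# Stand-in for net(inputs): raw logits for 4 samples and 3 classes.
logits = torch.randn(4, 3)

# Made-up multi-hot targets: a sample may belong to several classes at once.
# BCEWithLogitsLoss expects float targets with the same shape as the logits.
targets = torch.tensor([[1.0, 0.0, 1.0],
                        [0.0, 1.0, 0.0],
                        [1.0, 1.0, 0.0],
                        [0.0, 0.0, 0.0]])

# The loss applies the sigmoid internally, so the network's output layer
# stays linear during training.
criterion = torch.nn.BCEWithLogitsLoss()
loss = criterion(logits, targets)

# At prediction time apply the sigmoid explicitly, then threshold per class.
probabilities = torch.sigmoid(logits)
thresholds = torch.tensor([0.2, 0.4, 0.4])
predictions = probabilities > thresholds

print(loss.item())
print(predictions)
```

Here each class gets its own independent probability, so rows no longer sum to 1, and any number of classes (including none) can be predicted for a sample.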

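【Comments】:

I have never seen a network that gives no prediction at all. Can you point to a source that explains this approach?
@DiegoMarin Indeed, it would contain only negative responses. It is somewhat similar to multi-label, but trained on mutually exclusive classes. I am not sure about OP's exact case or goal (maybe just exploring?), so I gave the technical answer together with a possible use case as background and comments. I do not know of any source on it (e.g. a research paper), but I have not done my due diligence researching it either. If you look into it and find out, please share here, thanks.
When you talk about the network providing no prediction, do you mean the prediction phase, after the network has been trained? In that case it might make sense, but I am still not sure. Getting a low-confidence answer may be better than getting no answer at all.
@DiegoMarin Yes, after the network has been trained; I was not talking about training (apart from the small part at the end of the answer), sorry if the wording did not make that clear. I can see a case for a different overall threshold (though the example is quite abstract). With different per-class thresholds it gets stranger, but I guess each disease could have its own confidence threshold. Or maybe OP is after multi-label classification. Agreed that this is a very unusual request from OP, but oh well, I guess it happens.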

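【Answer 2】:

In multi-class classification you should have one output per class. You can then use the softmax function to normalize the outputs so that they sum to 1. The class with the largest output is the one chosen as the classification.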
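Because softmax preserves the ordering of the outputs within a row, picking the largest probability gives the same class as picking the largest raw output. A small check with made-up logits:

```python
import torch

# Hypothetical logits for 2 samples and 3 classes.
logits = torch.tensor([[2.0, 1.0, 0.1],
                       [-1.0, 3.0, 0.5]])

probabilities = torch.softmax(logits, dim=1)

# Softmax is monotonic within a row, so the winning class does not change.
labels_from_logits = logits.argmax(dim=1)
labels_from_probs = probabilities.argmax(dim=1)

print(labels_from_logits, labels_from_probs)
```

This is why the question's `torch.max(outputs, 1)` already returns the conventional prediction without calling softmax; the probabilities only matter once you want to compare them against a threshold.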

【Comments】:

While I agree this is the conventional approach and what you say is correct, it is not what OP actually asked.