Sparse Cross Entropy Loss for calculating Loss in NLP problems. PyTorch

【Question title】: Sparse Cross Entropy Loss for calculating Loss in NLP problems. PyTorch
【Posted】: 2020-12-22 11:44:24
【Question description】:

My input tensor looks like:

torch.Size([8, 23])

// where,
// 8 -> batch size
// 23 -> words in each of them

My output tensor looks like:

torch.Size([8, 23, 103])

// where,
// 8 -> batch size
// 23 -> word predictions
// 103 -> vocab size.

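I want to calculate the sparse cross-entropy loss for this task, but I can't, because as far as I can tell PyTorch only calculates the loss for a single element. How should I code this so that it works? Thanks for your help.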

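【Question discussion】:

Could you explain what result you expect? Are you looking for torch.nn.BCEWithLogitsLoss?
I am training an encoder-decoder network, so each position in the output has 103 (the vocab size) possibilities to choose from. But since in PyTorch I can only calculate the loss for a single word, how should I calculate the total loss? I am using a transformer network.

Tags: nlp pytorch huggingface-transformers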

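【Solution 1】:

Judging by the title, I assume you are coming from TensorFlow, where SparseCategoricalCrossentropy does indeed work on tensors with the shapes you provided.

It is quite late, but in case anyone stumbles upon this post: there is now also an answer here: torch.nn.CrossEntropyLoss over Multiple Batches

As for torch, according to the documentation: https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html

CrossEntropyLoss assumes that the second dimension (index 1) is the class dimension, so you only need to transpose the last two dimensions of your tensor, like this: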

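import torch

loss_obj = torch.nn.CrossEntropyLoss()

# model and model_input are your own network and batch
logits = model(model_input)  # shape(logits) -> (batch_size, seq_len, n_classes)
# transpose moves the class dimension to index 1: (batch_size, n_classes, seq_len)
loss = loss_obj(torch.transpose(logits, -2, -1), y_true)  # shape(y_true) -> (batch_size, seq_len), integer class indices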
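
To check the transpose approach without a real model, here is a minimal self-contained sketch using the shapes from the question. The random tensors are stand-ins for the model output and the targets (an assumption, not the asker's data). It also compares the result against the common alternative of flattening the batch and sequence dimensions; with the default mean reduction the two should agree up to floating-point tolerance.

```python
import torch

torch.manual_seed(0)

# Shapes taken from the question: batch of 8, 23 tokens, vocab of 103.
batch_size, seq_len, n_classes = 8, 23, 103

# Random stand-ins (assumed data) for the model output and the integer targets.
logits = torch.randn(batch_size, seq_len, n_classes)
y_true = torch.randint(0, n_classes, (batch_size, seq_len))

loss_obj = torch.nn.CrossEntropyLoss()  # default reduction is "mean"

# Option 1: move the class dimension to index 1 -> (batch_size, n_classes, seq_len)
loss_transposed = loss_obj(torch.transpose(logits, -2, -1), y_true)

# Option 2: flatten batch and sequence dims -> (batch_size * seq_len, n_classes)
loss_flat = loss_obj(logits.reshape(-1, n_classes), y_true.reshape(-1))

print(loss_transposed.item(), loss_flat.item())
```

Either form averages the per-token loss over all 8 * 23 positions, so which one to use is mostly a matter of readability.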

【Discussion】: