Getting wrong shape of logits from TFRobertaForSequenceClassification (answer)

【Question title】: Getting wrong shape of logits from TFRobertaForSequenceClassification
【Posted】: 2021-03-25 01:43:35
【Question】:

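I am using the TFRobertaForSequenceClassification class from the huggingface library to build a classifier. According to the documentation, the logits output should have shape (batch_size, num_labels). Instead I get (batch_size, seq_length, num_labels), and I don't understand why.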

To reproduce:

from transformers import TFRobertaForSequenceClassification, RobertaConfig
import numpy as np

seq_len = 512

classifier = TFRobertaForSequenceClassification(RobertaConfig())

#create random inputs for demo
input_ids = np.random.randint(0,10000, size=(seq_len,))
attention_mask = np.random.randint(0,2, size=(seq_len,))
token_type_ids = np.random.randint(0,2, size=(seq_len,))

#make a prediction with batch_size of 1
output = classifier.predict([input_ids, attention_mask, token_type_ids])

print(output.logits.shape)

This prints logits of shape (512, 2), but I expected (1, 2), i.e. (batch_size, num_labels). Can anyone explain why it behaves this way?

【Question comments】:

Tags: python-3.x tensorflow keras pytorch huggingface-transformers

【Solution 1】:

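I opened an issue on GitHub about this and got an answer. The inputs have to be batched, that is, each array needs an explicit batch dimension; wrapping the 1-D arrays in a list, as in the question, does not give a batch of size 1. In addition, 510 is the maximum sequence length here, because the start and end tokens have to be accounted for. This is discussed further here: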

https://github.com/huggingface/transformers/issues/9102
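
As a minimal sketch of the fix described above (not code taken from the linked issue): give every input an explicit leading batch dimension, so the arrays have shape (batch_size, seq_len) instead of (seq_len,). The most likely reason for the (512, 2) result is that Keras `predict` treats the first axis of each array as the batch axis, so a 1-D array of 512 token ids is read as 512 samples of length 1. The helper names `make_batched_inputs` and `logits_shape` are hypothetical, the all-ones attention mask and all-zeros token type ids are simplifications of the random inputs in the question, and `logits_shape` assumes a transformers release that still ships the TensorFlow classes.

```python
import numpy as np

SEQ_LEN = 510    # per the answer above: leave room for the start and end tokens
BATCH_SIZE = 1


def make_batched_inputs(batch_size=BATCH_SIZE, seq_len=SEQ_LEN, seed=0):
    """Build random demo inputs with an explicit leading batch dimension."""
    rng = np.random.default_rng(seed)
    shape = (batch_size, seq_len)  # 2-D (batch_size, seq_len), not (seq_len,)
    return {
        "input_ids": rng.integers(0, 10000, size=shape, dtype=np.int32),
        "attention_mask": np.ones(shape, dtype=np.int32),   # attend to every token
        "token_type_ids": np.zeros(shape, dtype=np.int32),  # single segment
    }


def logits_shape(inputs):
    """Run a randomly initialised classifier and return the shape of its logits.

    Hypothetical helper; the imports are local so the input-building code
    above can be run without TensorFlow installed.
    """
    import tensorflow as tf
    from transformers import RobertaConfig, TFRobertaForSequenceClassification

    classifier = TFRobertaForSequenceClassification(RobertaConfig())
    # Pass the already-batched inputs as a dict of tensors.
    output = classifier({name: tf.constant(array) for name, array in inputs.items()})
    return tuple(output.logits.shape)


inputs = make_batched_inputs()
print({name: array.shape for name, array in inputs.items()})
```

With inputs shaped like this, `logits_shape(inputs)` should return (batch_size, num_labels), which would be (1, 2) for the default config's two labels. An existing 1-D array can be given the same batch dimension with `input_ids[np.newaxis, :]`.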

【Comments】: