Sentence embeddings BERT: answers

【Question title】: Sentence embeddings BERT
【Posted】: 2020-05-14 15:20:19
【Question】:

I need some help. I used this article: https://towardsdatascience.com/improving-sentence-embeddings-with-bert-and-representation-learning-dfba6b444f6b to extract features, but what I got were word embeddings.

If I want to train BERT on my own data to get sentence embeddings, how can I do it?

Example: the sentence "I want to run" --> result: a [1, 768] embedding array. Thanks.

【Question comments】:

Tags: embedding sentence bert-language-model

【Solution 1】:

I recommend a few approaches. If you use HuggingFace, try the following suggestion:

import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')
input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute")).unsqueeze(0)  # Batch size 1
outputs = model(input_ids)
last_hidden_states = outputs[0]  # The last hidden state is the first element of the model output
# last_hidden_states has shape [1, num_tokens, 768]: one vector per token, not one per sentence
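The snippet above returns one 768-dimensional vector per token, so to get the single [1, 768] vector asked for in the question the token vectors still have to be pooled. A minimal sketch, assuming PyTorch and a `transformers` release recent enough to call the tokenizer with `return_tensors="pt"`: `mean_pool` averages the token vectors while ignoring padding, and `cls_pool` takes the first ([CLS]) token. Both helper names are hypothetical. Plain pretrained BERT was not trained to make either pooled vector semantically meaningful, which is the problem the fine-tuned models in the next paragraph address.

```python
import torch


def mean_pool(last_hidden_state: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token vectors into one vector per sentence, ignoring padding.

    last_hidden_state: [batch, seq_len, hidden]
    attention_mask:    [batch, seq_len] (1 = real token, 0 = padding)
    returns:           [batch, hidden]
    """
    mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)  # [batch, seq_len, 1]
    summed = (last_hidden_state * mask).sum(dim=1)  # sum over real tokens only
    counts = mask.sum(dim=1).clamp(min=1e-9)  # number of real tokens, avoids division by zero
    return summed / counts


def cls_pool(last_hidden_state: torch.Tensor) -> torch.Tensor:
    """Take the vector of the first ([CLS]) token: [batch, hidden]."""
    return last_hidden_state[:, 0]


if __name__ == "__main__":
    # Assumption: transformers is installed and bert-base-uncased can be downloaded.
    from transformers import BertModel, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertModel.from_pretrained("bert-base-uncased")
    model.eval()

    enc = tokenizer("I want to run", return_tensors="pt")
    with torch.no_grad():
        out = model(**enc)
    sentence_vec = mean_pool(out[0], enc["attention_mask"])
    print(sentence_vec.shape)  # [1, 768] for a bert-base model
```

Mean pooling is used here because it masks out padding, so sentences of different lengths can be batched together; `cls_pool` is a drop-in alternative.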

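I suggest you use Sentence_Transformers. This project fine-tunes BERT / RoBERTa / DistilBERT / ALBERT / XLNet with a siamese or triplet network structure to produce semantically meaningful sentence embeddings. You can use Flair to try out Sentence Transformer models.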

You can also try Flair's TransformerDocumentEmbeddings. See the examples.
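A minimal sketch of the Flair route, assuming the `flair` package is installed: `TransformerDocumentEmbeddings` attaches one vector to the whole `Sentence`, which has 768 dimensions for a bert-base model. That vector is 1-D, so the hypothetical `as_row` helper reshapes it into the [1, 768] layout from the question. The model name `bert-base-uncased` is an assumption; other Hugging Face model names can be passed instead.

```python
import torch


def as_row(embedding: torch.Tensor) -> torch.Tensor:
    """Reshape a single sentence vector of shape [dim] into [1, dim]."""
    return embedding.detach().reshape(1, -1)


if __name__ == "__main__":
    # Assumption: flair is installed and can download the model.
    from flair.data import Sentence
    from flair.embeddings import TransformerDocumentEmbeddings

    doc_embeddings = TransformerDocumentEmbeddings("bert-base-uncased")
    sentence = Sentence("I want to run")
    doc_embeddings.embed(sentence)  # stores one vector for the whole sentence
    vec = as_row(sentence.embedding)
    print(vec.shape)  # [1, 768] for a bert-base model
```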

【Comments】: