How to use a fine-tuned BERT model for sentence encoding?

【Question title】: How to use fine-tuned BERT model for sentence encoding?
【Posted】: 2021-06-16 20:07:15
【Question description】:

I fine-tuned the BERT base model on my own dataset, following the script here:

https://github.com/cedrickchee/pytorch-pretrained-BERT/tree/master/examples/lm_finetuning

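I saved the model as a .pt file and now want to use it for a sentence similarity task. Unfortunately, it is not clear to me how to load the fine-tuned model. I tried the following: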

model = BertModel.from_pretrained('trained_model.pt')
model.eval()

This does not work. It raises:

ReadError: not a gzip file

Apparently, a .pt file cannot be loaded with the from_pretrained method. Can anyone help me out here? Thanks a lot!! :)

Edit: I saved the model to an S3 bucket as follows:

# Convert model to buffer
buffer = io.BytesIO()
torch.save(model, buffer)
# Save in s3 bucket
output_model_file = output_folder + "trained_model.pt"
s3_.put_object(Bucket="power-plant-embeddings", Key=output_model_file, Body=buffer.getvalue())

【Question comments】:

How did you save the .pt model?
Ah, thanks, that is a useful piece of information. Please see my edit! :)

Tags: python nlp pytorch bert-language-model huggingface-transformers

【Solution 1】:

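To load a model with BertModel.from_pretrained(), you need to have saved it with save_pretrained().

Any other way of saving requires the corresponding way of loading: a model written with torch.save() has to be read back with torch.load(). I am not familiar with S3, but I assume you can retrieve the model with get_object and then save it using the Hugging Face API. From then on you should be able to use from_pretrained() normally.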
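The steps above could look roughly like the following. This is a minimal sketch, not a tested recipe for the asker's setup: it assumes the pickled object is a Hugging Face transformers model (so that it has save_pretrained()), that its class is importable when it is unpickled, and that a torch version accepting the weights_only argument is installed. The helper names fetch_s3_bytes and reexport_pickled_model are made up for illustration.

```python
import io

import torch
from transformers import BertModel


def fetch_s3_bytes(bucket: str, key: str) -> bytes:
    """Download an object from S3 and return its raw bytes."""
    # Imported lazily so the rest of the sketch works without boto3 installed.
    import boto3

    response = boto3.client("s3").get_object(Bucket=bucket, Key=key)
    return response["Body"].read()


def reexport_pickled_model(raw_bytes: bytes, export_dir: str) -> BertModel:
    """Unpickle a model saved with torch.save(model, ...) and re-save it
    in the directory layout that from_pretrained() expects."""
    # weights_only=False unpickles arbitrary Python objects,
    # so only use it on files you created yourself or otherwise trust.
    model = torch.load(
        io.BytesIO(raw_bytes), map_location="cpu", weights_only=False
    )

    # Assumption: the unpickled object is a transformers model.
    model.save_pretrained(export_dir)

    # From here on, the usual loading path works.
    reloaded = BertModel.from_pretrained(export_dir)
    reloaded.eval()
    return reloaded
```

With the names from the question, usage would be something like raw = fetch_s3_bytes("power-plant-embeddings", output_folder + "trained_model.pt") followed by model = reexport_pickled_model(raw, "exported_model"), where "exported_model" is an arbitrary local directory. For future runs it is simpler to call save_pretrained() right after fine-tuning and upload the resulting directory instead of a pickled .pt file.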

【Comments】: