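【Posted】:2021-12-17 10:20:07
【Question】:
I am writing deep learning code that embeds text into BERT-based embeddings. I am seeing an unexpected problem in code that previously ran fine. Here is the snippet: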
from transformers import DistilBertModel, DistilBertTokenizer

sentences = ["person in red riding a motorcycle", "lady cutting cheese with reversed knife"]
# Embed text using BERT model.
text_tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased', cache_dir="cache/")
model = DistilBertModel.from_pretrained('distilbert-base-uncased')
print(text_tokenizer.tokenize(sentences[0]))
inputs = text_tokenizer(sentences, return_tensors="pt", padding=True)  # error comes here
The error is as follows:
['person', 'in', 'red', 'riding', 'a', 'motorcycle']
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_bundle/pydev_umd.py", line 198, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars) # execute the script
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/Users/amitgh/PycharmProjects/682_image_caption_errors/model/model.py", line 92, in <module>
    load_data()
  File "/Users/amitgh/PycharmProjects/682_image_caption_errors/model/model.py", line 59, in load_data
    inputs = text_tokenizer(sentences, return_tensors="pt", padding=True)
TypeError: 'DistilBertTokenizer' object is not callable
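As you can see, text_tokenizer.tokenize() works fine. I tried forcing a re-download of the tokenizer and even changing the cache directory, but neither helped.
The code runs fine on other machines (a friend's laptop), and it also ran fine on mine for a while, until I tried to install torchvision and use the PIL library for the image part. Now the error shows up, although not on every run.
OS: macOS 11.6, using a Conda environment, python=3.9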
【Discussion】:
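Since the same code works on another machine and broke after installing torchvision, one thing worth ruling out is that the Conda environment now resolves to a different transformers version. The sketch below is only a diagnostic aid, not a confirmed cause: it assumes that calling a tokenizer object directly became available in transformers 3.0.0, and the helper names (parse_version, tokenizer_call_supported, installed_version) are made up for illustration.

```python
from importlib import metadata


def parse_version(version):
    """Return the leading numeric components of a version string as a tuple."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            # Stop at the first non-numeric component, e.g. "dev0".
            break
        parts.append(int(digits))
    return tuple(parts)


def tokenizer_call_supported(version):
    """Guess whether tokenizer(...) works for this transformers version.

    Assumption: the callable tokenizer interface arrived in 3.0.0.
    """
    return parse_version(version) >= (3, 0)


def installed_version(package="transformers"):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
```

Running installed_version() inside the affected Conda environment and passing the result to tokenizer_call_supported() shows whether the version there is older than on the machine where the code works. If it is, upgrading transformers in that environment would be the first thing to try.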