Huggingface Tokenizer object is not callable

【Question title】: Huggingface Tokenizer object is not callable
【Posted】: 2021-12-17 10:20:07
【Question】:

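I am writing deep learning code that embeds text into BERT-based embeddings. I am seeing an unexpected problem in code that previously worked fine. Here is the snippet: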

from transformers import DistilBertModel, DistilBertTokenizer

sentences = ["person in red riding a motorcycle", "lady cutting cheese with reversed knife"]
# Embed text using BERT model.
text_tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased', cache_dir="cache/")
model = DistilBertModel.from_pretrained('distilbert-base-uncased')
print(text_tokenizer.tokenize(sentences[0]))
inputs = text_tokenizer(sentences, return_tensors="pt", padding=True)  # error is raised here

The error is as follows:

['person', 'in', 'red', 'riding', 'a', 'motorcycle']
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_bundle/pydev_umd.py", line 198, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/Users/amitgh/PycharmProjects/682_image_caption_errors/model/model.py", line 92, in <module>
    load_data()
  File "/Users/amitgh/PycharmProjects/682_image_caption_errors/model/model.py", line 59, in load_data
    inputs = text_tokenizer(sentences, return_tensors="pt", padding=True)
TypeError: 'DistilBertTokenizer' object is not callable

As you can see, text_tokenizer.tokenize() works fine. I tried forcing a re-download of the tokenizer and even changing the cache directory, but neither helped.

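The code runs fine on other machines (a friend's laptop), and it also ran fine on mine for a while, until I tried installing torchvision and using the PIL library for the image part. Now it consistently gives this error.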

OS: MacOS 11.6, using a Conda environment, python=3.9

【Comments】:

Tags: huggingface-tokenizers

【Solution 1】:

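This turned out to be a fairly easy fix. At some point I had removed the transformers version pin from the environment.yml file, and the environment started using version 2.x with python=3.9, which probably does not allow calling the tokenizer directly. I pinned the version again as transformers=4.11.2 and added the conda-forge channel to the yml file. After that, I was able to get past this error.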
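
To confirm this diagnosis before editing environment.yml, it may help to check which transformers version the environment actually resolved to. The following is only a minimal sketch: the helper names `tokenizer_call_supported` and `installed_transformers_version` are made up for illustration, and the threshold assumes that calling a tokenizer object directly was introduced in transformers 3.0, which is consistent with the 2.x failure described above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def tokenizer_call_supported(version_string: str) -> bool:
    """Return True if this transformers version should allow tokenizer(...).

    Assumption: direct calls (tokenizer.__call__) arrived with the 3.0 release,
    so any 2.x install is expected to raise "object is not callable".
    """
    major = int(version_string.split(".")[0])
    return major >= 3


def installed_transformers_version() -> Optional[str]:
    """Read the installed version from package metadata without importing it."""
    try:
        return version("transformers")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_transformers_version()
    if found is None:
        print("transformers is not installed in this environment")
    elif tokenizer_call_supported(found):
        print(f"transformers {found}: calling the tokenizer directly should work")
    else:
        print(f"transformers {found}: too old, pin a newer version such as 4.11.2")
```

Running this inside the activated Conda environment shows whether the pin in environment.yml actually took effect. A quicker spot check in the failing script is `callable(text_tokenizer)`, which returns False when the tokenizer class does not define `__call__`.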

【Comments】: