[Question Title]: Huggingface Tokenizer object is not callable
[Posted]: 2021-12-17 10:20:07
[Question Description]:

I am writing deep-learning code that embeds text with a BERT-based model. Code that previously ran fine is now failing unexpectedly. Here is the snippet:

from transformers import DistilBertModel, DistilBertTokenizer

sentences = ["person in red riding a motorcycle", "lady cutting cheese with reversed knife"]
# Embed text using BERT model.
text_tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased', cache_dir="cache/")
model = DistilBertModel.from_pretrained('distilbert-base-uncased')
print(text_tokenizer.tokenize(sentences[0]))
inputs = text_tokenizer(sentences, return_tensors="pt", padding=True)  # error comes here

The error is as follows:

['person', 'in', 'red', 'riding', 'a', 'motorcycle']
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_bundle/pydev_umd.py", line 198, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/Users/amitgh/PycharmProjects/682_image_caption_errors/model/model.py", line 92, in <module>
    load_data()
  File "/Users/amitgh/PycharmProjects/682_image_caption_errors/model/model.py", line 59, in load_data
    inputs = text_tokenizer(sentences, return_tensors="pt", padding=True)
TypeError: 'DistilBertTokenizer' object is not callable

As you can see, text_tokenizer.tokenize() works fine. I tried forcing a re-download of the tokenizer and even changing the cache directory, but neither helped.

The code runs fine on another machine (a friend's laptop), and it also ran fine on mine for a while, until I tried installing torchvision and using the PIL library for the image part. It does not always give this error now.

OS: macOS 11.6, using a Conda environment, python=3.9

[Question Discussion]:

    Tags: huggingface-tokenizers


    [Solution 1]:

    This turned out to be a fairly easy problem to fix. At some point I had removed the pinned transformers version from the environment.yml file and ended up on major version 2.x with python=3.9, which likely does not support calling the tokenizer directly. I pinned the version again as transformers=4.11.2 and added the conda-forge channel to the yml file. After that, the error was gone.
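    A minimal sketch of the environment.yml change described above; the environment name and the rest of the dependency list are placeholders, and only the transformers pin and the conda-forge channel come from the fix:

```yaml
# environment.yml -- pin transformers so conda resolves a 4.x build,
# whose tokenizers support the tokenizer(...) call used in the question
name: bert-embed          # hypothetical environment name
channels:
  - conda-forge           # channel added as part of the fix
  - defaults
dependencies:
  - python=3.9
  - transformers=4.11.2   # version pin from the fix
```

    After editing the file, recreating or updating the environment (e.g. `conda env update -f environment.yml`) applies the pin.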

    [Discussion]:
