【Posted】: 2021-12-24 02:02:25
【Question】:
    i = 0
    list_of_sent = []
    for sent in df["Heading"]:
        filtered_sentence = []
        for w in sent.split():
            if len(w) == 0:
                continue
            print(w)
            # clean_punc() may turn one token into several, so split again
            for cleaned_words in clean_punc(w).split():
                if cleaned_words.isalpha():
                    filtered_sentence.append(cleaned_words.lower())
                else:
                    continue
        # one list of tokens per heading
        list_of_sent.append(filtered_sentence)
I want to apply a word2vec model. I first convert the values of my data column into a list of tokenized sentences; clean_punc is the following function:
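    import re

    def clean_punc(sentence):
        # Inside [...] the '|' is a literal character, not alternation.
        # First pass: delete these characters outright.
        cleaned = re.sub(r'[?|!| \'|"|#]', r'', sentence)
        # Second pass: replace separators with a space.
        cleaned = re.sub(r'[.|,)|(|\|/]', r' ', cleaned)
        return cleaned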
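To make the shape of list_of_sent concrete, here is a minimal self-contained sketch of the same tokenization. The function name tokenize_headings and the sample headings are hypothetical stand-ins for the loop above and for df["Heading"], not part of the original post.

```python
import re


def clean_punc(sentence):
    # Same substitutions as the clean_punc() function in the question.
    cleaned = re.sub(r'[?|!| \'|"|#]', r'', sentence)
    cleaned = re.sub(r'[.|,)|(|\|/]', r' ', cleaned)
    return cleaned


def tokenize_headings(headings):
    """Return one list of lowercase, purely alphabetic tokens per heading."""
    list_of_sent = []
    for sent in headings:
        filtered_sentence = []
        for w in sent.split():
            for cleaned_word in clean_punc(w).split():
                if cleaned_word.isalpha():
                    filtered_sentence.append(cleaned_word.lower())
        list_of_sent.append(filtered_sentence)
    return list_of_sent


# Hypothetical sample data standing in for df["Heading"].
sample_headings = ["Hello, World!", "Prices rise 5% (again)", "AI/ML news"]
tokens = tokenize_headings(sample_headings)
print(tokens)
```

Tokens that still contain digits or other symbols after cleaning (such as "5%") are dropped by the isalpha() check, so each inner list holds only lowercase words.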
I am applying the word2vec model:
    import gensim
    w2v_model = gensim.models.Word2Vec(list_of_sent, min_count=1, vector_size=50, workers=4)
When I run the following code:
words=list(w2v_model.wv)
print(len(words))
I get this error:
KeyError Traceback (most recent call last)
/tmp/ipykernel_38/1883829707.py in <module>
----> 1 words=list(w2v_model.wv)
2 print(len(words))
/opt/conda/lib/python3.7/site-packages/gensim/models/keyedvectors.py in __getitem__(self, key_or_keys)
377 """
378 if isinstance(key_or_keys, KEY_TYPES):
--> 379 return self.get_vector(key_or_keys)
380
381 return vstack([self.get_vector(key) for key in key_or_keys])
/opt/conda/lib/python3.7/site-packages/gensim/models/keyedvectors.py in get_vector(self, key, norm)
420
421 """
--> 422 index = self.get_index(key)
423 if norm:
424 self.fill_norms()
/opt/conda/lib/python3.7/site-packages/gensim/models/keyedvectors.py in get_index(self, key, default)
394 return default
395 else:
--> 396 raise KeyError(f"Key '{key}' not present")
397
398 def get_vector(self, key, norm=False):
KeyError: "Key '141101' not present"
Please help me resolve this error.
【Discussion】:
- Could you also post the clean_punc() function (assuming it is a function)?
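- A likely cause, assuming gensim 4.x (the vector_size argument and the traceback frames suggest it): the traceback shows list(w2v_model.wv) going through __getitem__ with integer positions, and once the position passes the end of the vocabulary a KeyError is raised. Python's fallback iteration only stops on IndexError, so the KeyError escapes. The sketch below reproduces that mechanism with TinyVectors, a hypothetical stand-in class written for illustration, not gensim code. In gensim 4.x the vocabulary itself is exposed as w2v_model.wv.index_to_key (a list of words) and w2v_model.wv.key_to_index (a word-to-position dict).

```python
class TinyVectors:
    """Hypothetical stand-in for a keyed-vector store; NOT gensim code."""

    def __init__(self, words):
        self.index_to_key = list(words)
        self.key_to_index = {w: i for i, w in enumerate(self.index_to_key)}

    def __len__(self):
        return len(self.index_to_key)

    def __getitem__(self, key):
        # Accepts a word or an integer position. Out-of-range positions
        # raise KeyError rather than IndexError, as in the traceback.
        if key in self.key_to_index:
            return self.key_to_index[key]
        if isinstance(key, int) and 0 <= key < len(self.index_to_key):
            return key
        raise KeyError(f"Key '{key}' not present")


wv = TinyVectors(["india", "market", "news"])

# With no __iter__, list() calls wv[0], wv[1], ... and stops only on
# IndexError, so the KeyError at position len(wv) propagates.
failure = None
try:
    list(wv)
except KeyError as err:
    failure = str(err)
print(failure)

# Iterating the vocabulary list directly avoids the problem.
words = list(wv.index_to_key)
print(len(words))
```

  Under that assumption, replacing list(w2v_model.wv) with list(w2v_model.wv.index_to_key) in the question's code should give the word list, and len(w2v_model.wv) gives the vocabulary size directly.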
Tags: python nlp word2vec keyerror