【Posted】: 2017-03-28 14:18:55
【Question】:
I am training a word2vec model in gensim using sentences from a csv file, as follows:
import string
import gensim
import csv
import nltk

path = '/home/neel/Desktop/csci544_proj/test/sample.csv'
translator = str.maketrans({key: None for key in string.punctuation})

class gen(object):
    def __init__(self, path):
        self.path = path

    def __iter__(self):
        with open(path) as infile:
            reader = csv.reader(infile)
            for row in reader:
                rev = row[4]
                l = nltk.sent_tokenize(rev)
                for sent in l:
                    sent = sent.translate(translator)
                    yield sent.lower().split()

sentences = [path]
for p in gen(path):
    model = gensim.models.Word2Vec(p, min_count=1, iter=1)
print(model.vocab.keys())
I get the following result: (['b', 'u', 'm', 'h', 'e', 'n', 'r', 'v', 'i', 'a', 't', 's', 'k', 'w', 'o', 'l'])
The vocabulary contains individual letters rather than words. Where is the program going wrong?
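A likely cause, sketched here without gensim: Word2Vec expects an iterable of sentences, where each sentence is a list of word tokens. The loop above passes a single tokenized sentence `p`, so each word is treated as a sentence and each character as a token. The helper `count_tokens` and the sample data below are hypothetical stand-ins used only to illustrate that iteration behavior; they are not part of gensim.

```python
from collections import Counter

def count_tokens(sentences):
    """Hypothetical stand-in for vocabulary building: every item of
    `sentences` is one sentence, every item of a sentence is one token."""
    vocab = Counter()
    for sentence in sentences:
        for token in sentence:
            vocab[token] += 1
    return vocab

# Made-up sample data in the shape the generator yields.
tokenized = [["the", "movie", "was", "great"], ["loved", "it"]]

# Passing ONE tokenized sentence: each word is taken as a "sentence",
# so iterating over it yields single characters as tokens.
single = count_tokens(tokenized[0])

# Passing the whole iterable of tokenized sentences: tokens are words.
full = count_tokens(tokenized)

print(sorted(single))
print(sorted(full))
```

Under that assumption, the corresponding change in the question's code would be to drop the `for` loop and pass the generator object itself, for example `gensim.models.Word2Vec(gen(path), min_count=1, iter=1)`, so the model sees every sentence as a list of words.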
【Discussion】:
Tags: python gensim word2vec yield-keyword