[Posted]: 2016-08-03 07:07:09
[Problem description]:
from deepdist import DeepDist
from gensim.models.word2vec import Word2Vec
from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName("Word2Vec")
sc = SparkContext(conf=conf)

# each line of the corpus file becomes one tokenized sentence
corpus = sc.textFile('AllText.txt').map(lambda s: s.split())

def gradient(model, sentences):  # runs on the workers
    syn0, syn1 = model.syn0.copy(), model.syn1.copy()  # snapshot previous weights
    model.train(sentences)
    return {'syn0': model.syn0 - syn0, 'syn1': model.syn1 - syn1}

def descent(model, update):      # runs on the master
    model.syn0 += update['syn0']
    model.syn1 += update['syn1']

with DeepDist(Word2Vec(corpus.collect())) as dd:
    dd.train(corpus, gradient, descent)
    dd.model.save("Model")
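The `gradient`/`descent` pair above follows a delta-update pattern: each worker snapshots the weights, trains locally, and returns only the difference, which the master then applies to its copy. A minimal sketch of that pattern with plain Python lists (toy update rule and names are mine, not deepdist's API):

```python
def gradient(weights, data):
    # snapshot previous weights, run a toy local "training" step,
    # and return only the delta (stand-in for model.train(...))
    before = list(weights)
    for i, x in enumerate(data):
        weights[i] += 0.1 * x
    return [w - b for w, b in zip(weights, before)]

def descent(weights, update):
    # the master applies a worker's delta to its own copy
    for i, d in enumerate(update):
        weights[i] += d

master = [0.0, 0.0]
worker = list(master)          # worker starts from the master's weights
delta = gradient(worker, [1.0, 2.0])
descent(master, delta)
# master and worker now hold the same weights
```

Because only deltas travel between worker and master, concurrent updates from several workers can be merged by simple addition, which is why the real callbacks subtract the snapshot rather than returning the full weight matrices.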
Please help: I have a 56 GB text corpus and want to build a word2vec model, but gensim alone is too slow, so I tried deepdist and the example code from their site. Has anyone seen this kind of error?
Output when I run this script:
[Discussion]:
- Here is the project link: deepdist.com
Tags: python pyspark gensim word2vec