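【Question title】: Gensim: compute centroid from a list of words
【Posted】: 2021-05-12 23:54:36
【Question】:

How can I compute the centroid of 5 given words from their word embeddings, and then find the words most similar to that centroid? (In gensim.)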

【Comments】:

    Tags: nlp gensim word2vec


    【Solution 1】:

    You should take a look at the Word2Vec gensim tutorial.

    from gensim.test.utils import datapath
    from gensim import utils
    
    
    class MyCorpus:
        """An iterator that yields sentences (lists of str)."""
    
        def __iter__(self):
            corpus_path = datapath('lee_background.cor')
            # use a context manager so the file is closed once iteration ends
            with open(corpus_path) as corpus_file:
                for line in corpus_file:
                    # assume there's one document per line, tokens separated by whitespace
                    yield utils.simple_preprocess(line)
    
    
    import gensim.models
    
    sentences = MyCorpus()
    model = gensim.models.Word2Vec(sentences=sentences)
    word_vectors = model.wv
    
    
    import numpy as np
    
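    # centroid = element-wise mean of the five word vectors
    # (raises KeyError if a word is not in the model's vocabulary)
    centroid = np.average([word_vectors[w] for w in ['king', 'man', 'walk', 'tennis', 'victorian']], axis=0)
    
    # words ranked by cosine similarity to the centroid (top 10 by default)
    word_vectors.similar_by_vector(centroid)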
    

    On this small test corpus, that gives you

    [('man', 0.9996674060821533),
     ('by', 0.9995684623718262),
     ('over', 0.9995648264884949),
     ('from', 0.9995632171630859),
     ('were', 0.9995599389076233),
     ('who', 0.99954754114151),
     ('today', 0.9995439648628235),
     ('which', 0.999538004398346),
     ('on', 0.9995279312133789),
     ('being', 0.9995211958885193)]
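
    Note that 'man', one of the input words, comes out on top: a raw vector query does not know which words produced it. To make the arithmetic explicit, here is a minimal dependency-free sketch of the same idea (average the vectors, then rank by cosine similarity), with an option to leave the input words out. The `toy` embeddings and the helper names `centroid`, `cosine` and `nearest_to_centroid` are made up for illustration and are not part of gensim.

```python
import math


def centroid(vectors):
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_to_centroid(embeddings, words, topn=3, exclude_inputs=True):
    """Rank vocabulary words by cosine similarity to the centroid of `words`."""
    center = centroid([embeddings[w] for w in words])
    scored = [
        (word, cosine(vector, center))
        for word, vector in embeddings.items()
        if not (exclude_inputs and word in words)
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:topn]


# Hypothetical 3-dimensional embeddings, invented for this example only.
toy = {
    "king": [0.9, 0.1, 0.0],
    "queen": [0.8, 0.2, 0.1],
    "man": [0.7, 0.0, 0.2],
    "tennis": [0.0, 0.9, 0.3],
    "walk": [0.1, 0.3, 0.9],
}

result = nearest_to_centroid(toy, ["king", "man"], topn=2)
print(result)
```

    With a real model, `word_vectors.most_similar(positive=['king', 'man', 'walk', 'tennis', 'victorian'])` should behave much like the `exclude_inputs=True` case: as far as I know it averages the (normalized) vectors of the given words and leaves those words out of the results.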
    

    【Comments】:
