How to find words that are similar to two keywords using FastText? Answers

【Question title】: How to find words that are similar to two keywords using FastText?
【Posted】: 2021-02-08 07:58:41
【Question】:

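I am trying to find words that are similar to two different words. I know I can use FastText to find the most similar words, but I would like to know whether there is a way to find keywords that are similar to both of two given keywords. For example, "apple" is similar to "orange" and also to "kiwi". So if I have the two words "orange" and "kiwi", I would like to get "apple", or any other fruit, as a suggestion. Is there a way to do this?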

【Question comments】:

Does this answer your question? How to find similar words with FastText?
If it was useful, please accept (and/or upvote) my answer.

Tags: nlp gensim word2vec fasttext conceptnet

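【Solution 1】:

I don't think FastText offers this out of the box.

That said, you could consider this simple approach:

1. Load pretrained embeddings (the example uses the Italian model cc.it.300.bin)
2. Get a reasonably large number of nearest neighbors for each word of interest
3. Take the intersection of the two words' nearest-neighbor sets

Note: this is a rough approach. If necessary, more refined operations can be built on the cosine similarity.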
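One way to read "more refined" here is to score every candidate word by its cosine similarity to both keywords and keep the weaker of the two scores, so that a word ranks highly only if it is close to both. The sketch below is a minimal pure-Python illustration under stated assumptions: the three-dimensional vectors are invented for the example, and `rank_by_both` is a hypothetical helper name, not part of fastText. With a real model, the vectors would come from `model.get_word_vector(word)`.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_by_both(vectors, word_a, word_b):
    """Rank candidates by their weaker similarity to word_a and word_b."""
    scored = []
    for word, vec in vectors.items():
        if word in (word_a, word_b):
            continue  # do not suggest the keywords themselves
        score = min(cosine(vec, vectors[word_a]), cosine(vec, vectors[word_b]))
        scored.append((score, word))
    return sorted(scored, reverse=True)

# Toy vectors, invented for illustration only (not real fastText embeddings).
toy_vectors = {
    "orange": [0.9, 0.1, 0.0],
    "kiwi": [0.8, 0.3, 0.1],
    "apple": [0.85, 0.2, 0.05],
    "car": [0.0, 0.1, 0.95],
}

ranking = rank_by_both(toy_vectors, "orange", "kiwi")
print(ranking[0][1])  # prints "apple"
```

Taking the minimum rather than the average penalizes words that are very close to one keyword but far from the other; averaging is a reasonable alternative if that behavior is not wanted.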

Code example for the intersection approach:

import fasttext

# Load the pretrained model
# (this example uses the Italian model)
model = fasttext.load_model('./ml_models/cc.it.300.bin')

# Get the 100 nearest neighbors of each word of interest;
# each entry is a (cosine similarity, word) tuple
arancia_nn = model.get_nearest_neighbors('arancia', k=100)
kiwi_nn = model.get_nearest_neighbors('kiwi', k=100)

# Keep only the words (discard the cosine similarity)
arancia_nn_words = {word for _, word in arancia_nn}
kiwi_nn_words = {word for _, word in kiwi_nn}

# Compute the intersection
common_similar_words = arancia_nn_words.intersection(kiwi_nn_words)

Example output (Italian):

{'agrume',
 'agrumi',
 'ananas',
 'arance',
 'arancie',
 'arancio',
 'avocado',
 'banana',
 'ciliegia',
 'fragola',
 'frutta',
 'lime',
 'limone',
 'limoni',
 'mandarino',
 'mela',
 'mele',
 'melograno',
 'melone',
 'papaia',
 'papaya',
 'pera',
 'pompelmi',
 'pompelmo',
 'renetta',
 'succo'}
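
The set above is unordered and drops the similarity scores. If a ranking is useful, one option is to keep the scores and sort the shared words by their combined similarity to both keywords. This is a sketch rather than part of the original answer: `rank_common` is a hypothetical helper, and the neighbor lists are made-up stand-ins with the same `(similarity, word)` shape that `get_nearest_neighbors` returns.

```python
def rank_common(neighbors_a, neighbors_b):
    """Return words present in both neighbor lists, best combined score first.

    Each argument is a list of (similarity, word) tuples.
    """
    scores_a = {word: sim for sim, word in neighbors_a}
    scores_b = {word: sim for sim, word in neighbors_b}
    common = scores_a.keys() & scores_b.keys()
    return sorted(common, key=lambda w: scores_a[w] + scores_b[w], reverse=True)

# Made-up scores for illustration; real ones would come from the model.
arancia_nn = [(0.80, "limone"), (0.75, "mela"), (0.60, "succo")]
kiwi_nn = [(0.70, "mela"), (0.60, "limone"), (0.55, "papaya")]

print(rank_common(arancia_nn, kiwi_nn))  # prints ['mela', 'limone']
```

Words that appear in only one list ("succo", "papaya") are dropped, exactly as in the plain intersection.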


【Solution 2】:

I have used the Gensim W2V implementation for this kind of computation for years, and Gensim also has a FastText implementation: https://radimrehurek.com/gensim/models/fasttext.html
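
With Gensim, a query on two keywords at once can be expressed through the `positive` argument of `most_similar`, which ranks vocabulary words against the combined vectors of all the words listed. The sketch below assumes Gensim 4.x and a Facebook-format `.bin` model file; `load_vectors` and `suggest_for_pair` are hypothetical helper names introduced for illustration.

```python
def load_vectors(path):
    """Load Facebook-format fastText vectors with Gensim (imported lazily)."""
    from gensim.models.fasttext import load_facebook_vectors
    return load_facebook_vectors(path)

def suggest_for_pair(keyed_vectors, word_a, word_b, topn=10):
    """Return the words most similar to word_a and word_b taken together."""
    # Gensim returns (word, similarity) pairs, the reverse of fastText's order.
    hits = keyed_vectors.most_similar(positive=[word_a, word_b], topn=topn)
    return [word for word, _similarity in hits]
```

Usage would look like `suggest_for_pair(load_vectors('./ml_models/cc.it.300.bin'), 'arancia', 'kiwi')`, assuming the same model file as in Solution 1. Unlike the intersection approach, this always returns `topn` words, so the results may include words that are close to the pair on average but not especially close to each keyword individually.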
