How to load a Gensim FastText model in native FastText

【Question title】: How to load Gensim FastText model in native FastText
【Posted】: 2018-10-14 03:29:37
【Question】:

I trained a FastText model in Gensim and want to use it to encode my sentences. Specifically, I want to use this feature of native FastText:

./fasttext print-word-vectors model.bin < queries.txt

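How can I save the model in Gensim so that it is in the correct binary format that native FastText understands?

I am using FastText 0.1.0 and Gensim 3.4.0 under Python 3.4.3.

Essentially, I need the inverse of load_binary_data() as given in the Gensim FastText docs.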

【Comments on the question】:

Tags: gensim fasttext

【Answer 1】:

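You probably won't find this functionality in gensim, because it would mean depending on fastText's internal structures and code, as you can see in fasttext-python (which uses pybind to call the internal fastText API directly). Such a heavy dependency on an external library is something gensim's maintainers want to avoid, which is probably why they deprecated the functionality to call the fasttext wrapper. gensim now aims to provide the fastText algorithm only through its own internal implementation. I suggest you use the Python bindings for fastText instead.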

$ git clone https://github.com/facebookresearch/fastText.git
$ cd fastText
$ pip install .

Now train a fastText model on your training set from within your Python application.

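# Note: newer fastText releases name the Python module `fasttext` (lowercase).
from fastText import train_unsupervised
# Train a skip-gram model on a plain-text corpus file.
model = train_unsupervised(input="pathtotextfile", model='skipgram')
# Save in the binary format that the fasttext command-line tool reads.
model.save_model('model.bin')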

This saves the model in the format used by the fastText command-line tool, so you should now be able to run the following command.

$ ./fasttext print-word-vectors model.bin < queries.txt
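
If you want the vectors back in Python after running that command, its output can be parsed with a few lines. This is only a sketch: it assumes each output line has the form "word v1 v2 ... vn" separated by whitespace, the helper name parse_word_vectors is made up, and the sample values are invented for illustration rather than real model output.

```python
import io


def parse_word_vectors(stream):
    """Parse text in the assumed `word v1 v2 ... vn` per-line layout.

    `stream` is any iterable of lines, e.g. an open file holding the
    redirected output of `./fasttext print-word-vectors model.bin`.
    Returns a dict mapping each word to a list of floats.
    """
    vectors = {}
    for line in stream:
        parts = line.split()
        if len(parts) < 2:
            # Skip blank lines or lines without any vector components.
            continue
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


# Invented sample text standing in for real command output.
sample = io.StringIO("king 0.1 -0.2 0.3\nqueen 0.05 0.4 -0.1\n")
vectors = parse_word_vectors(sample)
print(sorted(vectors))
```

In practice you would redirect the command's output to a file and pass the opened file object instead of the in-memory sample.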

【Comments】:

Thanks for the reply, but I was hoping not to have to retrain a model that I have already trained in gensim.
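
For anyone in the same position as the commenter: later Gensim releases (4.x, as far as I know) include a save_facebook_model() helper in gensim.models.fasttext that writes the native .bin format. It is not available in the Gensim 3.4.0 mentioned in the question. The sketch below assumes such a newer Gensim is installed and that the model saved with the older version still loads there, which is not guaranteed and has not been tested here. The function name export_to_native_fasttext and the file names are made up for illustration.

```python
from pathlib import Path


def export_to_native_fasttext(gensim_model_path, native_bin_path):
    """Re-save a Gensim FastText model in the native fastText .bin format.

    Assumes a Gensim version that provides
    gensim.models.fasttext.save_facebook_model.
    """
    src = Path(gensim_model_path)
    if not src.is_file():
        raise FileNotFoundError("No Gensim model file at %s" % src)

    # Imported here so the path check above works even without gensim installed.
    from gensim.models.fasttext import FastText, save_facebook_model

    model = FastText.load(str(src))                   # model written by model.save(...)
    save_facebook_model(model, str(native_bin_path))  # write the native binary
    return Path(native_bin_path)


if __name__ == "__main__":
    # Hypothetical file names; replace with your own paths.
    export_to_native_fasttext("gensim_fasttext.model", "model.bin")
```

If the export works, the resulting model.bin should be usable with ./fasttext print-word-vectors as in the question, without retraining.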