DL4J: How to calculate semantic similarity between two new sentences using GoogleNews vectors

【Title】: DL4J: How to calculate semantic similarity between two new sentences using GoogleNews vectors
【Posted】: 2018-01-31 19:05:40
【Question】:

I am working through the basic DL4J tutorials. I want to compare the similarity of two unseen sentences. After loading the Word2Vec model from GoogleNews-vectors-negative300.bin.gz, I ran a simple example that compares two words. I then tried to compare sentences with the same GoogleNews-vectors-negative300.bin.gz model, as shown below:

File gModel = new File("GoogleNews-vectors-negative300.bin.gz");
Word2Vec vecGoogle = WordVectorSerializer.readWord2VecModel(gModel);

ParagraphVectors vecGoogleForSentences = new ParagraphVectors.Builder()
    .useExistingWordVectors(vecGoogle)
    .build();

System.out.println(Transforms.cosineSim(vecGoogleForSentences.inferVector("I like bananas and mangoes"), vecGoogleForSentences.inferVector("I like mangoes")));

I get this error:

org.nd4j.linalg.exception.ND4JIllegalStateException: Model being passed as existing has no syn1/syn1Neg available

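Can someone explain what is going wrong here, or how I can semantically compare two unseen sentences using the GoogleNews-vectors-negative300.bin.gz vectors? What I am trying is based on the suggestion given in the DL4J demo code.

Many thanks in advance!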

【Comments】:

Fwiw, DL4J is active here: gitter.im/deeplearning4j/deeplearning4j

Tags: nlp deeplearning4j dl4j

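【Solution 1】:

A simpler approach is to compute the average vector of all the words in each sentence (the sum of the word vectors divided by n, the number of words) and then compute the cosine similarity between the two averaged vectors.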
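To make the averaging idea concrete, here is a minimal sketch in plain Java. It is not DL4J code: the class `SentenceSimilarity`, its methods, and the three-dimensional toy vectors are all hypothetical stand-ins for the 300-dimensional GoogleNews vectors. With a loaded model, the per-word lookup would presumably come from the model instead (for example the `getWordVectorMatrix` call mentioned in the comment below), and the final step could use the `Transforms.cosineSim` call already shown in the question. Skipping out-of-vocabulary words and lowercasing the input are assumptions of this sketch, not something the answer prescribes.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class SentenceSimilarity {

    // Hypothetical toy embeddings; a real setup would look words up in the Word2Vec model.
    static Map<String, double[]> toyVectors() {
        Map<String, double[]> v = new HashMap<>();
        v.put("i", new double[] {1.0, 0.0, 0.0});
        v.put("like", new double[] {0.0, 1.0, 0.0});
        v.put("and", new double[] {0.5, 0.5, 0.0});
        v.put("bananas", new double[] {0.0, 0.0, 1.0});
        v.put("mangoes", new double[] {0.0, 0.2, 0.9});
        return v;
    }

    // Average the vectors of all known words; returns null if no word is in the vocabulary.
    static double[] meanVector(String sentence, Map<String, double[]> vectors) {
        double[] sum = null;
        int n = 0;
        // Lowercasing is a simplification that matches the toy vocabulary above.
        for (String token : sentence.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            double[] w = vectors.get(token);
            if (w == null) {
                continue; // assumption: silently skip out-of-vocabulary words
            }
            if (sum == null) {
                sum = new double[w.length];
            }
            for (int i = 0; i < w.length; i++) {
                sum[i] += w[i];
            }
            n++;
        }
        if (sum == null) {
            return null;
        }
        for (int i = 0; i < sum.length; i++) {
            sum[i] /= n; // vector sum divided by the number of words found
        }
        return sum;
    }

    // Cosine similarity: dot(a, b) / (|a| * |b|).
    static double cosine(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns NaN when either sentence has no known words.
    static double similarity(String s1, String s2, Map<String, double[]> vectors) {
        double[] a = meanVector(s1, vectors);
        double[] b = meanVector(s2, vectors);
        if (a == null || b == null) {
            return Double.NaN;
        }
        return cosine(a, b);
    }

    public static void main(String[] args) {
        Map<String, double[]> v = toyVectors();
        System.out.println(similarity("I like bananas and mangoes", "I like mangoes", v));
    }
}
```

Because the average ignores word order and weights every word equally, this works best as a quick baseline for short sentences.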

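【Comments】:

Hi Adam, do you mean that .getWordVectorMatrix(string) should be called to get the vector for each word? After averaging, how do we compute the cosine similarity between the two vectors? Is there an API already available for that?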