Stanford Parser - use German model jar: answers

【Question title】: Stanford Parser - use german model jar
【Posted】: 2016-05-08 21:57:37
【Question description】:

I want to use the Stanford parser within CoreNLP. I already have this example working:

http://stanfordnlp.github.io/CoreNLP/simple.html

But I need the German models, so I downloaded "stanford-german-2016-01-19-models.jar".

How do I set up this jar file so that it is actually used? All I have found is:

LexicalizedParser lp = LexicalizedParser.loadModel("englishPCFG.ser.gz");

But what I have is a jar containing the German models, not a ...ser.gz file.

Can anyone help?

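【Comments】:

I would assume the jar contains the data and that you add the jar to your project's build path to access it, no?
You're right. Of course I have already added the German .jar file to my build path in Eclipse. But there must be some option where I point to this German file. If not, how does the program know which language it should use?
Edit: of course I can also feed in German sentences, but the resulting tags are wrong and make no sense.

Tags: java parsing stanford-nlp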

【Solution 1】:

Here is some sample code for parsing a German sentence:

import edu.stanford.nlp.io.IOUtils;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.simple.*;
import edu.stanford.nlp.trees.*;
import edu.stanford.nlp.util.CoreMap;
import edu.stanford.nlp.util.PropertiesUtils;
import edu.stanford.nlp.util.StringUtils;

import java.util.*;

public class SimpleGermanExample {

    public static void main(String[] args) {
        String sampleGermanText = "...";
        Annotation germanAnnotation = new Annotation(sampleGermanText);
        Properties germanProperties = StringUtils.argsToProperties(
                new String[]{"-props", "StanfordCoreNLP-german.properties"});
        StanfordCoreNLP pipeline = new StanfordCoreNLP(germanProperties);
        pipeline.annotate(germanAnnotation);
        for (CoreMap sentence : germanAnnotation.get(CoreAnnotations.SentencesAnnotation.class)) {
            Tree sentenceTree = sentence.get(TreeCoreAnnotations.TreeAnnotation.class);
            System.out.println(sentenceTree);
        }
    }
}

Make sure you download the full toolkit to use this sample code.

http://stanfordnlp.github.io/CoreNLP/

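Also make sure the German models jar is on your CLASSPATH. The code above searches all jars on the CLASSPATH and will find the StanfordCoreNLP-german.properties file inside the German models jar.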
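If the pipeline cannot find the properties file, a quick check of the classpath can narrow the problem down. The following is a minimal sketch using only the Java standard library; the class name `ClasspathPropsCheck` and the helper `loadFromClasspath` are made up for this illustration and are not part of CoreNLP.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Optional;
import java.util.Properties;

public class ClasspathPropsCheck {

    /**
     * Tries to load a properties file from the classpath.
     * Returns an empty Optional when no jar or directory on the classpath contains it.
     * (Hypothetical helper for illustration; not a CoreNLP API.)
     */
    static Optional<Properties> loadFromClasspath(String resourceName) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        // getResourceAsStream returns null when the resource is missing;
        // try-with-resources skips close() for a null resource.
        try (InputStream in = loader.getResourceAsStream(resourceName)) {
            if (in == null) {
                return Optional.empty();
            }
            Properties props = new Properties();
            props.load(in);
            return Optional.of(props);
        } catch (IOException e) {
            throw new UncheckedIOException("Could not read " + resourceName, e);
        }
    }

    public static void main(String[] args) {
        String name = "StanfordCoreNLP-german.properties";
        Optional<Properties> props = loadFromClasspath(name);
        if (props.isPresent()) {
            // Print the property keys so you can see what the file configures.
            System.out.println("Found " + name + " with keys: " + props.get().stringPropertyNames());
        } else {
            System.out.println(name + " is not on the classpath; add the German models jar.");
        }
    }
}
```

Run it with the same classpath as your project. If it reports that the file is missing, the German models jar is probably not on the build path after all.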

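【Discussion】:

Thanks a lot, I'll try it later. Just so I understand: is {"-props", "StanfordCoreNLP-german.properties"} the part that tells CoreNLP to use the German models?
Yes. The German models jar contains a file named StanfordCoreNLP-german.properties.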

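【Solution 2】:

First of all: this works, thank you! But I don't need such an elaborate approach with all of these annotators. That is why I wanted to start with the Simple CoreNLP API. Here is my code: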

import edu.stanford.nlp.simple.*;
import java.util.*;

public class Main {

    public static void main(String[] args) {

        Sentence sent = new Sentence("Lucy is in the sky with diamonds.");
        List<String> posTags = sent.posTags();
        List<String> words = sent.words();
        for (int i = 0; i < posTags.size(); i++) {
            System.out.println(words.get(i) + " " + posTags.get(i));
        }
    }
}

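How can I get the German properties file to work with this example?

Or the other way around: in your example, how do I get just the words with their POS tags?

【Discussion】: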

【Solution 3】:

The German counterpart of the English example is:

LexicalizedParser lp = LexicalizedParser.loadModel("germanPCFG.ser.gz");

Extract the latest stanford-german-corenlp-2018-10-05-models.jar file and you will find the model in the folder: stanford-german-corenlp-2018-10-05-models\edu\stanford\nlp\models\lexparser
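As an alternative to extracting the archive by hand, the parser models inside the jar can be listed directly. This is a rough sketch using only `java.util.jar` from the standard library; the class name `ListParserModels`, the helper `entriesEndingWith`, and the default jar location are assumptions made for this illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.jar.JarFile;
import java.util.stream.Collectors;

public class ListParserModels {

    /**
     * Returns the sorted names of all entries in the jar that end with the given suffix.
     * (Hypothetical helper for illustration; not a CoreNLP API.)
     */
    static List<String> entriesEndingWith(String jarPath, String suffix) {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.stream()
                    .map(entry -> entry.getName())
                    .filter(name -> name.endsWith(suffix))
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException("Could not read " + jarPath, e);
        }
    }

    public static void main(String[] args) {
        // Assumed default location; pass the real path to the jar as the first argument.
        String jarPath = args.length > 0
                ? args[0]
                : "stanford-german-corenlp-2018-10-05-models.jar";
        // Print every serialized model (*.ser.gz) contained in the jar.
        for (String name : entriesEndingWith(jarPath, ".ser.gz")) {
            System.out.println(name);
        }
    }
}
```

Entry names inside a jar always use forward slashes, so the lexparser folder mentioned above appears as edu/stanford/nlp/models/lexparser/ in the output.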

【Discussion】: