Extract noun phrases using Stanford NLP

【Question title】: Extract noun phrase using Stanford NLP
【Posted】: 2015-04-13 21:48:13
【Question】:

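I am trying to find the theme/noun phrase of a sentence using Stanford NLP.

For example, for the sentence "the white tiger" I would like to get

the theme/noun phrase: white tiger.

For this I used the POS tagger.

The result I get is "tiger", which is not correct. The sample code I ran is: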

public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,parse");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        Annotation annotation = new Annotation("the white tiger");
        pipeline.annotate(annotation);
        List<CoreMap> sentences = annotation
                .get(CoreAnnotations.SentencesAnnotation.class);
        System.out.println("the number of sentences is......"
                + sentences.size());
        for (CoreMap sentence : sentences) {
            System.out.println("the sentence is..." + sentence.toString());
            Tree tree = sentence.get(TreeAnnotation.class);
            PrintWriter out = new PrintWriter(System.out);
            out.println("The first sentence parsed is:");
            tree.pennPrint(out);
            System.out.println("does it comes here.....1111");
            TregexPattern pattern = TregexPattern.compile("@NP");
            TregexMatcher matcher = pattern.matcher(tree);
            while (matcher.find()) {
                Tree match = matcher.getMatch();
                List<Tree> leaves1 = match.getChildrenAsList();
                StringBuilder stringbuilder = new StringBuilder();
                for (Tree tree1 : leaves1) {
                    String val = tree1.label().value();
                    if (val.equals("NN") || val.equals("NNS")
                            || val.equals("NNP") || val.equals("NNPS")) {
                        Tree nn[] = tree1.children();
                        String ss = Sentence.listToString(nn[0].yield());
                        stringbuilder.append(ss).append(" ");

                    }
                }
                System.out.println("the final stringbuilder is ...."
                        + stringbuilder);
            }

        }

    }

Any help is much appreciated, as are any other ideas for achieving this.

【Comments】:

Tags: nlp stanford-nlp sentiment-analysis pos-tagger

【Solution 1】:

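It looks like you are descending the parse tree and keeping only NN.* nodes. "white" is a JJ (an adjective), so it is not included when you search for NN.*.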
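To make the difference concrete, here is a minimal, library-free sketch. It does not call CoreNLP: `NpYieldSketch`, its `Node` class, and the hand-written bracketed parse are all assumptions made for illustration. It reads a Penn-style bracketing and, for every NP node, joins all the words under that node instead of keeping only the NN.* children. With CoreNLP itself, the corresponding step would presumably be to take the yield of the whole matched NP subtree, much as the question's code already calls `yield()` on a child.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a tiny reader for Penn-style bracketed parses.
// None of these names come from CoreNLP.
class NpYieldSketch {
    /** A tree node: a label plus children; a leaf (a word) has no children. */
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();
        Node(String label) { this.label = label; }
        boolean isLeaf() { return children.isEmpty(); }
    }

    private final String text;
    private int pos = 0;
    private NpYieldSketch(String text) { this.text = text; }

    /** Parses a well-formed bracketing such as "(NP (DT the) (NN tiger))". */
    static Node parse(String penn) { return new NpYieldSketch(penn).readNode(); }

    private Node readNode() {
        skipSpaces();
        if (text.charAt(pos) == '(') {
            pos++;                                   // consume '('
            Node node = new Node(readToken());       // constituent or POS label
            skipSpaces();
            while (text.charAt(pos) != ')') {
                node.children.add(readNode());
                skipSpaces();
            }
            pos++;                                   // consume ')'
            return node;
        }
        return new Node(readToken());                // bare token: a word
    }

    private String readToken() {
        skipSpaces();
        int start = pos;
        while (pos < text.length() && !Character.isWhitespace(text.charAt(pos))
                && text.charAt(pos) != '(' && text.charAt(pos) != ')') {
            pos++;
        }
        return text.substring(start, pos);
    }

    private void skipSpaces() {
        while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) pos++;
    }

    /** Appends the words under node, left to right, optionally skipping DT subtrees. */
    static void words(Node node, boolean dropDeterminers, List<String> out) {
        if (node.isLeaf()) { out.add(node.label); return; }
        if (dropDeterminers && node.label.equals("DT")) return;
        for (Node child : node.children) words(child, dropDeterminers, out);
    }

    /** Returns one string per NP node, built from everything under that NP. */
    static List<String> npPhrases(Node root, boolean dropDeterminers) {
        List<String> phrases = new ArrayList<>();
        collectNps(root, dropDeterminers, phrases);
        return phrases;
    }

    private static void collectNps(Node node, boolean dropDeterminers, List<String> phrases) {
        if (!node.isLeaf() && node.label.equals("NP")) {
            List<String> w = new ArrayList<>();
            words(node, dropDeterminers, w);
            phrases.add(String.join(" ", w));
        }
        for (Node child : node.children) collectNps(child, dropDeterminers, phrases);
    }

    public static void main(String[] args) {
        // Hand-written bracketing, assumed for illustration (not parser output).
        Node root = parse("(ROOT (NP (DT the) (JJ white) (NN tiger)))");
        System.out.println(npPhrases(root, false)); // [the white tiger]
        System.out.println(npPhrases(root, true));  // [white tiger]
    }
}
```

Filtering the NP's children down to NN.* tags is what reduces the phrase to "tiger"; keeping every word under the NP, minus the determiner if desired, yields "white tiger".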

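You should take a close look at the Stanford Dependencies Manual and decide which part-of-speech tags cover what you are looking for. You should also look at real linguistic data to work out what actually matters for the task you are trying to complete. What about: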

the tiger [with the black one] [who was white]

In this case, simply traversing the tree would give you tiger black white. Exclude PPs? Then you lose a lot of useful information:

the tiger [with white fur]
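The trade-off can be sketched on this second example. The code below is illustrative only: `PpFilterSketch`, its `Node` class, and the hand-built parse of "the tiger with white fur" are assumptions, not CoreNLP output or API. It collects adjectives and nouns (JJ and NN.*) under the top NP, once including prepositional phrases and once skipping every PP subtree.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: none of these names come from CoreNLP.
class PpFilterSketch {
    static final class Node {
        final String label;
        final List<Node> children;
        Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }
        boolean isLeaf() { return children.isEmpty(); }
        /** A POS-tag node sitting directly above a single word. */
        boolean isPreterminal() { return children.size() == 1 && children.get(0).isLeaf(); }
    }

    /** Builds a preterminal such as (NN tiger). */
    static Node tag(String pos, String word) { return new Node(pos, new Node(word)); }

    /** Joins the JJ and NN.* words under node, optionally ignoring PP subtrees. */
    static String contentWords(Node node, boolean skipPps) {
        List<String> words = new ArrayList<>();
        collect(node, skipPps, words);
        return String.join(" ", words);
    }

    private static void collect(Node node, boolean skipPps, List<String> out) {
        if (skipPps && node.label.equals("PP")) return;   // drop the whole PP
        if (node.isPreterminal()) {
            if (node.label.equals("JJ") || node.label.startsWith("NN")) {
                out.add(node.children.get(0).label);
            }
            return;
        }
        for (Node child : node.children) collect(child, skipPps, out);
    }

    /** Hand-built parse, assumed for illustration: the tiger [with white fur]. */
    static Node tigerWithWhiteFur() {
        return new Node("NP",
                new Node("NP", tag("DT", "the"), tag("NN", "tiger")),
                new Node("PP", tag("IN", "with"),
                        new Node("NP", tag("JJ", "white"), tag("NN", "fur"))));
    }

    public static void main(String[] args) {
        Node np = tigerWithWhiteFur();
        System.out.println(contentWords(np, false)); // tiger white fur
        System.out.println(contentWords(np, true));  // tiger
    }
}
```

Neither setting is right in general: including the PP mixes the head noun with its modifiers, while skipping it discards "white fur" entirely, so the filter has to follow from what the task actually needs.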

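I am not sure what you are trying to accomplish, but make sure the task is constrained in the right way.

You should also brush up on basic syntax. "The white tiger" is what linguists call a noun phrase, or NP. You would be hard pressed to find a linguist who calls an NP a sentence. A sentence also often contains many NPs, and sometimes they are even embedded inside one another. The Stanford Dependencies Manual is a good start. As the name suggests, Stanford Dependencies are based on the idea of dependency grammar, though there are other approaches that bring different insights.

Learning what linguists know about sentence structure can significantly help you get at what you are trying to extract or, as often happens, realize that what you are trying to extract is too difficult and that you need a new route to a solution.

【Comments】: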