Ignore words for lemmatizer: answers

【Question title】: Ignore words for lemmatizer
【Posted】: 2015-01-15 06:25:58
【Question】:

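I would like to use Stanford CoreNLP for lemmatization, but I have some words that must not be lemmatized. Is there a way to give the tool this ignore list? I am following this code, and once the program calls this.pipeline.annotate(document); the lemmas have already been produced, so it is hard to replace individual occurrences afterwards. One solution is to build a mapping list in which each word to be ignored is paired with lemmatize(word), i.e. d = {(w1, lemmatize(w1)), (w2, lemmatize(w2)), ...}, and to use this mapping list for post-processing. But I suspect there should be an easier way.

Thanks for your help.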

【Question comments】:

Tags: stanford-nlp

【Solution 1】:

I think I found a solution with the help of a friend.

    // Iterate over all sentences in the annotated document
    for (CoreMap sentence : sentences) {
        // Iterate over all tokens in a sentence
        for (CoreLabel token : sentence.get(TokensAnnotation.class)) {
            // Print the original word and its lemma, separated by a tab
            System.out.print(token.get(OriginalTextAnnotation.class) + "\t");
            System.out.println(token.get(LemmaAnnotation.class));
        }
    }

You can call token.get(OriginalTextAnnotation.class) to get the original form of the word.
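Since both the original text and the lemma are available for every token, the ignore list from the question can be applied as a simple per-token choice. The following is only a sketch: `LemmaIgnoreList` and `applyIgnoreList` are hypothetical names, not part of Stanford CoreNLP. The two input lists stand in for the values read from `OriginalTextAnnotation` and `LemmaAnnotation` in the loop above, and the sample lemmas are hard-coded for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Stanford CoreNLP.
final class LemmaIgnoreList {

    // For each token, keeps the original word if it is on the ignore list
    // and uses the lemma otherwise. Both lists must have the same length.
    static List<String> applyIgnoreList(List<String> words,
                                        List<String> lemmas,
                                        Set<String> ignore) {
        if (words.size() != lemmas.size()) {
            throw new IllegalArgumentException("words and lemmas must have the same length");
        }
        List<String> result = new ArrayList<>(words.size());
        for (int i = 0; i < words.size(); i++) {
            String word = words.get(i);
            result.add(ignore.contains(word) ? word : lemmas.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        // Illustrative stand-ins for the per-token original text and lemma.
        List<String> words = Arrays.asList("The", "glasses", "were", "broken");
        List<String> lemmas = Arrays.asList("the", "glass", "be", "break");
        Set<String> ignore = new HashSet<>(Arrays.asList("glasses"));

        // prints [the, glasses, be, break]
        System.out.println(applyIgnoreList(words, lemmas, ignore));
    }
}
```

The lookup is case-sensitive as written. If the ignore list should match regardless of case, one option is to store lower-cased entries and compare against `word.toLowerCase()`.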

【Comments】: