Stanford CoreNLP sentiment without splitting sentences: answers

【Question title】: Stanford coreNLP sentiment without splitting sentences
【Posted】: 2016-09-17 00:02:11
【Question】:

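I have some files that I want to feed to coreNLP's sentiment tagger. I have already broken the files up into individual sentences, so I want a single label back for each file. How do I get the java command to return just one label?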

The command is java -cp "*" -mx5g edu.stanford.nlp.sentiment.SentimentPipeline -stdin, and the output looks like this:

Annotation pipeline timing information:
TokenizerAnnotator: 0.0 sec.
WordsToSentencesAnnotator: 0.0 sec.
TOTAL: 0.0 sec. for 8 tokens at 296.3 tokens/sec.
Pipeline setup: 0.0 sec.
Total time for StanfordCoreNLP pipeline: 8.7 sec.

C:\stanford-corenlp-full-2015-04-20>java -cp "*" -mx5g edu.stanford.nlp.sentiment.SentimentPipeline -stdin
Adding annotator tokenize
TokenizerAnnotator: No tokenizer type provided. Defaulting to PTBTokenizer.
Adding annotator ssplit
Adding annotator parse
Loading parser from serialized file edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz ... done [0.4 sec].
Adding annotator sentiment
Reading in text from stdin.
Please enter one sentence per line.
Processing will end when EOF is reached.

Computer is fun. Not too fun.
  Positive
  Neutral

How can I get a single label as output, like the one below, which I produced by removing the punctuation:

Computer is fun Not too fun.
  Positive

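It seems like this should be easy, since there is -ssplit.isOneSentence and, as I understand it, the sentiment tagger uses ssplit. But I can't figure out how to change my command to include it (I have read @987654321@).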

【Question discussion】:

Tags: java stanford-nlp

【Solution 1】:

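There seems to be a bug in SentimentPipeline: when you use the -stdin option, it should not split a single line into several sentences. I have fixed this now, but unless you compile your own version, the fix won't help you until we release the next version of CoreNLP.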

There is, however, an alternative (and probably better) way to get sentiment labels for sentences: use the CoreNLP pipeline.

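The following command runs the same code as yours, but it also lets you pass additional options to the individual annotators (including the -ssplit.eolonly option, which makes only line breaks end a sentence).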

java -cp "*" -mx5g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators "tokenize,ssplit,parse,sentiment" -ssplit.eolonly
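To make the difference between the splitting modes concrete, here is a toy Java sketch of what they do with the input from the question. It is NOT CoreNLP's real sentence splitter, and the class and method names are hypothetical. Roughly speaking, default splitting breaks after sentence-final punctuation, so one line can become two sentences, each with its own sentiment label; with -ssplit.eolonly only line breaks end a sentence, so each line gets one label; -ssplit.isOneSentence treats the whole input as a single sentence.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy illustration only (hypothetical names, not CoreNLP's splitter) of how
 * the ssplit options change what counts as a "sentence".
 */
class SplitModes {

    /** Rough stand-in for default splitting: break after '.', '!' or '?'. */
    static List<String> splitOnPunctuation(String text) {
        List<String> sentences = new ArrayList<>();
        for (String s : text.split("(?<=[.!?])\\s+")) {
            if (!s.isBlank()) sentences.add(s.trim());
        }
        return sentences;
    }

    /** Stand-in for -ssplit.eolonly: every non-empty line is one sentence. */
    static List<String> splitOnNewlines(String text) {
        List<String> sentences = new ArrayList<>();
        for (String line : text.split("\\R")) {
            if (!line.isBlank()) sentences.add(line.trim());
        }
        return sentences;
    }

    /** Stand-in for -ssplit.isOneSentence: the whole input is one sentence. */
    static List<String> wholeInput(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? List.of() : List.of(trimmed);
    }

    public static void main(String[] args) {
        String line = "Computer is fun. Not too fun.";
        System.out.println(splitOnPunctuation(line)); // [Computer is fun., Not too fun.]
        System.out.println(splitOnNewlines(line));    // [Computer is fun. Not too fun.]
        System.out.println(wholeInput(line));         // [Computer is fun. Not too fun.]
    }
}
```

With default splitting the sentiment annotator sees two sentences (hence "Positive" and "Neutral" in the question); with end-of-line splitting it sees one.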

【Discussion】:

Thanks a lot.
Can I use -ssplit.eolonly together with -file?
Not if you run edu.stanford.nlp.sentiment.SentimentPipeline. But you can run the StanfordCoreNLP pipeline with the -file argument (see the command above).
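If you need to run that command for many files from Java, a minimal sketch is to launch it with ProcessBuilder, once per input file. The class name and the two-argument usage (CoreNLP directory, input file) are hypothetical; the argument list is just the command from the answer with -file appended. It assumes the CoreNLP jars sit in the directory you pass as the first argument, because the "*" classpath wildcard is expanded by the java launcher relative to the working directory.

```java
import java.io.File;
import java.io.IOException;
import java.util.List;

/** Hypothetical helper: runs the StanfordCoreNLP command from the answer on one file. */
class RunCoreNlpOnFile {

    /** Builds the command from the answer, with -file <inputFile> appended. */
    static List<String> buildCommand(String inputFile) {
        return List.of(
                "java", "-cp", "*", "-mx5g",
                "edu.stanford.nlp.pipeline.StanfordCoreNLP",
                "-annotators", "tokenize,ssplit,parse,sentiment",
                "-ssplit.eolonly",   // only line breaks end a sentence
                "-file", inputFile);
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length != 2) {
            System.err.println("usage: RunCoreNlpOnFile <corenlp-dir> <input-file>");
            return;
        }
        // Resolve the input against our own working directory before switching to the CoreNLP dir.
        String input = new File(args[1]).getAbsolutePath();
        ProcessBuilder pb = new ProcessBuilder(buildCommand(input));
        pb.directory(new File(args[0])); // "*" on the classpath is resolved relative to this directory
        pb.inheritIO();                  // show CoreNLP's log output in this console
        int exit = pb.start().waitFor();
        System.out.println("CoreNLP exited with code " + exit);
    }
}
```

No shell is involved, so "*" does not need the quoting it needs on the command line. If each file should get exactly one label even when it spans several lines, swapping -ssplit.eolonly for -ssplit.isOneSentence is the option the question mentions.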