Stanford CoreNLP OpenIE Annotator: Answers

【Question title】: Stanford CoreNLP OpenIE annotator
【Posted】: 2016-05-22 13:43:55
【Question】:

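I have a question about the Stanford CoreNLP OpenIE annotator.

I am using Stanford CoreNLP version stanford-corenlp-full-2015-12-09 to extract relations with OpenIE. I don't know Java well, which is why I am using the pycorenlp wrapper in Python 3.4.

I want to extract the relations between all the words in a sentence; the code I am using is below. I am also interested in displaying the confidence of each triple: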

from pycorenlp import StanfordCoreNLP

# Assumes a CoreNLP server is already running on localhost:9000
nlp = StanfordCoreNLP("http://localhost:9000/")
s = "Twenty percent electric motors are pulled from an assembly line"
output = nlp.annotate(s, properties={"annotators": "tokenize,ssplit,pos,depparse,natlog,openie",
                                     "outputFormat": "json",
                                     "triple.strict": "true"})
# One list of OpenIE triples per sentence in the response
result = [sentence["openie"] for sentence in output["sentences"]]
print(result)
for i in result:
    for rel in i:
        relationSent = rel['relation'], rel['subject'], rel['object']
        print(relationSent)

This is the result I get:

[[{'relationSpan': [4, 6], 'subject': 'Twenty percent electric motors', 'objectSpan': [8, 10], 'relation': 'are pulled from', 'object': 'assembly line', 'subjectSpan': [0, 4]}, {'relationSpan': [4, 6], 'subject': 'percent electric motors', 'objectSpan': [8, 10], 'relation': 'are pulled from', 'object': 'assembly line', 'subjectSpan': [1, 4]}, {'relationSpan': [4, 5], 'subject': 'Twenty percent electric motors', 'objectSpan': [5, 6], 'relation': 'are', 'object': 'pulled', 'subjectSpan': [0, 4]}, {'relationSpan': [4, 5], 'subject': 'percent electric motors', 'objectSpan': [5, 6], 'relation': 'are', 'object': 'pulled', 'subjectSpan': [1, 4]}]]

The triples are:

('are pulled from', 'Twenty percent electric motors', 'assembly line')
('are pulled from', 'percent electric motors', 'assembly line')
('are', 'Twenty percent electric motors', 'pulled')
('are', 'percent electric motors', 'pulled')

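The first problem is that the confidence is not shown in the result. The second problem is that I only want to retrieve the triple that contains all the words of the sentence, i.e. this one: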

('are pulled from', 'Twenty percent electric motors', 'assembly line')

Instead I get more than one triple. I tried the option "triple.strict":"true", since it extracts "triples only if they consume the entire fragment", but it doesn't work.
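If the server-side options don't narrow the output enough, one possible client-side workaround is to drop every triple whose tokens are a strict subset of another triple's tokens. This is a heuristic, not a CoreNLP option; the sketch assumes the span format shown in the output above (`subjectSpan`, `relationSpan`, `objectSpan` as end-exclusive token offsets), and the helper names `token_coverage` and `maximal_triples` are hypothetical. Requiring every token to be covered would not work here: judging by the spans above, `from` (token 6) and `an` (token 7) fall outside all spans, even in the desired triple.

```python
def token_coverage(triple):
    """Token indices covered by a triple's subject, relation and object spans.

    Spans are [start, end) token offsets, so [0, 4] covers tokens 0..3.
    """
    covered = set()
    for key in ("subjectSpan", "relationSpan", "objectSpan"):
        start, end = triple[key]
        covered.update(range(start, end))
    return covered


def maximal_triples(triples):
    """Keep triples whose token coverage is not a strict subset of another
    triple's coverage (client-side heuristic, not a CoreNLP setting)."""
    coverages = [token_coverage(t) for t in triples]
    return [
        t
        for t, cov in zip(triples, coverages)
        # `cov < other` is a strict-subset test on Python sets
        if not any(cov < other for other in coverages)
    ]
```

Applied to `output["sentences"][0]["openie"]` from the question, this should keep only ('are pulled from', 'Twenty percent electric motors', 'assembly line'), because the other three triples cover strict subsets of its tokens. Triples with identical coverage are all kept.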

Can anyone give me some advice?

【Question comments】:

Tags: python stanford-nlp

【Answer 1】:

You should try this setting:

"openie.triple.strict":"true"

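Looking at the code as it currently stands, the confidence is not stored in the returned JSON, so you can't get it from the CoreNLP server.

Since you asked, I will push a change that adds it to the output JSON and let you know when it is live on GitHub.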

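【Comments】:

openie.triple.strict = true ensures that the segmenter accounts for all the components of the fragment it segments. I suspect you'd have more luck setting max_entailments_per_clause = 1 and splitter.disable = true.
@StanfordNLPHelp Can you tell me whether returning the confidence from the server in the JSON has been fixed? Thanks
I tried this in 2019 with CoreNLP 3.9.2, but I don't see confidence scores. @StanfordNLPHelp Was this ever implemented?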
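A minimal sketch collecting the OpenIE settings suggested in this thread into one properties dict. It assumes the `openie.`-prefixed names `openie.max_entailments_per_clause` and `openie.splitter.disable` (check them against the OpenIE annotator documentation for your CoreNLP version); `OPENIE_PROPERTIES` and `annotate_openie` are hypothetical names. The function accepts any client with an `annotate(text, properties=...)` method, such as pycorenlp's `StanfordCoreNLP`.

```python
# Settings discussed in this thread, all with the "openie." prefix
# (option names assumed; verify them for your CoreNLP version).
OPENIE_PROPERTIES = {
    "annotators": "tokenize,ssplit,pos,depparse,natlog,openie",
    "outputFormat": "json",
    "openie.triple.strict": "true",            # only triples that consume the whole fragment
    "openie.max_entailments_per_clause": "1",  # at most one entailment per clause
    "openie.splitter.disable": "true",         # skip the clause splitter entirely
}


def annotate_openie(client, text):
    """Run OpenIE through a client exposing annotate(text, properties=...),
    e.g. pycorenlp's StanfordCoreNLP, and return its response."""
    # Pass a copy so the shared defaults cannot be mutated by the caller.
    return client.annotate(text, properties=dict(OPENIE_PROPERTIES))
```

With a running server this would be called as `annotate_openie(StanfordCoreNLP("http://localhost:9000/"), s)`. Whether disabling the clause splitter helps depends on your sentences, so it may be worth trying the options one at a time.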

【Answer 2】:

Thanks a lot, it works now. I added both "openie.triple.strict": "true" and "openie.max_entailments_per_clause": "1". The code is now:

# chunkz holds the text to annotate; nlp is the StanfordCoreNLP client
output = nlp.annotate(chunkz, properties={"annotators": "tokenize,ssplit,pos,depparse,natlog,openie",
                                          "outputFormat": "json",
                                          "openie.triple.strict": "true",
                                          "openie.max_entailments_per_clause": "1"})

【Comments】:

Hey, I think you should also add splitter.disable = true, as Gabor Angeli suggested. Also, how are you getting the confidence scores now?