Utility to generate a performance report of an NLP-based text annotator: answers

【Question title】: Utility to generate a performance report of an NLP-based text annotator
【Posted】: 2013-10-25 19:55:02
【Question description】:

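I am trying to build a quality-testing framework for my text annotator. I wrote the annotator using GATE.

I have gold standard (human-annotated) data for every input document.

Here is the list of GATE quality assurance resources: GATE Embedded API for the measures

So far I have been able to get a performance matrix containing FP, TP, FN, precision, recall and F-scores using the methods in AnnotationDiffer.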
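For readers who want to see how those figures relate, here is a minimal sketch of the usual strict precision, recall and F1 formulas computed from raw counts. It does not use GATE at all: the class name `PrfSketch` and the sample counts are made up for illustration, and the values GATE reports (which also distinguish strict and lenient matching) may be computed differently.

```java
// Illustrative only: PrfSketch and the counts below are hypothetical,
// not part of GATE. Plain ints stand in for the TP/FP/FN counts.
class PrfSketch {

    // Precision = TP / (TP + FP); defined as 0 when nothing was predicted.
    static double precision(int tp, int fp) {
        return (tp + fp) == 0 ? 0.0 : (double) tp / (tp + fp);
    }

    // Recall = TP / (TP + FN); defined as 0 when the gold set is empty.
    static double recall(int tp, int fn) {
        return (tp + fn) == 0 ? 0.0 : (double) tp / (tp + fn);
    }

    // F1 = harmonic mean of precision and recall.
    static double f1(double p, double r) {
        return (p + r) == 0.0 ? 0.0 : 2.0 * p * r / (p + r);
    }

    public static void main(String[] args) {
        int tp = 8, fp = 2, fn = 4; // made-up counts for one document
        double p = precision(tp, fp);
        double r = recall(tp, fn);
        System.out.println("P=" + p + " R=" + r + " F1=" + f1(p, r));
    }
}
```

The zero-denominator guards are a design choice so that an empty document does not produce NaN in a per-document report.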

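Now I want to dig deeper. I want to look at the individual FPs and FNs for each document, i.e. analyze every FP and FN so that I can fix my annotator accordingly.

I do not see any method in any GATE class, such as AnnotationDiffer, that returns a List<Annotation> of the FPs or FNs. They only return the counts of FP and FN: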

int fp = annotationDiffer.getFalsePositivesStrict();
int fn = annotationDiffer.getMissing();

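Before I go ahead and write my own utility that collects a List<Annotation> of the FPs and FNs together with a few surrounding sentences, and creates an HTML report for each input document for analysis, I wanted to check whether something like this already exists.

【Question discussion】:

Tags: nlp stanford-nlp gate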

【Solution 1】:

I figured out how to get the FP and FN annotations:

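    // Needs: gate.Annotation, gate.util.AnnotationDiffer, java.util.List
    // Run the diff first; the missing/spurious sets are read after this call.
    List<AnnotationDiffer.Pairing> differ = annotationDiffer.calculateDiff(goldAnnotSet, systemAnnotSet);

    // False negatives: gold annotations with no matching system annotation.
    for (Annotation fnAnnotation : annotationDiffer.missingAnnotations) {
        System.out.println("FN=>" + fnAnnotation);
    }

    // False positives: system annotations with no matching gold annotation.
    for (Annotation fpAnnotation : annotationDiffer.spuriousAnnotations) {
        System.out.println("FP=>" + fpAnnotation);
    }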

Based on the offsets of each fnAnnotation or fpAnnotation, I can easily get the surrounding sentences and create a nice HTML report.
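A rough sketch of that last step, assuming the document text is available as a plain String and each error has already been reduced to a label plus start and end character offsets (in GATE these would presumably come from the annotation's start and end nodes). Everything here is hypothetical: `ErrorReportSketch`, `ErrorSpan` and the fixed character window are illustrative stand-ins, and the window is a simplification of "surrounding sentences", which would need sentence boundaries.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: none of these names come from GATE.
class ErrorReportSketch {

    // One FP or FN, reduced to a label and character offsets into the text.
    static final class ErrorSpan {
        final String label;
        final int start;
        final int end;

        ErrorSpan(String label, int start, int end) {
            this.label = label;
            this.start = start;
            this.end = end;
        }
    }

    // Minimal HTML escaping so document text cannot break the markup.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // The span plus up to `window` characters on each side, span highlighted.
    static String snippet(String text, int start, int end, int window) {
        int from = Math.max(0, start - window);
        int to = Math.min(text.length(), end + window);
        return escape(text.substring(from, start))
                + "<mark>" + escape(text.substring(start, end)) + "</mark>"
                + escape(text.substring(end, to));
    }

    // One table row per error: type, offsets, highlighted context.
    static String report(String docName, String text, List<ErrorSpan> errors, int window) {
        StringBuilder html = new StringBuilder();
        html.append("<html><body><h1>").append(escape(docName)).append("</h1>\n");
        html.append("<table border=\"1\">\n");
        html.append("<tr><th>Type</th><th>Offsets</th><th>Context</th></tr>\n");
        for (ErrorSpan e : errors) {
            html.append("<tr><td>").append(escape(e.label)).append("</td><td>")
                .append(e.start).append("-").append(e.end).append("</td><td>")
                .append(snippet(text, e.start, e.end, window))
                .append("</td></tr>\n");
        }
        html.append("</table></body></html>\n");
        return html.toString();
    }

    public static void main(String[] args) {
        String text = "The cat sat on the mat.";
        List<ErrorSpan> errors = Arrays.asList(
                new ErrorSpan("FP", 4, 7),    // "cat"
                new ErrorSpan("FN", 19, 22)); // "mat"
        System.out.print(report("doc1.txt", text, errors, 10));
    }
}
```

Writing the returned string to one file per input document gives the per-document report described in the question; swapping the character window for real sentence boundaries only changes how `from` and `to` are chosen in `snippet`.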

【Discussion】: