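【Question title】: String IN, String OUT?
【Posted】: 2015-07-13 17:28:23
【Question description】:

I am new to ClearTK and UIMA. So far I have not found any example of how to create a pipeline that does not involve files.

I am trying to use ClearTK and UIMA to process a short text stored in a Java String variable and to get back an XML string (the result of the ClearTK TimeML annotators).

I am able to supply a String as input (see the code excerpt), but the code is far from elegant (I have to set an empty URI on the CAS). In addition, the output is saved to a file, whereas I want to get a String back (saving the output to a file and then reading the file back into memory makes no sense).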

import static org.apache.uima.fit.factory.JCasFactory.createJCas;

import java.net.URI;

import org.apache.uima.analysis_component.JCasAnnotator_ImplBase;
import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
import org.apache.uima.fit.factory.AnalysisEngineFactory;
import org.apache.uima.fit.pipeline.SimplePipeline;
import org.apache.uima.jcas.JCas;
import org.cleartk.corpus.timeml.TempEval2007Writer;
import org.cleartk.opennlp.tools.PosTaggerAnnotator;
import org.cleartk.snowball.DefaultSnowballStemmer;
import org.cleartk.timeml.event.*;
import org.cleartk.timeml.time.TimeTypeAnnotator;
import org.cleartk.timeml.tlink.TemporalLinkEventToDocumentCreationTimeAnnotator;
import org.cleartk.timeml.tlink.TemporalLinkEventToSameSentenceTimeAnnotator;
import org.cleartk.timeml.tlink.TemporalLinkEventToSubordinatedEventAnnotator;
import org.cleartk.timeml.type.DocumentCreationTime;
import org.cleartk.token.tokenizer.TokenAnnotator;
import org.cleartk.util.ViewUriUtil;
import org.cleartk.util.cr.FilesCollectionReader;

...

String documentText = "First make sure that you are using eggs that are several days old...";
JCas sourceCas = createJCas();

sourceCas.setDocumentText(documentText);
ViewUriUtil.setURI(sourceCas, new URI(""));

SimplePipeline.runPipeline(
        sourceCas,
        org.cleartk.opennlp.tools.SentenceAnnotator.getDescription(),
        TokenAnnotator.getDescription(),
        PosTaggerAnnotator.getDescription(),
        DefaultSnowballStemmer.getDescription("English"),
        org.cleartk.opennlp.tools.ParserAnnotator.getDescription(),
        org.cleartk.timeml.time.TimeAnnotator.FACTORY.getAnnotatorDescription(),
        TimeTypeAnnotator.FACTORY.getAnnotatorDescription(),
        EventAnnotator.FACTORY.getAnnotatorDescription(),
        EventTenseAnnotator.FACTORY.getAnnotatorDescription(),
        EventAspectAnnotator.FACTORY.getAnnotatorDescription(),
        EventClassAnnotator.FACTORY.getAnnotatorDescription(),
        EventPolarityAnnotator.FACTORY.getAnnotatorDescription(),
        EventModalityAnnotator.FACTORY.getAnnotatorDescription(),
        AnalysisEngineFactory.createEngineDescription(AddEmptyDCT.class),
        TemporalLinkEventToDocumentCreationTimeAnnotator.FACTORY.getAnnotatorDescription(),
        TemporalLinkEventToSameSentenceTimeAnnotator.FACTORY.getAnnotatorDescription(),
        TemporalLinkEventToSubordinatedEventAnnotator.FACTORY.getAnnotatorDescription(),
        TempEval2007Writer.getDescription("file:///tmp/out.tml"));

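What is the recommended way to have a pipeline take a String as input and produce another String as the result of its execution?

【Question discussion】:

    Tags: java uima cleartk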


    【Solution 1】:

    Run the engines with SimplePipeline as you already do, then retrieve the annotations you are interested in from your sourceCas, like this:

    Collection<MyAnnotation> myAnnotations = JCasUtil.select(sourceCas, MyAnnotation.class);
    for (MyAnnotation annotation : myAnnotations) {
        String myProperty = annotation.getMyproperty(); // read the feature from each annotation
    }
    

    【Discussion】:

      【Solution 2】:

      My preferred approach is not to use a pipeline at all, but to create the analysis engine manually with org.apache.uima.fit.factory.AggregateBuilder, as Lee Becker recommended in a post.

          AggregateBuilder builder = new AggregateBuilder();
          builder.add(org.cleartk.opennlp.tools.SentenceAnnotator.getDescription());
          builder.add(TokenAnnotator.getDescription());
          builder.add(DefaultSnowballStemmer.getDescription("English"));
          builder.add(org.cleartk.opennlp.tools.ParserAnnotator.getDescription());
          builder.add(org.cleartk.timeml.time.TimeAnnotator.FACTORY.getAnnotatorDescription());
          builder.add(TimeTypeAnnotator.FACTORY.getAnnotatorDescription());
          builder.add(EventAnnotator.FACTORY.getAnnotatorDescription());
          builder.add(EventTenseAnnotator.FACTORY.getAnnotatorDescription());
          builder.add(EventAspectAnnotator.FACTORY.getAnnotatorDescription());
          builder.add(EventClassAnnotator.FACTORY.getAnnotatorDescription());
          builder.add(EventPolarityAnnotator.FACTORY.getAnnotatorDescription());
          builder.add(EventModalityAnnotator.FACTORY.getAnnotatorDescription());
          builder.add(AnalysisEngineFactory.createEngineDescription(AddEmptyDCT.class));
          builder.add(TemporalLinkEventToDocumentCreationTimeAnnotator.FACTORY.getAnnotatorDescription());
          builder.add(TemporalLinkEventToSameSentenceTimeAnnotator.FACTORY.getAnnotatorDescription());
          builder.add(TemporalLinkEventToSubordinatedEventAnnotator.FACTORY.getAnnotatorDescription());
      
          AnalysisEngine aggregateEngine = builder.createAggregate();
          JCas sourceCas = createJCas();
          sourceCas.setDocumentText(documentText);
          ViewUriUtil.setURI(sourceCas, new URI(""));
      
          aggregateEngine.process(sourceCas);
      
          String timeMlXml = TempEval2007Writer.toTimeML(sourceCas);
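
When a component can only write to a stream or writer and does not return a String, the file round trip the question wants to avoid can usually be replaced by an in-memory target. The sketch below is not ClearTK or UIMA API. It uses only the JDK's javax.xml.transform classes to show the pattern, and the class name InMemoryXml, the helper toXmlString, and the result element are hypothetical names chosen for illustration.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class InMemoryXml {

    // Hypothetical helper: wraps the text in a placeholder <result> element
    // and serializes the XML into a String without touching the file system.
    static String toXmlString(String documentText) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("result");
        root.setTextContent(documentText);
        doc.appendChild(root);

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        // StringWriter is the in-memory stand-in for the output file.
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toXmlString("Boil the eggs for ten minutes."));
    }
}
```

The same idea carries over to any serializer that accepts a java.io.Writer or java.io.OutputStream: pass a StringWriter or ByteArrayOutputStream in place of a file-backed stream and read the String back from it.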
      

      【Discussion】:
