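【Question title】: Add custom jape file in GATE source code
【Posted】: 2013-02-23 14:53:09
【Question】:

Can anyone guide me on how to create a custom JAPE file and configure it using the GATE source code? I tried the following code and got exceptions such as "Error while parsing the grammar:" and "Neither grammarURL or binaryGrammarURL parameters are set!"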

    try {
        Document doc = new DocumentImpl();
        String str = "This is test.";
        DocumentContentImpl impl = new DocumentContentImpl(str);
        doc.setContent(impl);
        System.setProperty("gate.home", "C:\\Program Files\\GATE_Developer_7.1");
        Gate.init();
        gate.Corpus corpus = (Corpus) Factory
            .createResource("gate.corpora.CorpusImpl");
        File gateHome = Gate.getGateHome();
        File pluginsHome = new File(gateHome, "plugins");
        Gate.getCreoleRegister().registerDirectories(new File(pluginsHome, "ANNIE").toURI().toURL());

        Transducer transducer = new Transducer();
        transducer.setDocument(doc);
        transducer.setGrammarURL(new URL("file:///D:/misc_workspace/gate-7.1-build4485-SRC/plugins/ANNIE/resources/NE/SportsCategory.jape"));
        transducer.setBinaryGrammarURL(new URL("file:///D:/misc_workspace/gate-7.1-build4485-SRC/plugins/ANNIE/resources/NE/SportsCategory.jape"));

        LanguageAnalyser jape = (LanguageAnalyser) Factory.createResource(
            "gate.creole.Transducer", gate.Utils.featureMap(
                "grammarURL", "D:/misc_workspace/gate-7.1-build4485-SRC/plugins/ANNIE/resources/NE/SportsCategory.jape",
                "encoding", "UTF-8"));

【Question comments】:

    Tags: bigdata gate named-entity-extraction


    【Answer 1】:

    You need to load the ANNIE plugin:

    Gate.getCreoleRegister().registerDirectories(
      new File(Gate.getPluginsHome(), "ANNIE").toURI().toURL());
    

    Then create an instance of gate.creole.Transducer with the right parameters:

    LanguageAnalyser jape = (LanguageAnalyser)Factory.createResource(
      "gate.creole.Transducer", gate.Utils.featureMap(
          "grammarURL", new URL("file:///D:/path/to/my-grammar.jape"),
          "encoding", "UTF-8")); // ensure this matches the file
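
The grammarURL value above is a java.net.URL. As a minimal sketch using only the JDK (no GATE classes), the helper below shows one way to build such a URL from a local file so that spaces in the path, as in "Program Files", are percent-encoded. The class name GrammarUrlSketch, its method names, and the temporary "gate grammars" directory are all hypothetical and exist only for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of GATE: builds file: URLs for grammar files.
final class GrammarUrlSketch {

    /** Converts a local file to a file: URL; going through toURI() percent-encodes spaces. */
    static URL toGrammarUrl(File grammarFile) {
        try {
            return grammarFile.toURI().toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Cannot convert to URL: " + grammarFile, e);
        }
    }

    /** Creates an empty placeholder file inside a temp directory whose name contains a space. */
    static File createPlaceholderGrammar() {
        try {
            Path dir = Files.createTempDirectory("gate grammars");
            Path grammar = Files.createFile(dir.resolve("my-grammar.jape"));
            return grammar.toFile();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        File grammar = createPlaceholderGrammar();
        // The printed URL contains %20 where the directory name has a space.
        System.out.println(toGrammarUrl(grammar));
    }
}
```

The resulting URL could then be passed as the "grammarURL" entry of the feature map, in place of a hand-written file:/// string.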
    

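    However, the approach we usually recommend is to assemble and configure the whole pipeline the way you want it in GATE Developer, using whatever standard components you need along with your own grammars, and then save the application state to a file. You can then reload the whole application from your code with a single line: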

    CorpusController app = (CorpusController) PersistenceManager.loadObjectFromFile(savedAppFile);
    

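    Edit: the code you added to the question has a few basic problems. First, you must call Gate.init() before you do anything else with GATE; in particular, it must come before you create the Document. Second, you must never call the constructor of a Resource class directly; always use Factory. Similarly, you never need to call init() directly, since that is done for you as part of Factory.createResource. For example: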

    // initialise GATE
    Gate.setGateHome(new File("C:\\Program Files\\GATE_Developer_7.1"));
    Gate.init();
    
    // load ANNIE plugin - you must do this before you can create tokeniser
    // or JAPE transducer resources.
    Gate.getCreoleRegister().registerDirectories(
       new File(Gate.getPluginsHome(), "ANNIE").toURI().toURL());
    
    // Build the pipeline
    SerialAnalyserController pipeline =
      (SerialAnalyserController)Factory.createResource(
         "gate.creole.SerialAnalyserController");
    LanguageAnalyser tokeniser = (LanguageAnalyser)Factory.createResource(
         "gate.creole.tokeniser.DefaultTokeniser");
    LanguageAnalyser jape = (LanguageAnalyser)Factory.createResource(
      "gate.creole.Transducer", gate.Utils.featureMap(
          "grammarURL", new File("D:\\path\\to\\my-grammar.jape").toURI().toURL(),
          "encoding", "UTF-8")); // ensure this matches the file
    pipeline.add(tokeniser);
    pipeline.add(jape);
    
    // create document and corpus
    Corpus corpus = Factory.newCorpus(null);
    Document doc = Factory.newDocument("This is test.");
    corpus.add(doc);
    pipeline.setCorpus(corpus);
    
    // run it
    pipeline.execute();
    
    // extract results
    System.out.println("Found annotations of the following types: " +
      doc.getAnnotations().getAllTypes());
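
The pipeline above expects a grammar file to exist at the configured path. As a rough sketch using only the JDK, the class below writes a small JAPE grammar to disk as UTF-8, matching the "encoding" parameter. The grammar itself is an assumption made for illustration, not the poster's SportsCategory.jape: the phase name, the rule name SportKeyword, and the Sport annotation type are hypothetical. It relies on Token annotations with a string feature, which the tokeniser in the pipeline produces.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of GATE: writes an illustrative JAPE grammar to a file.
final class WriteGrammarSketch {

    // Illustrative grammar: every Token whose string is exactly "sport"
    // is annotated as a (hypothetical) "Sport" annotation.
    static final String GRAMMAR = String.join("\n",
        "Phase: SportsCategory",
        "Input: Token",
        "Options: control = appelt",
        "",
        "Rule: SportKeyword",
        "(",
        "  {Token.string == \"sport\"}",
        "):match",
        "-->",
        ":match.Sport = {rule = \"SportKeyword\"}",
        "");

    /** Writes the grammar to the given path as UTF-8 and returns that path. */
    static Path writeGrammar(Path target) {
        try {
            return Files.write(target, GRAMMAR.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes the grammar to a fresh temporary .jape file. */
    static Path writeTempGrammar() {
        try {
            return writeGrammar(Files.createTempFile("SportsCategory", ".jape"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path grammar = writeTempGrammar();
        // This URI is what would be turned into the grammarURL parameter.
        System.out.println(grammar.toUri());
    }
}
```

With a grammar like this loaded, the set of annotation types printed by the pipeline would be expected to include the new type whenever the input text contains a matching token.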
    

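    If you haven't already, I strongly recommend working through at least module 5 of the training course materials, which shows the correct way to load documents and run processing resources over them.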

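    【Comments】:

    • Thanks a lot, Ian. Could you share the code for creating the Transducer instance? I'm getting an exception like "Error while parsing the grammar:" with a nested cause like "Neither grammarURL or binaryGrammarURL parameters are set!"
    • @abhijitnag That's the (LanguageAnalyser)Factory.createResource bit. If you've already tried something that doesn't work, you should edit your question to include the code that doesn't work so that we can suggest fixes.
    • Ian, I have updated the question with the code I tried. Could you advise?
    【Answer 2】:

    Thanks, Ian. The training course materials are helpful. My problem was different, though, and I have solved it. The following code snippet shows how to use a custom JAPE file in GATE. My custom JAPE file can now generate new annotations.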

    // Initialise GATE before using any other GATE class
    System.setProperty("gate.home", "C:\\Program Files\\GATE_Developer_7.1");
    Gate.init();

    // Register the ANNIE plugin before creating the tokeniser or the transducer
    File gateHome = Gate.getGateHome();
    File pluginsHome = new File(gateHome, "plugins");
    Gate.getCreoleRegister().registerDirectories(new File(pluginsHome, "ANNIE").toURI().toURL());

    ProcessingResource token = (ProcessingResource) Factory.createResource(
        "gate.creole.tokeniser.DefaultTokeniser", Factory.newFeatureMap());

    String str = "This is a test. Myself Abhijit Nag sport";
    Document doc = Factory.newDocument(str);

    Corpus corpus = (Corpus) Factory.createResource("gate.corpora.CorpusImpl");
    corpus.add(doc);

    LanguageAnalyser jape = (LanguageAnalyser) Factory.createResource(
        "gate.creole.Transducer", gate.Utils.featureMap(
            "grammarURL", "file:///D:/misc_workspace/gate-7.1-build4485-SRC/plugins/ANNIE/resources/NE/SportsCategory.jape",
            "encoding", "UTF-8"));

    // Factory.createResource has already initialised each resource, so the
    // pipeline only has to run them in order: tokeniser first, then the grammar.
    // (The poster's own initAnnie() helper, which is not shown, was also called here.)
    SerialAnalyserController pipeline = (SerialAnalyserController) Factory.createResource(
        "gate.creole.SerialAnalyserController",
        Factory.newFeatureMap(), Factory.newFeatureMap(), "ANNIE");
    pipeline.setCorpus(corpus);
    pipeline.add(token);
    pipeline.add(jape);
    pipeline.execute();

    AnnotationSet ann = doc.getAnnotations();
    System.out.println(" ...Total annotation " + ann.getAllTypes());
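
Both this snippet and the first answer pass "encoding", "UTF-8" when creating the transducer, and the first answer notes that the value must match the file. As a minimal sketch using only the JDK, the check below reports whether a file's bytes decode as strict UTF-8. The class and method names are hypothetical. Passing the check does not prove the file was authored as UTF-8 (plain ASCII also passes), but a failure is a strong hint that the grammar was saved in some other encoding.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of GATE: strict UTF-8 validation of grammar files.
final class EncodingCheckSketch {

    /** Returns true if the bytes form well-formed UTF-8, false otherwise. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)      // fail instead of substituting
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Reads the whole file and checks that its content is well-formed UTF-8. */
    static boolean isValidUtf8(Path file) {
        try {
            return isValidUtf8(Files.readAllBytes(file));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: EncodingCheckSketch <grammar.jape>");
            return;
        }
        Path grammar = Paths.get(args[0]);
        System.out.println(grammar + " is valid UTF-8: " + isValidUtf8(grammar));
    }
}
```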
    

    【Comments】:

    • You're still using new DocumentImpl(), doc.setContent and so on, all of which should be replaced with Document doc = Factory.newDocument("This is a test ....");. DocumentImpl is a type of Resource, so the "don't use new, use Factory" rule applies.
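    【Answer 3】:

    If you want to update the ANNIE pipeline, here is another option.

    1. First, get the list of default/existing processing resources in the pipeline.
    2. Create an instance of your JAPE rule.
    3. Iterate over the list of existing processing resources, adding each one to a new collection. Add your own custom JAPE rule to this collection.
    4. When you execute the ANNIE pipeline, your JAPE rule is picked up automatically, so there is no need to specify a document path or execute it separately.

    Sample code: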

    // Load the saved default ANNIE application shipped with the ANNIE plugin
    File pluginsHome = Gate.getPluginsHome();
    File anniePlugin = new File(pluginsHome, "ANNIE");
    File annieGapp = new File(anniePlugin, "ANNIE_with_defaults.gapp");
    CorpusController annieController =
        (CorpusController) PersistenceManager.loadObjectFromFile(annieGapp);

    // Create the custom JAPE transducer; Factory.createResource initialises it,
    // so there is no need to call init() on it afterwards
    LanguageAnalyser jape = (LanguageAnalyser) Factory.createResource(
        "gate.creole.Transducer", gate.Utils.featureMap(
            "grammarURL", new File("C:\\Program Files\\gate-7.1\\plugins\\ANNIE\\resources\\NE\\opensource.jape").toURI().toURL(),
            "encoding", "UTF-8"));

    // Copy the existing processing resources and append the custom grammar last
    Collection<ProcessingResource> newPRS =
        new ArrayList<ProcessingResource>(annieController.getPRs());
    newPRS.add(jape);
    annieController.setPRs(newPRS);
    
