【Question title】: Training Named Entity in OpenNLP
【Posted】: 2015-10-16 15:10:22
【Question】:

I want to train a name finder model on a corpus of Indian names:

class NameTraining
{
    public static void TrainNames() throws IOException 
    {
        Charset charset = Charset.forName("UTF-8");         
        FileReader fileReader = new FileReader("train.txt");
        ObjectStream fileStream = new PlainTextByLineStream(fileReader);
        ObjectStream sampleStream = new NameSampleDataStream(fileStream);
        TokenNameFinderModel model = NameFinderME.train("pt-br", "train", sampleStream, Collections.<String, Object>emptyMap());
        NameFinderME nfm = new NameFinderME(model); 
    }

    public static void main(String args[]) throws IOException
    {
        NameTraining det = new NameTraining();
        det.TrainNames();
    }
}

I compile it with the following command:

javac -cp $(echo lib/*.jar | tr ' ' ':') NameTraining.java -Xlint:unchecked

But I get these messages (warnings, not errors):

NameTraining.java:35: warning: [unchecked] unchecked conversion
found   : opennlp.tools.util.ObjectStream
required: opennlp.tools.util.ObjectStream<java.lang.String>
        ObjectStream sampleStream = new NameSampleDataStream(fileStream);
                                                             ^
NameTraining.java:36: warning: [unchecked] unchecked conversion
found   : opennlp.tools.util.ObjectStream
required: opennlp.tools.util.ObjectStream<opennlp.tools.namefind.NameSample>
        TokenNameFinderModel model = NameFinderME.train("pt-br", "train", sampleStream, Collections.<String, Object>emptyMap());
                                                                          ^
2 warnings

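I would like to know two things:

  1. Does the code above train the model correctly, and if so, how can I see the results after training?
  2. What do the warnings mean?

【Comments】:

    Tags: java bash opennlp named-entity-recognition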


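    【Solution 1】:

    Hi, I got training to work with a small training data set. The method below trains the model, then tokenizes train.txt and prints every person name the model finds in it: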

    public static void TrainNames() throws IOException
        {
            Charset charset = Charset.forName("UTF-8");
            ObjectStream<String> lineStream =new PlainTextByLineStream(new FileInputStream("/home/yogi.singh/dev/java/nlp/data/en-ner-person.train"), charset);
            ObjectStream<NameSample> sampleStream = new NameSampleDataStream(lineStream);       
            //FileReader fileReader = new FileReader("train.txt");
            //ObjectStream fileStream = new PlainTextByLineStream(fileReader);
            //ObjectStream sampleStream = new NameSampleDataStream(fileStream);
            TokenNameFinderModel model = NameFinderME.train("en", "person", sampleStream, Collections.<String, Object>emptyMap());
            NameFinderME nfm = new NameFinderME(model);
            String sentence = "";
    
    
            BufferedReader br = new BufferedReader(new FileReader("/home/yogi.singh/dev/java/nlp/train.txt"));
            try
             {
                StringBuilder sb = new StringBuilder();
                String line = br.readLine();
    
                while (line != null)
                {
                    sb.append(line);
                    sb.append('\n');
                    line = br.readLine();
                }
                sentence = sb.toString();
             } 
            finally
            {
                br.close();
            }
    
            InputStream is1 = new FileInputStream("/home/yogi.singh/dev/java/nlp/data/en-token.bin");
            TokenizerModel model1 = new TokenizerModel(is1);
    
            Tokenizer tokenizer = new TokenizerME(model1);
    
            String tokens[] = tokenizer.tokenize(sentence);
    
            for (String a : tokens)
                System.out.println(a);
    
            Span nameSpans[] = nfm.find(tokens);
            for(Span s: nameSpans)
            {
                System.out.print(s.toString());
                System.out.print(" ");
                for(int index = s.getStart();index < s.getEnd();index++)
                {
                    System.out.print(tokens[index] + " ");
                }
                System.out.println(" ");
            }
        }
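
The printing loop at the end of the method relies on the fact that a `Span` returned by `nfm.find(tokens)` holds token indices, with `getStart()` inclusive and `getEnd()` exclusive. A minimal stdlib-only sketch of that indexing, with no OpenNLP involved (the class name, the token array and the indices 1 and 3 are made up for illustration):

```java
import java.util.Arrays;

// Hypothetical demo class, not part of OpenNLP.
class SpanIndexDemo {

    /** Joins tokens[start..end) with single spaces; end is exclusive. */
    static String cover(String[] tokens, int start, int end) {
        return String.join(" ", Arrays.copyOfRange(tokens, start, end));
    }

    public static void main(String[] args) {
        // Made-up tokenizer output for one sentence.
        String[] tokens = {"Yesterday", "Rahul", "Sharma", "flew", "to", "Delhi", "."};
        // A name span covering tokens 1 and 2 would be reported as start=1, end=3.
        System.out.println(cover(tokens, 1, 3));
    }
}
```

So a span that starts at 1 and ends at 3 covers exactly two tokens, which is why the loop in the answer uses `index < s.getEnd()` rather than `<=`.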
    

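    【Comments】:

    • One question: what kind of file is en-ner-person.train? Did you copy it from en-ner-person.bin? Does train.txt contain the new words to train on? This is my first time, thanks.
    • How can I retrain the existing en-ner-person.bin with new data?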
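
On the file-format question: a name finder training file such as en-ner-person.train is plain text, not a copy of the binary .bin model. OpenNLP expects one whitespace-tokenized sentence per line, with each name wrapped in `<START:person> ... <END>` markers. The sketch below is only a stdlib illustration of that markup; `TrainingLineDemo` and the sample sentence are hypothetical, and it assumes every marker carries a type (it does not handle a bare `<START>`).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of OpenNLP: it only illustrates the
// annotation format that NameSampleDataStream reads from each line.
class TrainingLineDemo {

    /** Returns the annotated names in one training line as "type: tokens" strings. */
    static List<String> markedNames(String line) {
        List<String> names = new ArrayList<>();
        String type = null;                       // non-null while inside <START:...> ... <END>
        StringBuilder current = new StringBuilder();
        for (String token : line.trim().split("\\s+")) {
            if (token.startsWith("<START:") && token.endsWith(">")) {
                type = token.substring("<START:".length(), token.length() - 1);
                current.setLength(0);
            } else if (token.equals("<END>") && type != null) {
                names.add(type + ": " + current.toString().trim());
                type = null;
            } else if (type != null) {
                current.append(token).append(' ');
            }
            // Tokens outside a marker pair are ordinary context words and are skipped here.
        }
        return names;
    }

    public static void main(String[] args) {
        // Made-up sentence; note that punctuation is a separate, space-delimited token.
        String line = "<START:person> Rahul Sharma <END> met <START:person> Anita <END> in Delhi .";
        System.out.println(markedNames(line));
    }
}
```

A file for Indian names would therefore consist of many such lines, each a full sentence with the names marked up in context, rather than a bare list of names.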
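    【Solution 2】:

    These warnings come from Java generics, not from OpenNLP: the streams are declared as raw `ObjectStream` types, so the compiler cannot check their element types.

    Try this: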

    ObjectStream<String> fileStream = new PlainTextByLineStream(fileReader);
    ObjectStream<NameSample> sampleStream = new NameSampleDataStream(fileStream);
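
To see the same kind of warning without OpenNLP, here is a minimal stdlib-only sketch. `UncheckedDemo` and its methods are hypothetical; `List` stands in for `ObjectStream`, and `totalLength` stands in for a constructor that expects a parameterized type.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: List plays the role of ObjectStream here.
class UncheckedDemo {

    // Expects a parameterized type, much like NameSampleDataStream
    // expects an ObjectStream<String>.
    static int totalLength(List<String> lines) {
        int total = 0;
        for (String s : lines) {
            total += s.length();
        }
        return total;
    }

    // Remove this annotation and compile with -Xlint:unchecked to see the warnings.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static int rawCall() {
        List raw = new ArrayList();      // raw type: the compiler does not know the element type
        raw.add("Rahul");
        return totalLength(raw);         // unchecked conversion from List to List<String>
    }

    static int typedCall() {
        List<String> typed = new ArrayList<>();   // parameterized type: nothing to warn about
        typed.add("Rahul");
        return totalLength(typed);
    }

    public static void main(String[] args) {
        System.out.println(rawCall() + " " + typedCall());
    }
}
```

Both calls behave the same at run time. The warning only says that the compiler could not verify the element type, which is why declaring the variables as `ObjectStream<String>` and `ObjectStream<NameSample>` makes it go away.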
    

    【Comments】:
