【Question title】: Mallet NaiveBayes Classifier in Java null pointer
【Posted】: 2019-07-30 16:14:16
【Question】:

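I am trying to instantiate a naive Bayes classifier to classify blocks of text into predefined categories. The example below just tries this with male/female. I have tried both loading the data from a file (CSV loader) and creating the instances directly, as below. The problem is that the trainer.train() method throws a NullPointerException, apparently because the targetDictionary is null. The data dictionary is populated. How do I force the targetDictionary to be populated on the instances?

My actual goal is to classify paper abstracts from a database as "science, politics, law, health, etc." A Bayes classifier seems like the right choice.

I have iterated over the loaded InstanceList and it appears to be populated correctly: the dataDictionary is populated, but the targetDictionary is null.

Using Mallet 2.0.8 on Windows.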

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.regex.Pattern;

import cc.mallet.classify.ClassifierTrainer;
import cc.mallet.classify.NaiveBayesTrainer;
import cc.mallet.pipe.CharSequence2TokenSequence;
import cc.mallet.pipe.CharSequenceLowercase;
import cc.mallet.pipe.FeatureSequence2FeatureVector;
import cc.mallet.pipe.Pipe;
import cc.mallet.pipe.SerialPipes;
import cc.mallet.pipe.TokenSequence2FeatureSequence;
import cc.mallet.pipe.TokenSequenceRemoveStopwords;
import cc.mallet.types.Instance;
import cc.mallet.types.InstanceList;

public class TestMallet {

    public TestMallet() throws IOException {

        // Pipes that turn the raw text (the instance data) into a feature vector
        ArrayList<Pipe> pipelist = new ArrayList<Pipe>();
        pipelist.add(new CharSequenceLowercase());
        pipelist.add(new CharSequence2TokenSequence(Pattern.compile("\\p{L}[\\p{L}\\p{P}]+\\p{L}")));
        pipelist.add(new TokenSequenceRemoveStopwords(new File("c:\\test\\config\\stopwords_en.txt"), "UTF-8", false, false, false));
        pipelist.add(new TokenSequence2FeatureSequence());
        pipelist.add(new FeatureSequence2FeatureVector()); // Added, but it doesn't make any difference

        InstanceList instances = new InstanceList(new SerialPipes(pipelist));

        // Instance arguments: data, target, name, source
        Instance instance0 = new Instance("Hello World I am here and i am male my name is roger",   "Male",   "roger", "test");
        Instance instance1 = new Instance("Hello World I am here and i am male my name is phil",    "Male",   "phil",  "test");
        Instance instance2 = new Instance("Hello World I am here and i am male my name is joe",     "Male",   "joe",   "test");
        Instance instance3 = new Instance("Hello World I am here and i am female my name is vira",  "Female", "vira",  "test");
        Instance instance4 = new Instance("Hello World I am here and i am female my name is josie", "Female", "josie", "test");

        instances.addThruPipe(instance0);
        instances.addThruPipe(instance1);
        instances.addThruPipe(instance2);
        instances.addThruPipe(instance3);
        instances.addThruPipe(instance4);

        // Use the instance list to train
        ClassifierTrainer trainer = new NaiveBayesTrainer();
        trainer.train(instances);
        // NullPointerException here (while debugging, it looks like the target dictionary is null)
    }
}

I expected the trainer to process the instances correctly.

【Question comments】:

    Tags: mallet


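    【Solution 1】:

    A classifier learns to predict an output from input features. For both the inputs and the outputs, we usually need to convert strings into a numeric representation. You are telling Mallet how to do this conversion for the input features, but not for the output labels.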
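To make the "strings to numbers" point concrete, here is a minimal standalone sketch of what a label dictionary does. It is only an illustration of the idea, not Mallet's actual implementation; the class name `LabelIndexSketch` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of a label dictionary; not Mallet code. */
public class LabelIndexSketch {
    private final Map<String, Integer> indexOf = new HashMap<>();
    private final List<String> labels = new ArrayList<>();

    /** Returns the index of a label, assigning the next free index the first time it is seen. */
    public int lookupIndex(String label) {
        Integer existing = indexOf.get(label);
        if (existing != null) {
            return existing;
        }
        int index = labels.size();
        indexOf.put(label, index);
        labels.add(label);
        return index;
    }

    /** Maps an index back to its label string. */
    public String lookupLabel(int index) {
        return labels.get(index);
    }

    /** Number of distinct labels seen so far. */
    public int size() {
        return labels.size();
    }

    public static void main(String[] args) {
        LabelIndexSketch targets = new LabelIndexSketch();
        // Raw string targets, as in the question's instances
        for (String raw : new String[] {"Male", "Male", "Female"}) {
            System.out.println(raw + " -> " + targets.lookupIndex(raw));
        }
    }
}
```

A trainer needs this kind of mapping for the targets as well as for the features. If nothing in the pipeline builds it, the target dictionary stays null.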

    Adding a Target2Label() pipe should do it; see the Csv2Vectors class for an example.

    【Comments】:
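A minimal sketch of that fix, assuming Mallet 2.0.8 is on the classpath. It reuses the question's pipeline with a Target2Label pipe added. The stopword pipe is left out only so the sketch does not depend on a local file, and the class name `TargetLabelSketch` and helper `buildInstances` are hypothetical.

```java
import cc.mallet.classify.Classifier;
import cc.mallet.classify.NaiveBayesTrainer;
import cc.mallet.pipe.CharSequence2TokenSequence;
import cc.mallet.pipe.CharSequenceLowercase;
import cc.mallet.pipe.FeatureSequence2FeatureVector;
import cc.mallet.pipe.Pipe;
import cc.mallet.pipe.SerialPipes;
import cc.mallet.pipe.Target2Label;
import cc.mallet.pipe.TokenSequence2FeatureSequence;
import cc.mallet.types.Instance;
import cc.mallet.types.InstanceList;

import java.util.ArrayList;
import java.util.regex.Pattern;

public class TargetLabelSketch {

    /** Builds the question's toy data set with a pipeline that also converts the targets. */
    static InstanceList buildInstances() {
        ArrayList<Pipe> pipes = new ArrayList<Pipe>();
        // Converts each string target ("Male", "Female") into a Label,
        // which populates the pipe's target alphabet
        pipes.add(new Target2Label());
        // The remaining pipes convert the data field, as in the question
        pipes.add(new CharSequenceLowercase());
        pipes.add(new CharSequence2TokenSequence(Pattern.compile("\\p{L}[\\p{L}\\p{P}]+\\p{L}")));
        pipes.add(new TokenSequence2FeatureSequence());
        pipes.add(new FeatureSequence2FeatureVector());

        InstanceList instances = new InstanceList(new SerialPipes(pipes));
        // Instance arguments: data, target, name, source
        instances.addThruPipe(new Instance("Hello World I am here and i am male my name is roger", "Male", "roger", "test"));
        instances.addThruPipe(new Instance("Hello World I am here and i am male my name is phil", "Male", "phil", "test"));
        instances.addThruPipe(new Instance("Hello World I am here and i am female my name is vira", "Female", "vira", "test"));
        instances.addThruPipe(new Instance("Hello World I am here and i am female my name is josie", "Female", "josie", "test"));
        return instances;
    }

    public static void main(String[] args) {
        InstanceList instances = buildInstances();
        // With Target2Label in the pipeline, the target alphabet is no longer null
        System.out.println("Target labels: " + instances.getTargetAlphabet().size());

        Classifier classifier = new NaiveBayesTrainer().train(instances);
        System.out.println("Trained on " + classifier.getLabelAlphabet().size() + " labels");
    }
}
```

Target2Label acts only on the target field, so in this sketch it is simply placed first; the other pipes still act on the data field.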
