【Posted】: 2019-07-30 16:14:16
【Problem description】:
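I am trying to instantiate a naive Bayes classifier to classify blocks of text into predefined categories. The example below only tries to do this for male/female. I have tried both loading the data from a file (CSVloader) and creating the instances by hand, as shown below. The problem is that the trainer.train() method throws a NullPointerException. It appears to be because the targetDictionary is null. The data dictionary is populated. How can I force the targetDictionary to be populated on the instances?
My actual goal is to classify paper abstracts from a database into "Science, Politics, Law, Health, etc." A Bayes classifier looks like the right choice for that.
I have iterated over the loaded instanceList and it appears to be populated correctly, and the dataDictionary is populated, but the TargetDictionary is null.
Using Mallet 2.0.8 on Windows.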
import cc.mallet.classify.ClassifierTrainer;
import cc.mallet.classify.NaiveBayesTrainer;
import cc.mallet.pipe.*;
import cc.mallet.types.Instance;
import cc.mallet.types.InstanceList;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.regex.Pattern;

public TestMallet() throws IOException {
    // Build the pipe sequence: lowercase, tokenize, drop stop words, then convert to features
    ArrayList<Pipe> pipelist = new ArrayList<Pipe>();
    pipelist.add(new CharSequenceLowercase());
    pipelist.add(new CharSequence2TokenSequence(Pattern.compile("\\p{L}[\\p{L}\\p{P}]+\\p{L}")));
    pipelist.add(new TokenSequenceRemoveStopwords(new File("c:\\test\\config\\stopwords_en.txt"), "UTF-8", false, false, false));
    pipelist.add(new TokenSequence2FeatureSequence());
    pipelist.add(new FeatureSequence2FeatureVector()); // Added, but it doesn't make any difference

    InstanceList instances = new InstanceList(new SerialPipes(pipelist));

    // Instance arguments: data, target, name, source
    Instance instance0 = new Instance("Hello World I am here and i am male my name is roger", "Male", "roger", "test");
    Instance instance1 = new Instance("Hello World I am here and i am male my name is phil", "Male", "phil", "test");
    Instance instance2 = new Instance("Hello World I am here and i am male my name is joe", "Male", "joe", "test");
    Instance instance3 = new Instance("Hello World I am here and i am female my name is vira", "Female", "vira", "test");
    Instance instance4 = new Instance("Hello World I am here and i am female my name is josie", "Female", "josie", "test");

    instances.addThruPipe(instance0);
    instances.addThruPipe(instance1);
    instances.addThruPipe(instance2);
    instances.addThruPipe(instance3);
    instances.addThruPipe(instance4);

    // Using the instance list to train
    // --------------------------------
    ClassifierTrainer trainer = new NaiveBayesTrainer();
    trainer.train(instances);
    // NullPointerException here (when debugging, it looks like TargetDictionary is null)
}
I expected the trainer to train without errors.
【Discussion】:
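One possible cause, judging only from the symptom described: the pipe list has no step that turns the target strings ("Male", "Female") into labels, so no target alphabet is ever created. The sketch below adds Mallet's `Target2Label` pipe at the front of the list. It assumes Mallet 2.0.8 is on the classpath; the class name `TargetLabelSketch` and the method `buildInstances` are made up for illustration, and the stop-word pipe is left out so the sketch does not depend on a local file. This has not been verified against the asker's exact setup.

```java
import cc.mallet.classify.Classifier;
import cc.mallet.classify.NaiveBayesTrainer;
import cc.mallet.pipe.CharSequence2TokenSequence;
import cc.mallet.pipe.CharSequenceLowercase;
import cc.mallet.pipe.FeatureSequence2FeatureVector;
import cc.mallet.pipe.Pipe;
import cc.mallet.pipe.SerialPipes;
import cc.mallet.pipe.Target2Label;
import cc.mallet.pipe.TokenSequence2FeatureSequence;
import cc.mallet.types.Instance;
import cc.mallet.types.InstanceList;

import java.util.ArrayList;
import java.util.regex.Pattern;

// Hypothetical class name, used only for this illustration.
public class TargetLabelSketch {

    // Builds the five example instances through a pipe list that includes Target2Label.
    static InstanceList buildInstances() {
        ArrayList<Pipe> pipes = new ArrayList<Pipe>();
        // Assumed fix: convert the String target into a Label, which creates the target alphabet.
        pipes.add(new Target2Label());
        pipes.add(new CharSequenceLowercase());
        pipes.add(new CharSequence2TokenSequence(Pattern.compile("\\p{L}[\\p{L}\\p{P}]+\\p{L}")));
        // The stop-word pipe from the question is omitted here to avoid the file dependency.
        pipes.add(new TokenSequence2FeatureSequence());
        pipes.add(new FeatureSequence2FeatureVector());

        InstanceList instances = new InstanceList(new SerialPipes(pipes));
        String prefix = "Hello World I am here and i am ";
        instances.addThruPipe(new Instance(prefix + "male my name is roger", "Male", "roger", "test"));
        instances.addThruPipe(new Instance(prefix + "male my name is phil", "Male", "phil", "test"));
        instances.addThruPipe(new Instance(prefix + "male my name is joe", "Male", "joe", "test"));
        instances.addThruPipe(new Instance(prefix + "female my name is vira", "Female", "vira", "test"));
        instances.addThruPipe(new Instance(prefix + "female my name is josie", "Female", "josie", "test"));
        return instances;
    }

    public static void main(String[] args) {
        InstanceList instances = buildInstances();
        // With Target2Label in place, the target alphabet should no longer be null.
        System.out.println("Target alphabet size: " + instances.getTargetAlphabet().size());

        Classifier classifier = new NaiveBayesTrainer().train(instances);
        // Classify a raw string; the classifier runs it through the same pipes.
        System.out.println(classifier.classify("hello i am female").getLabeling().getBestLabel());
    }
}
```

If this is indeed the cause, the same pipe would also be needed when loading from a file, since the target field read from each row is still a plain string until a pipe converts it.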
Tags: mallet