Programming a classifier with Weka: answers

【Question title】: Program a classifier with Weka
【Posted】: 2015-04-29 03:27:30
【Question】:

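My English is bad, but I'll try to be clear. I want to program a classifier (J48, for example) with Weka. In my case an instance consists of six numbers, all between 0 and 10 except one, which is between -10 and 0.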

Examples: 1,-3,6,3,6,7 or 1,-4,5,3,7,6 or 2,-4,5,3,8,6

In ARFF:

@ATTRIBUTE attribute1 {0,1,2,3,4,5,6,7,8,9,10}
@ATTRIBUTE attribute2 {0,-1,-2,-3,-4,-5,-6,-7,-8,-9,-10}
@ATTRIBUTE attribute3 {0,1,2,3,4,5,6,7,8,9,10}
...

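These instances (the examples) are all "good". I want to know whether I can create a classifier to which I give a new instance and which answers (as a percentage) whether that instance is good. I'm asking because I don't know how to choose the class index or the outcome variable...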

【Question discussion】:

Tags: tree classification weka

【Solution 1】:

Here is an outline of the very basics of classification with Weka.

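Training file: You need a training file. Weka accepts many different formats for training files (and test files), among them ARFF (Attribute-Relation File Format) and CSV (comma-separated values). Suppose we have a training file in ARFF format. Part of the file looks like this: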

@relation pima_diabetes
@attribute 'preg' real
@attribute 'plas' real
@attribute 'pres' real
@attribute 'skin' real
@attribute 'insu' real
@attribute 'mass' real
@attribute 'pedi' real
@attribute 'age' real
@attribute 'class' { tested_negative, tested_positive}
@data
6,148,72,35,0,33.6,0.627,50,tested_positive
1,85,66,29,0,26.6,0.351,31,tested_negative
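ARFF itself does not mark which attribute is the class (this is the "class index" the question asks about), so it has to be set in code after loading. A minimal sketch, assuming Weka 3.7 or later on the classpath; the class `LoadTrainingData` and its methods are hypothetical names, and the two rows above are inlined so the sketch runs without a file:

```java
import java.io.StringReader;

import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class LoadTrainingData {
    // The two sample rows from the training file above, inlined so no file is needed.
    static final String TRAIN_ARFF = String.join("\n",
            "@relation pima_diabetes",
            "@attribute 'preg' real",
            "@attribute 'plas' real",
            "@attribute 'pres' real",
            "@attribute 'skin' real",
            "@attribute 'insu' real",
            "@attribute 'mass' real",
            "@attribute 'pedi' real",
            "@attribute 'age' real",
            "@attribute 'class' { tested_negative, tested_positive}",
            "@data",
            "6,148,72,35,0,33.6,0.627,50,tested_positive",
            "1,85,66,29,0,26.6,0.351,31,tested_negative");

    /** Parses the inline ARFF text and marks the last attribute as the class. */
    static Instances load() throws Exception {
        Instances data = new Instances(new StringReader(TRAIN_ARFF));
        // ARFF does not say which attribute is the class; by convention it is the last one.
        data.setClassIndex(data.numAttributes() - 1);
        return data;
    }

    /** Same idea for a real file; DataSource picks the ARFF or CSV loader from the extension. */
    static Instances loadFile(String path) throws Exception {
        Instances data = DataSource.read(path);
        data.setClassIndex(data.numAttributes() - 1);
        return data;
    }

    public static void main(String[] args) throws Exception {
        Instances data = load();
        System.out.println(data.numInstances() + " training instances, class attribute: "
                + data.classAttribute().name());
    }
}
```

For a file on disk, `loadFile` relies on `ConverterUtils.DataSource.read`, which chooses the loader by file extension, so the same call works for `.arff` and `.csv`.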

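Note that to train a good learner you need a large amount of training data. Likewise, every class should be well represented in your training data, so that the classifier you build from it is able to discriminate between the classes.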
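One quick way to check how well each class is represented is to count the class values in the training set. A sketch, assuming Weka on the classpath; `ClassBalance` and its toy data set (only "good" rows, like the asker's data) are hypothetical:

```java
import java.io.StringReader;

import weka.core.AttributeStats;
import weka.core.Instances;

public class ClassBalance {
    /** Hypothetical toy data: one numeric attribute, and only one of the two classes ever used. */
    static Instances toyData() throws Exception {
        String arff = String.join("\n",
                "@relation toy",
                "@attribute x real",
                "@attribute class {good, bad}",
                "@data",
                "1,good",
                "2,good",
                "3,good");
        Instances data = new Instances(new StringReader(arff));
        data.setClassIndex(data.numAttributes() - 1);
        return data;
    }

    /** Number of training instances per class label, in the order the labels are declared. */
    static int[] classCounts(Instances data) {
        AttributeStats stats = data.attributeStats(data.classIndex());
        return stats.nominalCounts;
    }

    public static void main(String[] args) throws Exception {
        Instances data = toyData();
        int[] counts = classCounts(data);
        for (int i = 0; i < counts.length; i++) {
            String label = data.classAttribute().value(i);
            // A class with no examples cannot be learned from this data.
            String warning = counts[i] == 0 ? "  <-- no examples of this class" : "";
            System.out.println(label + ": " + counts[i] + warning);
        }
    }
}
```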

Test file: As mentioned above, the test file can also take many different forms. Say our test file is in ARFF format; part of it looks like this:

@attribute 'preg' real
@attribute 'plas' real
@attribute 'pres' real
@attribute 'skin' real
@attribute 'insu' real
@attribute 'mass' real
@attribute 'pedi' real
@attribute 'age' real
@attribute 'class' { tested_negative, tested_positive}
@data
5,116,74,0,0,25.6,0.201,30,?
3,78,50,32,88,31,0.248,26,?

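Note that the class labels of the test data are marked with "?", because they are unknown and are to be determined by the classifier you build from the training data.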
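Before classifying, it can be worth checking that the test set declares the same attributes as the training set and that its class values really are missing. A sketch under the same assumptions (Weka on the classpath, hypothetical class name `CheckTestData`); a complete ARFF file also needs the `@relation` line, which the excerpt above leaves out:

```java
import java.io.StringReader;

import weka.core.Instances;

public class CheckTestData {
    // Header shared by the training and test sets, ending just after "@data".
    static final String HEADER = String.join("\n",
            "@relation pima_diabetes",
            "@attribute 'preg' real",
            "@attribute 'plas' real",
            "@attribute 'pres' real",
            "@attribute 'skin' real",
            "@attribute 'insu' real",
            "@attribute 'mass' real",
            "@attribute 'pedi' real",
            "@attribute 'age' real",
            "@attribute 'class' { tested_negative, tested_positive}",
            "@data", "");
    static final String TRAIN = HEADER
            + "6,148,72,35,0,33.6,0.627,50,tested_positive\n"
            + "1,85,66,29,0,26.6,0.351,31,tested_negative\n";
    static final String TEST = HEADER
            + "5,116,74,0,0,25.6,0.201,30,?\n"
            + "3,78,50,32,88,31,0.248,26,?\n";

    static Instances parse(String arff) throws Exception {
        Instances data = new Instances(new StringReader(arff));
        data.setClassIndex(data.numAttributes() - 1);
        return data;
    }

    /** Counts instances whose class value is "?" (missing). */
    static int countUnlabeled(Instances data) {
        int n = 0;
        for (int i = 0; i < data.numInstances(); i++) {
            if (data.instance(i).classIsMissing()) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) throws Exception {
        Instances train = parse(TRAIN);
        Instances test = parse(TEST);
        // A classifier trained on `train` can only be applied to data with the same header.
        System.out.println("compatible headers: " + train.equalHeaders(test));
        System.out.println("unlabeled: " + countUnlabeled(test) + " of " + test.numInstances());
    }
}
```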

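The code: Using the Java API, a simple way to set up the classifier, build it on the training data, and finally apply it to classify the unknown, unlabeled test instances looks like this: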

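import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.meta.FilteredClassifier;
import weka.core.Instances;
import weka.filters.Filter;

public class Classification {
    private Instances data;      // labeled training data, class index already set
    private Instances testData;  // unlabeled test data ('?' class values), same header as data
    private Filter filter;       // preprocessing filter (e.g. for text documents), configured elsewhere
    private FilteredClassifier fc;
    private NaiveBayes nb;
    private double[] clsLabel;   // predicted class index for each test instance

    /**
     * Method to build the naive bayes classifier and classify test documents
     */
    public void classify() {
        // setting the classifier --->
        fc = new FilteredClassifier();
        nb = new NaiveBayes();
        fc.setFilter(filter);
        fc.setClassifier(nb);
        // <--- setting of the classifier ends
        // building the classifier --->
        try {
            fc.buildClassifier(data);
        } catch (Exception e) {
            System.out.println("Error from Classification.classify(). Cannot build classifier: " + e.getMessage());
            return; // without a trained model there is nothing to classify with
        }
        // <--- building of the classifier ends
        // classification --->
        clsLabel = new double[testData.numInstances()];
        for (int i = 0; i < testData.numInstances(); i++) {
            try {
                clsLabel[i] = fc.classifyInstance(testData.instance(i));
                // only overwrite the '?' label when classification succeeded
                testData.instance(i).setClassValue(clsLabel[i]);
            } catch (Exception e) {
                System.out.println("Error from Classification.classify(). Cannot classify instance " + i + ": " + e.getMessage());
            }
        }
        // <--- classification ends
    }
}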

And that's how you classify test instances with Weka!
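The question asks for an answer "as a percentage". Instead of `classifyInstance`, which returns only the winning class index, `distributionForInstance` returns one probability per class value. A sketch using J48, the classifier named in the question, in place of the `FilteredClassifier`/`NaiveBayes` pair above; no filter is assumed because these attributes are already numeric, and `ClassifyWithProbabilities` is a hypothetical name:

```java
import java.io.StringReader;

import weka.classifiers.trees.J48;
import weka.core.Instances;

public class ClassifyWithProbabilities {
    // Header shared by the training and test sets, ending just after "@data".
    static final String HEADER = String.join("\n",
            "@relation pima_diabetes",
            "@attribute 'preg' real",
            "@attribute 'plas' real",
            "@attribute 'pres' real",
            "@attribute 'skin' real",
            "@attribute 'insu' real",
            "@attribute 'mass' real",
            "@attribute 'pedi' real",
            "@attribute 'age' real",
            "@attribute 'class' { tested_negative, tested_positive}",
            "@data", "");

    static Instances parse(String arff) throws Exception {
        Instances data = new Instances(new StringReader(arff));
        data.setClassIndex(data.numAttributes() - 1);
        return data;
    }

    /** Trains J48 on the training rows and returns one class-probability vector per test row. */
    static double[][] run() throws Exception {
        Instances train = parse(HEADER
                + "6,148,72,35,0,33.6,0.627,50,tested_positive\n"
                + "1,85,66,29,0,26.6,0.351,31,tested_negative\n");
        Instances test = parse(HEADER
                + "5,116,74,0,0,25.6,0.201,30,?\n"
                + "3,78,50,32,88,31,0.248,26,?\n");
        J48 tree = new J48();
        tree.buildClassifier(train);
        double[][] result = new double[test.numInstances()][];
        for (int i = 0; i < test.numInstances(); i++) {
            // One probability per class value, in declaration order; the entries sum to 1.
            result[i] = tree.distributionForInstance(test.instance(i));
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        for (double[] d : run()) {
            System.out.printf("tested_negative: %.1f%%, tested_positive: %.1f%%%n",
                    100 * d[0], 100 * d[1]);
        }
    }
}
```

With only two training rows the tree is a single leaf, so the numbers here only illustrate the API; with a realistic training set the same loop prints meaningful percentages.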

【Discussion】:

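Thanks for your answer, Rushdi, but I'm still missing an important detail. Let me explain: I know the positive examples, and I can save them to train my classifier. My problem is that I don't know the negative cases. I'm not sure I'm being clear. I want to classify a new instance as positive or not, using only a list of positive instances.
You need to teach it the characteristics of the negatives. Otherwise it will classify anything as positive.
So it's not possible to say: all other features (or combinations of them) are negative, right?
As far as I know, no, I don't think that's possible.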
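The point made in the comments can be checked directly: if every training instance is labelled "good", the learned model has never seen anything else and predicts "good" for any input. A sketch, assuming Weka on the classpath and the question's three example rows, with numeric attributes under hypothetical names `a1` to `a6` and a hypothetical `{good, bad}` class:

```java
import java.io.StringReader;

import weka.classifiers.trees.J48;
import weka.core.Instances;

public class OnlyPositives {
    // Hypothetical header: six numeric attributes plus a two-valued class, ending after "@data".
    static final String HEADER = String.join("\n",
            "@relation only_good",
            "@attribute a1 numeric",
            "@attribute a2 numeric",
            "@attribute a3 numeric",
            "@attribute a4 numeric",
            "@attribute a5 numeric",
            "@attribute a6 numeric",
            "@attribute class {good, bad}",
            "@data", "");

    static Instances parse(String arff) throws Exception {
        Instances data = new Instances(new StringReader(arff));
        data.setClassIndex(data.numAttributes() - 1);
        return data;
    }

    /** Trains J48 on the question's example rows, all of which are labelled "good". */
    static J48 train() throws Exception {
        Instances data = parse(HEADER
                + "1,-3,6,3,6,7,good\n"
                + "1,-4,5,3,7,6,good\n"
                + "2,-4,5,3,8,6,good\n");
        J48 tree = new J48();
        tree.buildClassifier(data);
        return tree;
    }

    /** Returns {P(good), P(bad)} for one unlabeled row such as "9,-9,0,0,0,0,?". */
    static double[] probabilitiesFor(String row) throws Exception {
        Instances query = parse(HEADER + row + "\n");
        return train().distributionForInstance(query.instance(0));
    }

    public static void main(String[] args) throws Exception {
        // This row resembles none of the training examples, yet only "good" was ever seen.
        double[] p = probabilitiesFor("9,-9,0,0,0,0,?");
        System.out.printf("P(good) = %.2f, P(bad) = %.2f%n", p[0], p[1]);
    }
}
```

Because the training set contains a single class, the tree is one leaf and all of its probability goes to "good", which is exactly why the negatives have to be represented in the training data.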