【Question Title】: How to use a class imbalance technique (SMOTE) with the Java Weka API?
【Posted】: 2019-04-10 10:43:21
【Question】:

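I am trying to build a classification model with the Java Weka API. My training dataset is class-imbalanced, so I would like to apply a class imbalance technique such as SMOTE to reduce the imbalance.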

The source code is as follows:

package classification;
import java.util.Random;
import weka.classifiers.Classifier;
import weka.classifiers.bayes.NaiveBayesMultinomial;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.StringToWordVector;
public class questStackoverflow {
public static void main(String[] args) throws Exception {
String fileRootPath = "../file.arff"; //Dataset
    Instances strdata = DataSource.read(fileRootPath); //Load Dataset
    StringToWordVector filter = new StringToWordVector(10000);
    filter.setInputFormat(strdata);
    String[] options = { "-W", "10000", "-L", "-M", "1",
            "-stemmer", "weka.core.stemmers.IteratedLovinsStemmer", 
            "-stopwords-handler", "weka.core.stopwords.Rainbow", 
            "-tokenizer", "weka.core.tokenizers.AlphabeticTokenizer" 
            };
    filter.setOptions(options);
    filter.setIDFTransform(true);
    Instances data = Filter.useFilter(strdata,filter); //Apply filter
    data.setClassIndex(0); //set class index        
    double recall=0.0;
    double precision=0.0;
    double fmeasure=0.0;
    double tp, fp, fn, tn;

    Classifier classifier = new NaiveBayesMultinomial(); // classifier

    int folds = 10;         
    Random random = new Random(1);
    data.randomize(random);
    data.stratify(folds);
    tp = fp = fn = tn = 0;
    for (int i = 0; i < folds; i++) {
       Instances trains = data.trainCV(folds, i,random); //training dataset
       Instances tests = data.testCV(folds, i); //testing dataset
        classifier.buildClassifier(trains);    //build classifier           
        for (int j = 0; j < tests.numInstances(); j++) {    
           Instance instance = tests.instance(j);
           double classValue = instance.classValue();                   
           double result = classifier.classifyInstance(instance);
            if (result == 0.0 && classValue == 0.0) {
                    tp++;
                } else if (result == 0.0 && classValue == 1.0) {
                    fp++;
                } else if (result == 1.0 && classValue == 0.0) {
                    fn++;
                } else if (result == 1.0 && classValue == 1.0) {
                    tn++;
                }
            }   
        }

        if (tn + fn > 0)
            precision = tn / (tn + fn);
        if (tn + fp > 0)
            recall = tn / (tn + fp);
        if (precision + recall > 0)
            fmeasure = 2 * precision * recall / (precision + recall);
        System.out.println("Precision: " + precision);
        System.out.println("Recall: " + recall);
        System.out.println("Fmeasure: " + fmeasure);

    }

}

My code runs fine without any class imbalance technique. However, I need one to mitigate the class imbalance, and I do not know how to use it with the Java Weka API.

【Discussion】:

    Tags: java weka text-classification


    【Solution 1】:

    You can add the following lines to your code:

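    // SMOTE ships as a separate Weka package in Weka 3.7+, so it must be installed and on the classpath
    import weka.filters.supervised.instance.SMOTE;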
    
    
    SMOTE smote=new SMOTE();
    smote.setInputFormat(trains);       
    Instances Trains_smote= Filter.useFilter(trains, smote);
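
For intuition about what the filter does to the training fold, here is a minimal sketch of the SMOTE idea in plain Java, independent of Weka. Each synthetic minority point is placed at a random position on the segment between a minority instance and one of its k nearest minority neighbours. The class name `SmoteSketch`, its method names, and the toy data are hypothetical, and Weka's own filter may differ in its details.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

/** Hypothetical illustration of SMOTE-style interpolation; not Weka's implementation. */
final class SmoteSketch {

    /** Squared Euclidean distance between two points of equal length. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Indices of the (up to) k nearest minority neighbours of minority[index], excluding itself. */
    static int[] nearestNeighbours(double[][] minority, int index, int k) {
        List<Integer> others = new ArrayList<>();
        for (int i = 0; i < minority.length; i++) {
            if (i != index) {
                others.add(i);
            }
        }
        others.sort(Comparator.comparingDouble(i -> squaredDistance(minority[index], minority[i])));
        int n = Math.min(k, others.size());
        int[] result = new int[n];
        for (int i = 0; i < n; i++) {
            result[i] = others.get(i);
        }
        return result;
    }

    /** Creates count synthetic points by interpolating between minority points and their neighbours. */
    static double[][] synthesize(double[][] minority, int k, int count, Random random) {
        if (minority.length < 2 || k < 1) {
            throw new IllegalArgumentException("need at least two minority points and k >= 1");
        }
        double[][] synthetic = new double[count][];
        for (int s = 0; s < count; s++) {
            int base = random.nextInt(minority.length);
            int[] neighbours = nearestNeighbours(minority, base, k);
            int neighbour = neighbours[random.nextInt(neighbours.length)];
            double gap = random.nextDouble(); // position along the segment, in [0, 1)
            double[] point = new double[minority[base].length];
            for (int d = 0; d < point.length; d++) {
                point[d] = minority[base][d] + gap * (minority[neighbour][d] - minority[base][d]);
            }
            synthetic[s] = point;
        }
        return synthetic;
    }

    public static void main(String[] args) {
        // Toy minority class: four points on the corners of a unit square
        double[][] minority = { {1.0, 1.0}, {2.0, 1.0}, {1.0, 2.0}, {2.0, 2.0} };
        for (double[] p : synthesize(minority, 2, 4, new Random(1))) {
            System.out.println(Arrays.toString(p));
        }
    }
}
```

Because every synthetic point lies between two existing minority points, the new instances stay inside the region the minority class already occupies. This is also why the filter should see only the training fold: synthetic points derived from test instances would leak information into the model.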
    

    Your code will then look like this.

    package classification;
    import java.util.Random;
    import weka.classifiers.Classifier;
    import weka.classifiers.bayes.NaiveBayesMultinomial;
    import weka.core.Instance;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;
    import weka.filters.Filter;
    import weka.filters.unsupervised.attribute.StringToWordVector;
    import weka.filters.supervised.instance.SMOTE;
    public class questStackoverflow {
    public static void main(String[] args) throws Exception {
    String fileRootPath = "../file.arff"; //Dataset
    Instances strdata = DataSource.read(fileRootPath); //Load Dataset
    StringToWordVector filter = new StringToWordVector(10000);
    String[] options = { "-W", "10000", "-L", "-M", "1",
            "-stemmer", "weka.core.stemmers.IteratedLovinsStemmer",
            "-stopwords-handler", "weka.core.stopwords.Rainbow",
            "-tokenizer", "weka.core.tokenizers.AlphabeticTokenizer"
            };
    filter.setOptions(options);
    filter.setIDFTransform(true);
    filter.setInputFormat(strdata); // call last, after all options are set
    Instances data = Filter.useFilter(strdata,filter); //Apply filter
    data.setClassIndex(0); //set class index        
    double recall=0.0;
    double precision=0.0;
    double fmeasure=0.0;
    double tp, fp, fn, tn;
    
    Classifier classifier = new NaiveBayesMultinomial(); // classifier
    
    int folds = 10;         
    Random random = new Random(1);
    data.randomize(random);
    data.stratify(folds);
    tp = fp = fn = tn = 0;
    for (int i = 0; i < folds; i++) {
       Instances trains = data.trainCV(folds, i,random); //training dataset
       Instances tests = data.testCV(folds, i); //testing dataset
       // Oversample the minority class in the training fold only; the test fold stays untouched
       SMOTE smote = new SMOTE();
       smote.setInputFormat(trains);
       Instances Trains_smote = Filter.useFilter(trains, smote);
    
        classifier.buildClassifier(Trains_smote);    //build classifier           
        for (int j = 0; j < tests.numInstances(); j++) {    
           Instance instance = tests.instance(j);
           double classValue = instance.classValue();                   
           double result = classifier.classifyInstance(instance);
            if (result == 0.0 && classValue == 0.0) {
                    tp++;
                } else if (result == 0.0 && classValue == 1.0) {
                    fp++;
                } else if (result == 1.0 && classValue == 0.0) {
                    fn++;
                } else if (result == 1.0 && classValue == 1.0) {
                    tn++;
                }
            }   
        }
    
        if (tn + fn > 0)
            precision = tn / (tn + fn);
        if (tn + fp > 0)
            recall = tn / (tn + fp);
        if (precision + recall > 0)
            fmeasure = 2 * precision * recall / (precision + recall);
        System.out.println("Precision: " + precision);
        System.out.println("Recall: " + recall);
        System.out.println("Fmeasure: " + fmeasure);
    
    }
    

    }
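
One detail in the evaluation loop is easy to misread: the counters `tp` and `tn` are named relative to class index 0, while the printed precision and recall are computed for class index 1. A small sketch, assuming a two-class problem, that keeps a confusion matrix and computes the metrics for an explicitly chosen positive class. `BinaryMetrics` and its methods are hypothetical helpers, not part of Weka.

```java
/** Hypothetical helper (not part of Weka): binary confusion matrix with per-class metrics. */
final class BinaryMetrics {

    // counts[actual][predicted]; both indices are class indices 0 or 1
    private final long[][] counts = new long[2][2];

    /** Records one prediction. */
    void add(int actual, int predicted) {
        counts[actual][predicted]++;
    }

    /** Fraction of instances predicted as the positive class that really are positive. */
    double precision(int positive) {
        long tp = counts[positive][positive];
        long fp = counts[1 - positive][positive];
        return (tp + fp == 0) ? 0.0 : (double) tp / (tp + fp);
    }

    /** Fraction of truly positive instances that were predicted as positive. */
    double recall(int positive) {
        long tp = counts[positive][positive];
        long fn = counts[positive][1 - positive];
        return (tp + fn == 0) ? 0.0 : (double) tp / (tp + fn);
    }

    /** Harmonic mean of precision and recall for the positive class. */
    double fMeasure(int positive) {
        double p = precision(positive);
        double r = recall(positive);
        return (p + r == 0.0) ? 0.0 : 2 * p * r / (p + r);
    }

    public static void main(String[] args) {
        BinaryMetrics metrics = new BinaryMetrics();
        // Toy predictions as (actual, predicted) pairs
        int[][] pairs = { {1, 1}, {1, 1}, {1, 0}, {0, 1}, {0, 0}, {0, 0} };
        for (int[] pair : pairs) {
            metrics.add(pair[0], pair[1]);
        }
        System.out.println("Precision: " + metrics.precision(1));
        System.out.println("Recall: " + metrics.recall(1));
        System.out.println("Fmeasure: " + metrics.fMeasure(1));
    }
}
```

Inside the cross-validation loop this would be fed with `metrics.add((int) instance.classValue(), (int) classifier.classifyInstance(instance))`, and calling `precision(1)`, `recall(1)`, and `fMeasure(1)` after the loop gives the same values as the formulas above.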

    【Comments】:

    • Can anyone suggest which class imbalance technique works best for mitigating the class imbalance problem?