How to use IsolationForest in Weka?

[Question title]: How to use IsolationForest in Weka?
[Posted]: 2021-02-18 00:10:53
[Question description]:

I'm trying to use IsolationForest in Weka, but I can't find a simple example showing how to use it. Can anyone help me? Thanks in advance.

import weka.classifiers.misc.IsolationForest;

public class Test2 {
    public static void main(String[] args) {
        IsolationForest isolationForest = new IsolationForest();
        .....................................................
    }
}

[Question comments]:

Tags: weka

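[Solution 1]:

I strongly suggest studying the IsolationForest implementation a little. The code below loads a CSV file whose first column is the class. (Note: a single class value produces only (1 - anomaly score); if the class is binary you also get the anomaly score; otherwise it just returns an error.) Note that I skip the second column (in my case a UUID, which is not needed for anomaly detection).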

import java.io.File;
import java.io.FileWriter;

import weka.classifiers.misc.IsolationForest;
import weka.core.Instances;
import weka.core.converters.CSVLoader;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;

private static void findOutlier(File in, File out) throws Exception {
    CSVLoader loader = new CSVLoader();
    loader.setSource(in);

    Instances data = loader.getDataSet();
    // Set the class attribute if the data format does not provide this information.
    // For example, the XRFF format saves the class attribute information as well.
    if (data.classIndex() == -1)
        data.setClassIndex(0);

    String[] options = new String[2];
    options[0] = "-R";                                    // "range"
    options[1] = "2";                                     // second attribute (indices are 1-based)
    Remove remove = new Remove();                         // new instance of filter
    remove.setOptions(options);                           // set options
    remove.setInputFormat(data);                          // inform filter about dataset **AFTER** setting options
    Instances newData = Filter.useFilter(data, remove);   // apply filter

    IsolationForest isolationForest = new IsolationForest();
    isolationForest.buildClassifier(newData);
    // System.out.println(isolationForest);

    // try-with-resources flushes and closes the writer even if scoring fails
    try (FileWriter fw = new FileWriter(out)) {
        // Header: every original attribute (class included), so the columns
        // line up with the full rows printed below
        for (int i = 0; i < data.numAttributes(); ++i) {
            fw.write(data.attribute(i).name());
            fw.write(",");
        }
        fw.write("(1 - anomaly score),anomaly score\n");
        for (int i = 0; i < data.numInstances(); ++i) {
            // Score the filtered instance (it matches the header the model was
            // built on), but print the original, unfiltered row
            final double[] distributionForInstance = isolationForest.distributionForInstance(newData.instance(i));
            fw.write(data.instance(i) + "," + distributionForInstance[0] + "," + (1 - distributionForInstance[0]) + "\n");
        }
    }
}

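The function above appends the anomaly values as the last columns of the CSV. Note that I am using a single class, so to get the corresponding anomaly score I compute 1 - distributionForInstance[0]; otherwise you can simply use distributionForInstance[1].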
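The rule in the previous paragraph can be wrapped in a small helper so that the calling code does not need to know which case it is in. This is only a sketch in plain Java with no Weka dependency: the class name AnomalyScoreHelper and the method anomalyScore are hypothetical, and it assumes, as described above, that slot 0 of the array holds (1 - anomaly score) and that slot 1, when present, holds the anomaly score itself.

```java
import java.util.Locale;

public class AnomalyScoreHelper {

    /**
     * Returns the anomaly score from a distribution array.
     * Assumption (taken from the answer above): index 0 holds
     * (1 - anomaly score); index 1, if present, holds the anomaly score.
     */
    public static double anomalyScore(double[] distribution) {
        if (distribution == null || distribution.length == 0) {
            throw new IllegalArgumentException("distribution must not be empty");
        }
        // Binary class: read the score directly. Single class: derive it.
        return distribution.length > 1 ? distribution[1] : 1.0 - distribution[0];
    }

    public static void main(String[] args) {
        double[] singleClass = {0.58};        // only (1 - anomaly score)
        double[] binaryClass = {0.58, 0.42};  // both values available
        System.out.println(String.format(Locale.ROOT, "%.2f", anomalyScore(singleClass)));
        System.out.println(String.format(Locale.ROOT, "%.2f", anomalyScore(binaryClass)));
    }
}
```

With the illustrative values above, both calls print 0.42. In the function from the answer you would pass the array returned by distributionForInstance to such a helper.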

Sample input.csv for getting (1 - anomaly score):

Class,ignore, feature_0, feature_1, feature_2
A,1,21,31,31
A,2,41,61,81
A,3,61,37,34

Sample input.csv for getting both (1 - anomaly score) and the anomaly score:

Class,ignore, feature_0, feature_1, feature_2
A,1,21,31,31
B,2,41,61,81
A,3,61,37,34
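The only difference between the two sample files is the number of distinct values in the Class column (one versus two). If you want to check which case a file falls into before running it through Weka, a rough pre-check in plain Java could look like the sketch below. The names ClassColumnCheck, distinctClassValues and expectedOutput are hypothetical, and the mapping from class count to output simply restates the behaviour described in this answer; it is not taken from the Weka documentation.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ClassColumnCheck {

    /** Counts distinct values in the first CSV column, skipping the header row. */
    public static int distinctClassValues(List<String> csvLines) {
        Set<String> values = new LinkedHashSet<>();
        for (int i = 1; i < csvLines.size(); i++) {   // row 0 is the header
            String line = csvLines.get(i);
            if (line.trim().isEmpty()) {
                continue;                              // ignore blank lines
            }
            values.add(line.split(",", 2)[0].trim());
        }
        return values.size();
    }

    /** Restates the behaviour described in the answer for each class count. */
    public static String expectedOutput(int distinctValues) {
        switch (distinctValues) {
            case 1:  return "(1 - anomaly score) only";
            case 2:  return "(1 - anomaly score) and anomaly score";
            default: return "error";
        }
    }

    public static void main(String[] args) {
        List<String> binarySample = Arrays.asList(
                "Class,ignore, feature_0, feature_1, feature_2",
                "A,1,21,31,31",
                "B,2,41,61,81",
                "A,3,61,37,34");
        int distinct = distinctClassValues(binarySample);
        System.out.println(distinct + " class value(s): " + expectedOutput(distinct));
    }
}
```

The split on the first comma is deliberately naive and assumes the class values contain no quoted commas, which holds for the samples here.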

[Comments]: