【Question title】: How to plot the decision boundary of a One Class SVM?
【Posted】: 2021-05-17 01:19:55
【Question description】:

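I am having trouble plotting the results of a One Class SVM that I wrote. I have tried different examples found online, but none of them gave good results. I have the following small dataset, where id identifies the sample and f1 to f9 are features: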

id,f1,f2,f3,f4,f5,f6,f7,f8,f9
d1,0,0,0,0,0,0,0,0.045454545,0
d2,0.047619048,0,0,0.047619048,0,0.047619048,0,0.047619048,0.047619048
d3,0,0,0,0.045454545,0,0,0,0,0
d4,0,0.045454545,0,0.045454545,0,0,0,0.045454545,0.045454545
d5,0,0,0,0,0,0,0,0,0
d6,0,0.045454545,0,0,0,0,0,0.045454545,0
d7,0,0,0,0,0,0,0.045454545,0,0
d8,0,0,0,0.045454545,0,0,0,0,0
d9,0,0,0,0.045454545,0,0,0,0,0
d10,0,0,0,0.045454545,0,0,0,0,0
d11,0,0,0,0.045454545,0,0,0,0,0
d12,0.045454545,0,0,0.045454545,0.045454545,0.045454545,0,0.045454545,0
d13,0,0,0,0.045454545,0,0,0,0.045454545,0.045454545
d14,0,0,0,0.045454545,0.045454545,0,0,0,0
d15,0,0,0,0,0,0,0,0.047619048,0.047619048
d16,0,0,0,0,0,0,0,0.045454545,0
d17,0,0,0.045454545,0,0,0,0,0,0.045454545
d18,0,0,0,0,0,0,0,0,0
d19,0.045454545,0,0.090909091,0,0,0,0.090909091,0,0
d20,0,0,0,0.090909091,0,0,0.045454545,0.045454545,0.045454545
d21,0,0,0.045454545,0.045454545,0,0.045454545,0.045454545,0,0
d22,0,0.090909091,0,0,0,0.045454545,0,0,0.045454545
d23,0,0.047619048,0,0.047619048,0,0,0,0.047619048,0.095238095
d24,0,0,0,0,0,0.045454545,0.045454545,0.045454545,0
d25,0,0,0,0,0,0,0,0.043478261,0
d26,0,0,0,0,0.043478261,0,0.043478261,0.043478261,0
d27,0.043478261,0,0,0.043478261,0,0,0.043478261,0.043478261,0

My code is as follows:

import matplotlib.pyplot as plt
import matplotlib
import pandas as pd
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn import preprocessing

    listDrop=['id']
    df1=df.drop(listDrop,axis="columns")
    colNames=list(df1.columns.values)
    min_max_scaler=preprocessing.MinMaxScaler()
    x_scaled=min_max_scaler.fit_transform(df1)
    df1[colNames]=x_scaled
    svm = OneClassSVM(kernel='rbf', nu=0.2, gamma=1e-04)
    svm.fit(df1)
    pred=svm.predict(df1)

    listA=[i+1 for i,x in enumerate(pred) if x == -1]
    listB=[i+1 for i,x in enumerate(pred) if x == 1]
    xx, yy = np.meshgrid(np.linspace(-5, 5, 1), np.linspace(-5, 5, 7500))
    Xpred=np.array([xx.ravel(),yy.ravel()]+ [np.repeat(0, xx.ravel().size) for _ in range(7)]).T
    
    Z = svm.decision_function(Xpred).reshape(xx.shape)    
    assert len(Z) == (len(xx) * len(yy))
    Z = np.array(Z)
    Z = Z.reshape(xx.shape)((len(xx), len(yy)))
    a = plt.contour(xx, yy, Z, levels=[0], linewidths=2, colors='darkred')
    plt.contourf(xx, yy,  Z, levels=np.linspace(Z.min(), 0, 7), cmap=plt.cm.Blues_r)
    b1 = plt.scatter(pred[:, 0], pred[:, 1],  c='red')
    b3 = plt.scatter(listB[:,0], listB[:, 1], c="green")
    plt.legend([a.collections[0],b1,b3],
       ["learned frontier", "test","outliers"],
       loc="lower right",
       prop=matplotlib.font_manager.FontProperties(size=11))

I would like to get a plot like the following:

I found this code online, and I have been experimenting with the following line:

Xpred=np.array([xx.ravel(),yy.ravel()]+ [np.repeat(0, xx.ravel().size) for _ in range(7)]).T

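I changed it because the code gave me an error about dimensions. I read that the cause is that the plot is 2-D while I have 9 features, so I should fill the remaining features with arbitrary data.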
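To make the padding idea concrete, here is a minimal sketch. It assumes synthetic data in place of the table above and assumes the two plotted features are the first two columns; `X_demo`, `n_side`, and `grid` are illustrative names. The other seven features are held at zero, so the result is only a 2-D slice through the 9-D decision function, not the full boundary.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_demo = rng.random((27, 9))   # synthetic stand-in for the scaled 27 x 9 data

model = OneClassSVM(kernel="rbf", nu=0.2, gamma=1e-04).fit(X_demo)

# Both axes need more than one point to form a usable 2-D grid.
n_side = 50
xx, yy = np.meshgrid(np.linspace(0, 1, n_side), np.linspace(0, 1, n_side))

# Columns 0 and 1 vary over the grid; columns 2..8 are padded with zeros.
grid = np.zeros((xx.size, X_demo.shape[1]))
grid[:, 0] = xx.ravel()
grid[:, 1] = yy.ravel()

# One decision value per grid point, reshaped back to the grid layout.
Z = model.decision_function(grid).reshape(xx.shape)
print(xx.shape, grid.shape, Z.shape)
```

`Z` now has the same shape as `xx` and `yy`, which is what `plt.contour(xx, yy, Z, levels=[0])` expects.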

I also added the assertion, but it raises an error:

 assert len(Z) == (len(xx) * len(yy))

AssertionError
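The assertion most likely fails because of the grid shape rather than the SVM. A small sketch shows what `np.meshgrid` returns when the first axis has a single point. It uses 75 points instead of 7500 and a zero array as a stand-in for the decision values; both are assumptions made to keep the example short.

```python
import numpy as np

# Same call pattern as in the question, with a shorter second axis.
xx, yy = np.meshgrid(np.linspace(-5, 5, 1), np.linspace(-5, 5, 75))
print(xx.shape, yy.shape)   # both arrays have one column

# Stand-in for svm.decision_function(Xpred): one value per grid point.
Z = np.zeros(xx.size).reshape(xx.shape)

# len() only reports the size of the first axis, so the two sides differ.
print(len(Z), len(xx) * len(yy))

# A check that compares like with like:
assert Z.size == xx.size
assert Z.shape == xx.shape
```

Note also that `plt.contour` needs at least a 2 x 2 grid, so a one-column grid like this one cannot be contoured even once the assertion is removed.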

How can I plot the result of this One Class SVM, which only returns an array of 1 and -1 such as the following?

[ 1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1  1 -1  1  1 -1 -1 -1 -1 -1 -1 -1 -1
  1 -1 -1]

【Question discussion】:

    Tags: python matplotlib scikit-learn


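    【Solution 1】:

    A common approach is to use t-SNE to reduce the dimensionality of the data for visualization. Once the data has been reduced to two dimensions, you can reproduce the visualization from the scikit-learn tutorial; see the code example below. Note that the model in the example is fitted on the reduced 2-D data, so the boundary shown belongs to that model rather than to one trained on all nine features.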

    import pandas as pd
    import numpy as np
    from sklearn.svm import OneClassSVM
    from sklearn.preprocessing import MinMaxScaler
    from sklearn.manifold import TSNE
    import matplotlib.pyplot as plt
    
    # load the data
    df = pd.read_csv('data.csv')
    x = df.drop(labels='id', axis=1).values
    
    # rescale the data
    x_scaled = MinMaxScaler().fit_transform(x)
    
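    # reduce the data to 2 dimensions using t-SNE
    # (recent scikit-learn versions require perplexity < n_samples; with 27 samples
    # the default of 30 is rejected, and 10 is an arbitrary smaller choice)
    x_reduced = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(x_scaled)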
    
    # fit the model to the reduced data
    svm = OneClassSVM(kernel='rbf', nu=0.2, gamma=1e-04)
    svm.fit(x_reduced)
    
    # extract the model predictions
    x_predicted = svm.predict(x_reduced)
    
    # define the meshgrid
    x_min, x_max = x_reduced[:, 0].min() - 5, x_reduced[:, 0].max() + 5
    y_min, y_max = x_reduced[:, 1].min() - 5, x_reduced[:, 1].max() + 5
    
    x_ = np.linspace(x_min, x_max, 500)
    y_ = np.linspace(y_min, y_max, 500)
    
    xx, yy = np.meshgrid(x_, y_)
    
    # evaluate the decision function on the meshgrid
    z = svm.decision_function(np.c_[xx.ravel(), yy.ravel()])
    z = z.reshape(xx.shape)
    
    # plot the decision function and the reduced data
    plt.contourf(xx, yy, z, cmap=plt.cm.PuBu)
    a = plt.contour(xx, yy, z, levels=[0], linewidths=2, colors='darkred')
    b = plt.scatter(x_reduced[x_predicted == 1, 0], x_reduced[x_predicted == 1, 1], c='white', edgecolors='k')
    c = plt.scatter(x_reduced[x_predicted == -1, 0], x_reduced[x_predicted == -1, 1], c='gold', edgecolors='k')
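    # ContourSet.collections is deprecated in newer Matplotlib; legend_elements() supplies the legend handle instead
    plt.legend([a.legend_elements()[0][0], b, c], ['learned frontier', 'regular observations', 'abnormal observations'], bbox_to_anchor=(1.05, 1))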
    plt.axis('tight')
    plt.show()
    

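    【Comments】:

    • Thank you very much @gflavia for taking the time to answer this question. I had also considered using PCA first to reduce the dimensionality of the data, but when I tested it on a similar dataset the plots did not look as good. Cheers.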
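
For comparison with the PCA idea mentioned in the comment above, the following is a rough sketch of the same pipeline with `PCA` swapped in for t-SNE. It is not part of the original answer. It assumes synthetic data in place of `data.csv`, and `gamma="scale"` is an assumed setting rather than the value used above.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
x = rng.random((27, 9))   # synthetic stand-in for the 27 x 9 feature table

# rescale, then project onto the first two principal components
scaler = MinMaxScaler()
x_scaled = scaler.fit_transform(x)
pca = PCA(n_components=2, random_state=0)
x_reduced = pca.fit_transform(x_scaled)

# fit the one-class SVM on the 2-D projection
svm = OneClassSVM(kernel="rbf", nu=0.2, gamma="scale").fit(x_reduced)
x_predicted = svm.predict(x_reduced)

# Unlike TSNE, a fitted PCA can project further samples with transform().
new_points = pca.transform(scaler.transform(x[:3]))

# Fraction of the total variance that the 2-D projection retains.
print(pca.explained_variance_ratio_.sum())
```

The meshgrid and contour code from the answer can be reused unchanged on `x_reduced`. A low retained-variance value would be one possible reason for a PCA plot looking less informative than the t-SNE one.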