【Question title】:Python implementation of precomputed RBF kernel with Gram matrix?
【Posted】:2018-02-17 18:36:10
【Question】:

Information and examples on precomputed kernels in Python are very limited. sklearn only provides one simple example, for the linear kernel: http://scikit-learn.org/stable/modules/svm.html

Here is the code for the linear kernel:

import numpy as np
from scipy.spatial.distance import cdist
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

# import data
iris = datasets.load_iris()
X = iris.data                    
Y = iris.target

X_train, X_test, y_train, y_test = train_test_split(X, Y)

clf = svm.SVC(kernel='precomputed')

# Linear kernel
G_train = np.dot(X_train, X_train.T) 
clf.fit(G_train, y_train) 

G_test = np.dot(X_test, X_train.T)
y_pred = clf.predict(G_test)    

This is not very helpful for understanding how to implement other important kernels, such as the RBF kernel, which would be:

K(X, X') = np.exp(np.divide(-cdist(X, X, 'euclidean'), 2*np.std(X**2)))

How can I do the same train/test split and implement the RBF kernel as a precomputed kernel?
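
As a rough sketch of what this could look like: the RBF kernel is usually written as exp(-gamma * ||x - x'||^2), which uses the squared Euclidean distance and a free parameter gamma instead of the expression above. The helper name rbf_gram and the value gamma = 0.5 are assumptions made purely for illustration. As in the linear example, the test Gram matrix is computed between the test rows and the training rows, so it has shape (n_test, n_train).

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split


def rbf_gram(A, B, gamma):
    """Return K[i, j] = exp(-gamma * ||A[i] - B[j]||^2), shape (len(A), len(B))."""
    return np.exp(-gamma * cdist(A, B, "sqeuclidean"))


iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0
)

gamma = 0.5  # assumed value, chosen only for illustration

clf = svm.SVC(kernel="precomputed")
G_train = rbf_gram(X_train, X_train, gamma)  # (n_train, n_train)
clf.fit(G_train, y_train)

G_test = rbf_gram(X_test, X_train, gamma)    # (n_test, n_train)
y_pred = clf.predict(G_test)

# Sanity check against the built-in RBF kernel with the same gamma.
ref = svm.SVC(kernel="rbf", gamma=gamma).fit(X_train, y_train)
agreement = np.mean(y_pred == ref.predict(X_test))
print("agreement with kernel='rbf':", agreement)
```

If the two classifiers agree on the test set, the hand-built matrices are consistent with what kernel='rbf' computes internally.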

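And what if the kernel becomes more complicated and depends on other parameters that have to be computed in a separate function, for example a parameter alpha >= 0: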

K(X, X') = alpha('some function depending on X_train, X_test')*np.exp(np.divide(-cdist(X, X, 'euclidean'), 2*np.std(X**2)))
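
One possible way to structure such a kernel is sketched here. The function alpha_from_data is a hypothetical stand-in for whatever computes alpha, and scaled_rbf and gamma = 0.5 are likewise illustrative assumptions. The sketch computes alpha once from the training data and reuses it for the test matrix, so that both Gram matrices come from the same kernel function. It also shows that SVC accepts a callable kernel, which avoids building the matrices by hand.

```python
from functools import partial

import numpy as np
from scipy.spatial.distance import cdist
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split


def alpha_from_data(X_ref):
    """Hypothetical scale factor alpha >= 0, computed from reference data."""
    return 1.0 / (1.0 + np.var(X_ref))


def scaled_rbf(A, B, alpha, gamma):
    """K[i, j] = alpha * exp(-gamma * ||A[i] - B[j]||^2)."""
    return alpha * np.exp(-gamma * cdist(A, B, "sqeuclidean"))


iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0
)

# Compute the extra parameter once, from the training data only, and reuse
# it for the test Gram matrix.
alpha = alpha_from_data(X_train)
gamma = 0.5  # assumed value

# Option 1: precompute both Gram matrices.
clf = svm.SVC(kernel="precomputed")
clf.fit(scaled_rbf(X_train, X_train, alpha, gamma), y_train)
y_pred = clf.predict(scaled_rbf(X_test, X_train, alpha, gamma))

# Option 2: pass the kernel as a callable and let SVC build the matrices.
clf_callable = svm.SVC(kernel=partial(scaled_rbf, alpha=alpha, gamma=gamma))
clf_callable.fit(X_train, y_train)
y_pred_callable = clf_callable.predict(X_test)
```

With a callable, SVC calls the function with (X_train, X_train) during fit and with (X_test, X_train) during predict, so the two options should give the same predictions.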

Examples for such non-trivial kernels are needed. Any suggestions would be much appreciated.

【Question comments】:

    Tags: python matrix scikit-learn kernel svm


    【Solution 1】:

    We can write kernel PCA by hand. Let's start with the polynomial kernel.

    from sklearn.datasets import make_circles
    from scipy.spatial.distance import pdist, squareform
    from scipy.linalg import eigh
    import numpy as np
    import matplotlib.pyplot as plt
    # %matplotlib inline  # IPython/Jupyter magic; uncomment only in a notebook
    
    X_c, y_c = make_circles(n_samples=100, random_state=654)
    
    plt.figure(figsize=(8,6))
    
    plt.scatter(X_c[y_c==0, 0], X_c[y_c==0, 1], color='red')
    plt.scatter(X_c[y_c==1, 0], X_c[y_c==1, 1], color='blue')
    
    
    plt.ylabel('y coordinate')
    plt.xlabel('x coordinate')
    
    plt.show()
    

    The data:

    def degree_pca(X, gamma, degree, n_components):
        # Calculating kernel
    
        K = gamma*(X@X.T+1)**degree
    
    
        # eigh returns the eigenvalues in ascending order, with the matching
        # eigenvectors as the columns of eigvecs.
        eigvals, eigvecs = eigh(K)
    
        # Take the eigenvectors that belong to the n_components largest
        # eigenvalues, largest first. A list is used because np.column_stack
        # does not accept a generator in recent NumPy versions.
        X_pc = np.column_stack([eigvecs[:, -i] for i in range(1, n_components + 1)])
    
        return X_pc
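
A small check that may help relate the kernel in degree_pca to library conventions. scikit-learn's polynomial_kernel is defined as (gamma * <x, y> + coef0) ** degree, with gamma inside the parentheses, whereas degree_pca multiplies the whole matrix by gamma. Multiplying a symmetric matrix by a positive constant scales its eigenvalues and mathematically leaves its eigenspaces unchanged, so that outer factor should not affect which directions are returned. The variable names below are illustrative.

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.datasets import make_circles
from sklearn.metrics.pairwise import polynomial_kernel

X_c, _ = make_circles(n_samples=100, random_state=654)

# scikit-learn's polynomial kernel: (gamma * <x, y> + coef0) ** degree.
K_sklearn = polynomial_kernel(X_c, degree=2, gamma=1.0, coef0=1.0)
K_manual = (X_c @ X_c.T + 1) ** 2
print(np.allclose(K_sklearn, K_manual))

# Scaling the whole matrix by 5 scales every eigenvalue by 5.
vals_1, _ = eigh(K_manual)
vals_5, _ = eigh(5 * K_manual)
print(np.allclose(vals_5, 5 * vals_1))
```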
    

    Now transform the data and plot it:

    X_c1 = degree_pca(X_c, gamma=5, degree=2, n_components=2)
    
    plt.figure(figsize=(8,6))
    
    plt.scatter(X_c1[y_c==0, 0], X_c1[y_c==0, 1], color='red')
    plt.scatter(X_c1[y_c==1, 0], X_c1[y_c==1, 1], color='blue')
    
    
    plt.ylabel('y coordinate')
    plt.xlabel('x coordinate')
    
    plt.show()
    

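    Linearly separable:

    The points can now be separated linearly.

    Next, let's write the RBF kernel. For the demonstration, let's use the moons dataset.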

    from sklearn.datasets import make_moons
    X, y = make_moons(n_samples=100, random_state=654)
    
    plt.figure(figsize=(8,6))
    
    plt.scatter(X[y==0, 0], X[y==0, 1], color='red')
    plt.scatter(X[y==1, 0], X[y==1, 1], color='blue')
    
    
    plt.ylabel('y coordinate')
    plt.xlabel('x coordinate')
    
    plt.show()
    

    Moons:

    Kernel PCA transformation:

    def stepwise_kpca(X, gamma, n_components):
        """
        X: A MxN dataset as NumPy array where the samples are stored as rows (M), features as columns (N).
        gamma: coefficient for the RBF kernel.
        n_components: number of components to be returned.
    
        """
        # Calculating the squared Euclidean distances for every pair of points
        # in the MxN dimensional dataset.
        sq_dists = pdist(X, 'sqeuclidean')
    
        # Converting the pairwise distances into a symmetric MxM matrix.
        mat_sq_dists = squareform(sq_dists)
    
        K=np.exp(-gamma*mat_sq_dists)
    
        # Centering the symmetric MxM kernel matrix (M = number of samples).
        M = K.shape[0]
        one_m = np.ones((M, M)) / M
        K = K - one_m.dot(K) - K.dot(one_m) + one_m.dot(K).dot(one_m)
    
        # eigh returns the eigenvalues in ascending order, with the matching
        # eigenvectors as the columns of eigvecs.
        eigvals, eigvecs = eigh(K)
    
        # Take the eigenvectors that belong to the n_components largest
        # eigenvalues, largest first. A list is used because np.column_stack
        # does not accept a generator in recent NumPy versions.
        X_pc = np.column_stack([eigvecs[:, -i] for i in range(1, n_components + 1)])
    
        return X_pc
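
The centering step in stepwise_kpca can be checked in isolation. The sketch below rebuilds the same RBF matrix (gamma = 15 is taken from the call that follows) and compares the four-term expression with the compact form H K H, where H = I - (1/M) * ones, and with scikit-learn's KernelCenterer. Variable names are illustrative.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.datasets import make_moons
from sklearn.preprocessing import KernelCenterer

X, _ = make_moons(n_samples=100, random_state=654)
K = np.exp(-15 * squareform(pdist(X, "sqeuclidean")))

M = K.shape[0]
one_m = np.ones((M, M)) / M

# Four-term form, as written in stepwise_kpca.
K_four_term = K - one_m @ K - K @ one_m + one_m @ K @ one_m

# Compact form with the centering matrix H.
H = np.eye(M) - one_m
K_compact = H @ K @ H

# Library version of the same operation.
K_library = KernelCenterer().fit_transform(K)

print(np.allclose(K_four_term, K_compact))
print(np.allclose(K_four_term, K_library))
print(np.allclose(K_compact.mean(axis=0), 0.0))
```

After centering, every row and column of the kernel matrix averages to zero, which corresponds to centering the data in the implicit feature space.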
    

    Let's run it:

    X_4 = stepwise_kpca(X, gamma=15, n_components=2)
    
    plt.scatter(X_4[y==0, 0], X_4[y==0, 1], color='red')
    plt.scatter(X_4[y==1, 0], X_4[y==1, 1], color='blue')
    
    
    plt.ylabel('y coordinate')
    plt.xlabel('x coordinate')
    
    plt.show()
    

    Result:

    【Comments】:

    • Thanks for the effort you put into Kernel PCA, but this is not related to my question.