【问题标题】:Kernel density estimation in Python for over 4d dataPython中超过4d数据的核密度估计
【发布时间】:2021-01-13 21:10:56
【问题描述】:

我正在尝试使用 SciPy 的 gaussian_kde 函数来估计多元数据的密度。
在我下面的代码中,如果维度数超过 4d,可能会出现以下错误(大约 50%)。
如果数字小于 3d,则大多数情况下不会发生错误。

# Import
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
%matplotlib inline

# Data
N1 = np.random.normal(size=400)
N2 = np.random.normal(scale=0.5, size=400)
N3 = np.random.normal(scale=0.5, size=400)
N4 = np.random.normal(scale=0.5, size=400)
N5 = np.random.normal(scale=0.5, size=400)
N6 = np.random.normal(scale=0.5, size=400)

a1 = N1+1*N2
a2 = N1-1*N2
a3 = N1+1*N3
a4 = N1-1*N3
a5 = N1+1*N4
a6 = N1-1*N4
a7 = N1+1*N5
a8 = N1-1*N5
a9 = N1+1*N6
a0 = N1-1*N6

# Kernel density
xy = np.vstack([a1,a2,a3,a4,a5,a6,a7,a8,a9,a0])
kernel = stats.gaussian_kde(xy)
z_est = kernel.evaluate(xy)

# Visualization
x = a1
y = a2
plt.scatter(x, y, c=z_est)

错误信息

LinAlgError                               Traceback (most recent call last)
<ipython-input-11-ce4c335d8dd1> in <module>
      2 xy = np.vstack([a1,a2,a3,a4,a5])
      3 kernel = stats.gaussian_kde(xy)
----> 4 z_est = kernel.evaluate(xy)

~\program\anaconda\lib\site-packages\scipy\stats\kde.py in evaluate(self, points)
    244         result = zeros((m,), dtype=float)
    245 
--> 246         whitening = linalg.cholesky(self.inv_cov)
    247         scaled_dataset = dot(whitening, self.dataset)
    248         scaled_points = dot(whitening, points)

~\program\anaconda\lib\site-packages\scipy\linalg\decomp_cholesky.py in cholesky(a, lower, overwrite_a, check_finite)
     89     """
     90     c, lower = _cholesky(a, lower=lower, overwrite_a=overwrite_a, clean=True,
---> 91                          check_finite=check_finite)
     92     return c
     93 

~\program\anaconda\lib\site-packages\scipy\linalg\decomp_cholesky.py in _cholesky(a, lower, overwrite_a, clean, check_finite)
     38     if info > 0:
     39         raise LinAlgError("%d-th leading minor of the array is not positive "
---> 40                           "definite" % info)
     41     if info < 0:
     42         raise ValueError('LAPACK reported an illegal value in {}-th argument'

LinAlgError: 1-th leading minor of the array is not positive definite

为什么我会收到错误消息?

【问题讨论】:

    标签: python kernel-density


    【解决方案1】:
    import numpy as np
    from scipy import stats
    import matplotlib.pyplot as plt
    %matplotlib inline
    
    def measure(n):
        a1 = np.random.normal(size=n)
        a2= np.random.normal(scale=0.5, size=n)
        return a1+a2, a1-a2
    
    a1, a2 = measure(4000)
    xmin = a1.min()
    xmax = a1.max()
    ymin = a2.min()
    ymax = a2.max()
    
    X, Y = np.mgrid[xmin:xmax:100j, ymin:ymax:100j]
    positions = np.vstack([X.ravel(), Y.ravel()])
    values = np.vstack([a1, a2])
    kernel = stats.gaussian_kde(values)
    z_est = np.reshape(kernel.evaluate(positions).T, X.shape)
    
    
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.imshow(np.rot90(z_est), cmap=plt.cm.gist_earth_r,
              extent=[xmin, xmax, ymin, ymax])
    ax.plot(a1, a2, 'k.', markersize=2)
    ax.set_xlim([xmin, xmax])
    ax.set_ylim([ymin, ymax])
    plt.show()
    

    enter image description here

    【讨论】:

    • 这段代码是记录错误的原因。这是使用核密度的一个例子。
    猜你喜欢
    • 2017-02-06
    • 2019-02-14
    • 2018-10-06
    • 2018-03-08
    • 2014-03-22
    • 2012-04-06
    • 1970-01-01
    • 2012-01-06
    • 2015-10-04
    相关资源
    最近更新 更多