大型数据集的 for-loop 替代方案以提高计算速度答案

【问题标题】：Alternative to for-loop for large dataset to improve computational speed大型数据集的 for-loop 替代方案以提高计算速度
【发布时间】：2020-08-31 03:53:34
【问题描述】：

对于克里金法，我需要计算长度为 20000 的大型 mesh 数组。下面的代码运行良好，尤其是对于较小的mesh 长度（mesh，计算时间非常长（大约 45 分钟）。 data 的长度在 300 到 800 之间。下面是一个工作代码：

import numpy as np
from scipy.spatial.distance import pdist, squareform    

icovfct = [36, 6524.62, 1383.13, 2]
imesh = np.array([[632230, 632090, 632110, 632130, 632150, 632170, 632190, 632210, 632230, 632250, 632270, 632290, 632310, 632070], 
                  [3045160, 3045180, 3045180, 3045180, 3045180, 3045180, 3045180, 3045180, 3045180, 3045180, 3045180, 3045180, 3045180, 3045200]], np.float64)
imesh = imesh.T
idata = np.array([[634026.049, 633901.182, 634001.365, 634007.132, 633893.706, 633802.327, 634144.246, 634015.993, 633897.326, 633779.479],
                  [3048117.579, 3048201.031, 3048191.922, 3047891.355, 3047994.462, 3048084.562, 3047633.421, 3047719.845, 3047818.914, 3047902.179],
                  [256.550, 236.317, 249.458, 281.889, 262.321, 239.495, 303.144, 295.319, 281.270, 261.083]], np.float64)
idata = idata.T

def ordinary(covfct, data, mesh):
    prediction = []              
    for i, dummy_val in enumerate(mesh):
        # distance between data and each data point in mesh
        d = np.sqrt((data[:, 0]-mesh[i, 0])**2.0 + (data[:, 1]-mesh[i, 1])**2.0)

        # add these distances to P
        P = np.vstack((data.T, d)).T

        # apply the covariance model to the distances
        k = (covfct[0] + covfct[1]*(1-np.exp(-3*(P[:,3]/covfct[2])**covfct[3])))

        # cast as a matrix
        k = np.matrix(k).T

        # form a matrix of distances between existing data points
        K1 = squareform(pdist(P[:,:2]))

        # apply the covariance model to these distances
        K = (covfct[0] + covfct[1]*(1-np.exp(-3*(K1.ravel()/covfct[2])**covfct[3])))

        # re-cast as a NumPy array                  
        K = np.array(K)

        # reshape into an array
        K = K.reshape(len(P), len(P))               

        # if kriging is exact K diag = 0
        K[K1 == 0] = 0

        # cast as a matrix
        K = np.matrix(K)

        # add a column and row of ones to Ks,
        # with a zero in the bottom, right hand corner (Lagrangian)
        K2 = np.matrix(np.ones((len(P)+1, len(P)+1)))
        K2[:len(P), :len(P)] = K
        K2[-1, -1] = 0.0

        # add a one to the end of ks
        k3 = np.matrix(np.ones((len(P)+1, 1)))
        k3[:len(P)] = k

        # calculate the kriging weights
        weights = np.linalg.lstsq(K2, k3, rcond = 1)
        weights = weights[:-3][0][:-1]
        weights = np.array(weights)

        # calculate the residuals
        residuals = P[:, 2]

        # calculate the estimation
        prediction.append(np.dot(weights.T, residuals))
    return prediction

interpolate = ordinary(icovfct, idata, imesh)

有没有办法优化代码从而减少计算时间？

【问题讨论】：

由于每次迭代看起来都是独立的，您可以在 n 个线程池上运行处理时将网格分成 n 个部分。
什么是网格？数组？你能在这个例子中包含一个小网格以便我们运行它吗？
stackoverflow.com/help/minimal-reproducible-example
我们需要下载文件吗？你可以制作一个 15x15 的数组，其中包含大约 5 - 8 个要插值的值吗？

标签： python numpy optimization kriging geostatistics

【解决方案1】：

通常您可以通过将一个进程拆分为多个来减少时间。

在 python 中，我们有 python process lib 可以做到这一点，因此，如果您的长度为 20,000 ，则可以同时运行两次代码并且每个进程将处理 10,000 的长度.....

我希望你这个答案可以解决你的问题

【讨论】：

【解决方案2】：

您可以使用 GSTools 作为代码的直接替换，其中克里金求和是在 Cython 中实现的：

import numpy as np
import gstools as gs

icovfct = [36, 6524.62, 1383.13, 2]
imesh = np.array([[632230, 632090, 632110, 632130, 632150, 632170, 632190, 632210, 632230, 632250, 632270, 632290, 632310, 632070], 
                  [3045160, 3045180, 3045180, 3045180, 3045180, 3045180, 3045180, 3045180, 3045180, 3045180, 3045180, 3045180, 3045180, 3045200]], np.float64)
idata = np.array([[634026.049, 633901.182, 634001.365, 634007.132, 633893.706, 633802.327, 634144.246, 634015.993, 633897.326, 633779.479],
                  [3048117.579, 3048201.031, 3048191.922, 3047891.355, 3047994.462, 3048084.562, 3047633.421, 3047719.845, 3047818.914, 3047902.179],
                  [256.550, 236.317, 249.458, 281.889, 262.321, 239.495, 303.144, 295.319, 281.270, 261.083]], np.float64)

rescale = np.power(3, icovfct[3] ** -1)  # to provide the rescaling factor "3"
# what you are using is a stable covariance model
model = gs.Stable(
    dim=2,
    nugget=icovfct[0],
    var=icovfct[1],
    len_scale=icovfct[2],
    alpha=icovfct[3],
    rescale=rescale,
)
# use ordinary kriging
krige = gs.krige.Ordinary(model, cond_pos=idata[:2], cond_val=idata[2])
mesh = krige(imesh, mesh_type="structured")
ax = krige.plot()

【讨论】：

如果您的给定网格不是结构化的，只需编辑倒数第二行：mesh_type="unstructured"。