【问题标题】:Alternative to for-loop for large dataset to improve computational speed大型数据集的 for-loop 替代方案以提高计算速度
【发布时间】:2020-08-31 03:53:34
【问题描述】:

对于克里金法,我需要计算长度为 20000 的大型 mesh 数组。下面的代码运行良好,尤其是对于较小的mesh 长度(mesh,计算时间非常长(大约 45 分钟)。 data 的长度在 300 到 800 之间。下面是一个工作代码:

import numpy as np
from scipy.spatial.distance import pdist, squareform    

icovfct = [36, 6524.62, 1383.13, 2]
imesh = np.array([[632230, 632090, 632110, 632130, 632150, 632170, 632190, 632210, 632230, 632250, 632270, 632290, 632310, 632070], 
                  [3045160, 3045180, 3045180, 3045180, 3045180, 3045180, 3045180, 3045180, 3045180, 3045180, 3045180, 3045180, 3045180, 3045200]], np.float64)
imesh = imesh.T
idata = np.array([[634026.049, 633901.182, 634001.365, 634007.132, 633893.706, 633802.327, 634144.246, 634015.993, 633897.326, 633779.479],
                  [3048117.579, 3048201.031, 3048191.922, 3047891.355, 3047994.462, 3048084.562, 3047633.421, 3047719.845, 3047818.914, 3047902.179],
                  [256.550, 236.317, 249.458, 281.889, 262.321, 239.495, 303.144, 295.319, 281.270, 261.083]], np.float64)
idata = idata.T

def ordinary(covfct, data, mesh):
    prediction = []              
    for i, dummy_val in enumerate(mesh):
        # distance between data and each data point in mesh
        d = np.sqrt((data[:, 0]-mesh[i, 0])**2.0 + (data[:, 1]-mesh[i, 1])**2.0)

        # add these distances to P
        P = np.vstack((data.T, d)).T

        # apply the covariance model to the distances
        k = (covfct[0] + covfct[1]*(1-np.exp(-3*(P[:,3]/covfct[2])**covfct[3])))

        # cast as a matrix
        k = np.matrix(k).T

        # form a matrix of distances between existing data points
        K1 = squareform(pdist(P[:,:2]))

        # apply the covariance model to these distances
        K = (covfct[0] + covfct[1]*(1-np.exp(-3*(K1.ravel()/covfct[2])**covfct[3])))

        # re-cast as a NumPy array                  
        K = np.array(K)

        # reshape into an array
        K = K.reshape(len(P), len(P))               

        # if kriging is exact K diag = 0
        K[K1 == 0] = 0

        # cast as a matrix
        K = np.matrix(K)

        # add a column and row of ones to Ks,
        # with a zero in the bottom, right hand corner (Lagrangian)
        K2 = np.matrix(np.ones((len(P)+1, len(P)+1)))
        K2[:len(P), :len(P)] = K
        K2[-1, -1] = 0.0

        # add a one to the end of ks
        k3 = np.matrix(np.ones((len(P)+1, 1)))
        k3[:len(P)] = k

        # calculate the kriging weights
        weights = np.linalg.lstsq(K2, k3, rcond = 1)
        weights = weights[:-3][0][:-1]
        weights = np.array(weights)

        # calculate the residuals
        residuals = P[:, 2]

        # calculate the estimation
        prediction.append(np.dot(weights.T, residuals))
    return prediction

interpolate = ordinary(icovfct, idata, imesh)

有没有办法优化代码从而减少计算时间?

【问题讨论】:

  • 由于每次迭代看起来都是独立的,您可以在 n 个线程池上运行处理时将网格分成 n 个部分。
  • 什么是网格?数组?你能在这个例子中包含一个小网格以便我们运行它吗?
  • 我们需要下载文件吗?你可以制作一个 15x15 的数组,其中包含大约 5 - 8 个要插值的值吗?

标签: python numpy optimization kriging geostatistics


【解决方案1】:

通常您可以通过将一个进程拆分为多个来减少时间。

在 python 中,我们有 python process lib 可以做到这一点, 因此,如果您的长度为 20,000 ,则可以同时运行两次代码 并且每个进程将处理 10,000 的长度.....

我希望你这个答案可以解决你的问题

【讨论】:

    【解决方案2】:

    您可以使用 GSTools 作为代码的直接替换,其中克里金求和是在 Cython 中实现的:

    import numpy as np
    import gstools as gs
    
    icovfct = [36, 6524.62, 1383.13, 2]
    imesh = np.array([[632230, 632090, 632110, 632130, 632150, 632170, 632190, 632210, 632230, 632250, 632270, 632290, 632310, 632070], 
                      [3045160, 3045180, 3045180, 3045180, 3045180, 3045180, 3045180, 3045180, 3045180, 3045180, 3045180, 3045180, 3045180, 3045200]], np.float64)
    idata = np.array([[634026.049, 633901.182, 634001.365, 634007.132, 633893.706, 633802.327, 634144.246, 634015.993, 633897.326, 633779.479],
                      [3048117.579, 3048201.031, 3048191.922, 3047891.355, 3047994.462, 3048084.562, 3047633.421, 3047719.845, 3047818.914, 3047902.179],
                      [256.550, 236.317, 249.458, 281.889, 262.321, 239.495, 303.144, 295.319, 281.270, 261.083]], np.float64)
    
    rescale = np.power(3, icovfct[3] ** -1)  # to provide the rescaling factor "3"
    # what you are using is a stable covariance model
    model = gs.Stable(
        dim=2,
        nugget=icovfct[0],
        var=icovfct[1],
        len_scale=icovfct[2],
        alpha=icovfct[3],
        rescale=rescale,
    )
    # use ordinary kriging
    krige = gs.krige.Ordinary(model, cond_pos=idata[:2], cond_val=idata[2])
    mesh = krige(imesh, mesh_type="structured")
    ax = krige.plot()
    

    【讨论】:

    • 如果您的给定网格不是结构化的,只需编辑倒数第二行:mesh_type="unstructured"
    猜你喜欢
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    • 2011-12-04
    • 2021-10-19
    • 1970-01-01
    • 2018-10-05
    • 1970-01-01
    相关资源
    最近更新 更多