
【Question title】: Requesting NumPy/SciPy vectorization replacements of for loops and list comprehensions
【Posted】: 2019-10-17 17:07:36
【Question description】:

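I have two different array processing problems that I'd like to solve AQAP (Q=quickly), so that the solutions are not rate-limiting in my process (using NEAT to train a video game bot). In one case I want to build a penalty function for making larger column heights, and in the other I want to reward building "islands of common value".

Operations begin on a 26 row x 6 column numpy array of grayscale values with a black/0 background.

I have working solutions for each problem that already use some numpy, but I'd like to push toward fully vectorized approaches for both.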

    import numpy as np
    from scipy.ndimage import label as sp_label
    from math import ceil

Both problems start from an array like this:

    img= np.array([[ 0.,  0.,  0., 12.,  0.,  0.],
                   [ 0.,  0.,  0., 14.,  0.,  0.],               
                   [ 0.,  0.,  0., 14.,  0.,  0.],
                   [ 0.,  0.,  0., 14.,  0.,  0.],               
                   [16.,  0.,  0., 14.,  0.,  0.],
                   [16.,  0.,  0., 12.,  0.,  0.],               
                   [12.,  0., 11.,  0.,  0.,  0.],
                   [12.,  0., 11.,  0.,  0.,  0.],               
                   [16.,  0., 15.,  0., 15.,  0.],
                   [16.,  0., 15.,  0., 15.,  0.],               
                   [14.,  0., 12.,  0., 11.,  0.],
                   [14.,  0., 12.,  0., 11.,  0.],               
                   [14., 15., 11.,  0., 11.,  0.],
                   [14., 15., 11.,  0., 11.,  0.],               
                   [13., 16., 12.,  0., 13.,  0.],
                   [13., 16., 12.,  0., 13.,  0.],               
                   [13., 14., 16.,  0., 16.,  0.],
                   [13., 14., 16.,  0., 16.,  0.],               
                   [16., 14., 15.,  0., 14.,  0.],
                   [16., 14., 15.,  0., 14.,  0.],               
                   [14., 16., 14.,  0., 11.,  0.],
                   [14., 16., 14.,  0., 11.,  0.],               
                   [11., 13., 14., 16., 12., 13.],
                   [11., 13., 14., 16., 12., 13.],               
                   [12., 12., 15., 14., 15., 11.], 
                   [12., 12., 15., 14., 15., 11.]])

I am currently solving the first (column height) problem with:


    # define valid connection directions for sp_label
    c_valid_conns = np.array((0,1,0,0,1,0,0,1,0,), dtype=int).reshape((3,3))

    # run the island labeling function sp_label    
    # c_ncomponents is a simple count of the connected columns in labeled
    columns, c_ncomponents = sp_label(img, c_valid_conns)

    # calculate out the column lengths
    col_lengths = np.array([(columns[columns == n]/n).sum() for n in range(1, c_ncomponents+1)])
    col_lengths

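This gives me the array: [ 6. 22. 20. 18. 14. 4. 4.]

(Bonus if the code consistently ignores labeled regions that do not "contain" the bottom of the array (row index 25/-1).)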
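One possible way to drop the list comprehension above is to let `np.bincount` tally the label array in a single pass. The following is only a sketch under that assumption: the helper name `column_lengths` and the small `demo` board are made up for illustration and are not part of the original code.

```python
import numpy as np
from scipy.ndimage import label

# Structuring element that only connects vertical neighbours
# (same layout as c_valid_conns in the question)
VERTICAL = np.array([[0, 1, 0],
                     [0, 1, 0],
                     [0, 1, 0]])

def column_lengths(board):
    """Return the pixel count of every vertically connected run in `board`."""
    labeled, n_runs = label(board, structure=VERTICAL)
    # bincount tallies how often each label occurs; index 0 is the background
    return np.bincount(labeled.ravel(), minlength=n_runs + 1)[1:]

# Hypothetical 4x3 board, used only to exercise the helper
demo = np.array([[0., 5., 0.],
                 [3., 5., 0.],
                 [3., 0., 0.],
                 [3., 7., 2.]])
demo_lengths = column_lengths(demo)
```

Because each label is counted once, the result should hold the same numbers as `(columns[columns == n]/n).sum()` evaluated per label, but as integers and without a Python-level loop over labels.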

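The second problem involves masking each unique value and counting the contiguous bodies in each masked array, to get the sizes of those bodies: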

    # initial values to start the ball rolling
    values = [11, 12, 13, 14, 15, 16]
    isle_avgs_i = [1.25, 2, 0, 1,5, 2.25, 1]

    # apply filter masks to img to isolate each value 
    # Could these masks be pushed out into a third array dimension instead?
    masks = [(img == g) for g in np.unique(values)]

    # define the valid connectivities (8-way) for the sp_label function
    m_valid_conns = np.ones((3,3), dtype=int)

    # initialize islanding lists 
    # (I'd love to do away with these when I no longer need the .append() method)
    mask_isle_avgs, isle_avgs = [],[]

    # for each mask in the image:         
    for i, mask in enumerate(masks):

        # run the island labeling function sp_label
        # m_labeled is the array containing the sequentially labeled islands
        # m_ncomponents is a simple count of the islands in m_labeled
        m_labeled, m_ncomponents = sp_label(mask, m_valid_conns)

        # collect the average (island size-1)s (halving to account for... 
        # ... y resolution) for each island into mask_isle_avgs list 
        # I'd like to vectorize this step
        mask_isle_avgs.append((sum([ceil((m_labeled[m_labeled == n]/n).sum()/2)-1 
                                    for n in range(1, m_ncomponents+1)]))/(m_ncomponents+1))

        # add up the mask isle averages for all the islands... 
        # ... and collect into isle_avgs list
        # I'd like to vectorize this step
        isle_avgs.append(sum(mask_isle_avgs))

    # initialize a difference list for the isle averages (I also want to do away with this step)
    d_avgs = []

    # evaluate whether isle_avgs is greater for the current frame or the...
    # ... previous frame (isle_avgs_i) and append either the current...
    # ... element or 0, depending on whether the delta is non-negative
    # I want this command vectorized
    [d_avgs.append(isle_avgs[j]) 
     if (isle_avgs[j]-isle_avgs_i[j])>=0 
     else d_avgs.append(0) for j in range(len(isle_avgs))]
    d_avgs

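This gives me the d_avgs array: [0, 0, 0.46785714285714286, 1.8678571428571429, 0, 0]

(Again, if the code consistently ignores labeled regions that do not "contain" the bottom of the array (row index 25/-1), it gives this array instead:

[0, 0, 0.43452380952380953, 1.6345238095238095, 0, 0])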
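The final side-effect comprehension that fills `d_avgs` looks like a natural fit for `np.where`. The sketch below assumes that reading; `keep_non_decreasing` and the demo values are hypothetical names and numbers. Since the original loop only indexes the first `len(isle_avgs)` entries of the previous-frame list, the sketch trims the previous values to the same length.

```python
import numpy as np

def keep_non_decreasing(current, previous):
    """Keep current[j] where current[j] - previous[j] >= 0, else 0."""
    current = np.asarray(current, dtype=float)
    # Mirror the original loop, which only reads the first len(current) entries
    previous = np.asarray(previous, dtype=float)[:current.size]
    return np.where(current - previous >= 0, current, 0.0)

# Hypothetical values, chosen only to show both branches
demo_current = [0.5, 1.0, 2.0]
demo_previous = [1.25, 1.0, 0.0, 9.0]
demo_d_avgs = keep_non_decreasing(demo_current, demo_previous)
```

This returns an array rather than a list, which also removes the need to initialize `d_avgs` beforehand.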

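I'd like to remove all of the list operations and comprehensions and move them into a fully vectorized numpy/scipy implementation that gives the same results.

Any help removing these steps would be greatly appreciated.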
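As a partial step in that direction, the inner comprehension over island labels can probably be replaced by `np.bincount` plus array arithmetic, and the running `sum(mask_isle_avgs)` by `np.cumsum`. This is a sketch, not a verified drop-in: `isle_averages` and the demo board are hypothetical, and a short loop over the distinct values remains.

```python
import numpy as np
from scipy.ndimage import label

# 8-way connectivity, as m_valid_conns in the question
EIGHT_WAY = np.ones((3, 3), dtype=int)

def isle_averages(board, values):
    """Cumulative per-value island averages, following the question's formula."""
    values = np.unique(values)
    per_value = np.zeros(values.size)
    for i, v in enumerate(values):
        labeled, n_isles = label(board == v, structure=EIGHT_WAY)
        # size of every island in one pass; index 0 is the background
        sizes = np.bincount(labeled.ravel(), minlength=n_isles + 1)[1:]
        # ceil(size / 2) - 1 per island, summed, divided by (n_isles + 1)
        per_value[i] = (np.ceil(sizes / 2) - 1).sum() / (n_isles + 1)
    # the question appends sum(mask_isle_avgs) each round, i.e. a running total
    return np.cumsum(per_value)

# Hypothetical board: one island of 11s (size 4), two islands of 12s (sizes 3 and 1)
demo_board = np.array([[11., 11.,  0.],
                       [11., 11., 12.],
                       [ 0.,  0., 12.],
                       [12.,  0., 12.]])
demo_isle_avgs = isle_averages(demo_board, [11, 12])
```

For the demo board the per-value terms work out to 1/2 and 1/3, so the running totals are 1/2 and 5/6.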

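【Comments】:

A colleague of mine offered a very elegant solution to the first problem, including the bonus condition: col_lengths = np.isin(columns, columns[-1]).sum(axis = 0). The island condition is proving trickier, but some binary morphology might help me.
Correction: this is what I meant above (it keeps zeros in the bottom row from triggering the column sum): col_hts = np.isin(columns, columns[-1][np.nonzero(columns[-1])]).sum(axis = 0)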
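To make the effect of the correction concrete, here is a small sketch that runs both one-liners on a made-up board whose bottom row contains a zero. The helper names and the `demo` array are hypothetical; only the two `np.isin` expressions come from the comments.

```python
import numpy as np
from scipy.ndimage import label

# Vertical-only connectivity, as c_valid_conns in the question
VERTICAL = np.array([[0, 1, 0],
                     [0, 1, 0],
                     [0, 1, 0]])

def grounded_heights(board):
    """Per-column height of runs that touch the bottom row (corrected form)."""
    labeled, _ = label(board, structure=VERTICAL)
    bottom = labeled[-1]
    # only non-zero bottom labels count, so the background is never matched
    return np.isin(labeled, bottom[np.nonzero(bottom)]).sum(axis=0)

def naive_heights(board):
    """First form from the comments: a 0 in the bottom row also matches background."""
    labeled, _ = label(board, structure=VERTICAL)
    return np.isin(labeled, labeled[-1]).sum(axis=0)

# Hypothetical board; the run of 5s floats above the ground, column 2 is empty
demo = np.array([[0., 5., 0.],
                 [3., 5., 0.],
                 [3., 0., 0.],
                 [3., 7., 0.]])
demo_grounded = grounded_heights(demo)
demo_naive = naive_heights(demo)
```

With the empty third column, the first form counts every background cell as well, while the corrected form reports heights of 3, 1 and 0.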

Tags: python numpy scipy vectorization numpy-ndarray

【Solution 1】:

Here is how I ultimately solved this problem:

    ######## column height penalty calculation ########
    # NOTE: unit_img, c_valid_conns, i_valid_conns, ind_grid, col_hts_i and
    # isle_sizes_i are defined elsewhere in the surrounding code (not shown here)
    # c_ncomponents is a simple count of the connected columns in labeled
    columns, c_ncomponents = sp_label(unit_img, c_valid_conns)
    # print(columns)
    # throw out the falling block with .isin(x, x[-1]) combined with...
    # ...the mask nonzero(x)
    drop_falling = np.isin(columns, columns[-1][np.nonzero(columns[-1])])
    col_hts = drop_falling.sum(axis=0)
    # print(f'col_hts {col_hts}')
    # calculate differentials for the (grounded) column heights
    d_col_hts = np.sum(col_hts - col_hts_i)
    # print(f'col_hts {col_hts} - col_hts_i {col_hts_i} ===> d_col_hts {d_col_hts}')
    # set col_hts_i to current col_hts for next evaluation
    col_hts_i = col_hts
    # calculate penalty/bonus function
    # col_pen = (col_hts**4 - 3**4).sum()
    col_pen = np.where(d_col_hts > 0, (col_hts**4 - 3**4), 0).sum()
    # if col_pen != 0:
    #     print(f'col_pen: {col_pen}')
    ######## end column height penalty calculation ########

    ######## color island bonus calculation ########
    # mask the unit_img to remove the falling block
    isle_img = drop_falling * unit_img
    # print(isle_img)
    # broadcast the game board to add a layer for each color
    isle_imgs = np.broadcast_to(isle_img, (7, *isle_img.shape))
    # define a mask to discriminate on color in each layer
    isle_masked = isle_imgs*[isle_imgs==ind_grid[0]]
    # reshape the array to return to 3 dimensions
    isle_masked = isle_masked.reshape(isle_imgs.shape)
    # generate the isle labels
    isle_labels, isle_ncomps = sp_label(isle_masked, i_valid_conns)
    # determine the island sizes (via return_counts) for all the unique labels
    isle_inds, isle_sizes = np.unique(isle_labels, return_counts=True)
    # zero out isle_sizes[0] to remove spike for background (500+ for near empty board)
    isle_sizes[0] = 0
    # evaluate difference to determine whether bonus applies
    if isle_sizes_i.sum() != isle_sizes.sum():
        # calculate bonus for all island sizes after throwing away the 0 count
        isle_bonus = (isle_sizes**3).sum()
    else:
        isle_bonus = 0
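The answer does not show `ind_grid` or `i_valid_conns`, so the following is only a guess at the idea behind the island step: stack one boolean layer per colour and label the whole stack once, using a 3x3x3 structuring element that connects neighbours inside a layer but never across layers. The function name, the structure and the demo board are all assumptions made for this sketch.

```python
import numpy as np
from scipy.ndimage import label

def island_sizes_by_layer(board, values):
    """Label islands of every value in one call by stacking per-value masks."""
    values = np.asarray(values)
    # one boolean layer per value: shape (n_values, rows, cols)
    layers = board[np.newaxis, :, :] == values[:, np.newaxis, np.newaxis]
    # assumed connectivity: 8-way inside a layer, nothing between layers
    structure = np.zeros((3, 3, 3), dtype=int)
    structure[1] = 1
    labeled, n_isles = label(layers, structure=structure)
    # island sizes in one pass; index 0 (background) is dropped
    sizes = np.bincount(labeled.ravel(), minlength=n_isles + 1)[1:]
    return labeled, sizes

# Hypothetical board: one island of 11s (size 4), two islands of 12s (sizes 3 and 1)
demo_board = np.array([[11., 11.,  0.],
                       [11., 11., 12.],
                       [ 0.,  0., 12.],
                       [12.,  0., 12.]])
demo_labels, demo_sizes = island_sizes_by_layer(demo_board, [11, 12])
# cubic bonus as in the answer, applied to the demo sizes
demo_bonus = int((demo_sizes ** 3).sum())
```

Dropping index 0 of the `np.bincount` result plays the same role as `isle_sizes[0] = 0` in the answer: the background count never enters the bonus.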

【Comments】: