Title: NumPy: Average overlapping matrices?
Posted: 2020-01-05 06:45:38
Question:

Suppose you have a matrix of shape (n, m).

Additionally, you have k smaller matrices of shape (s, m).

These k matrices may have been produced by slicing the larger matrix into shards:

stride = z
ranges = [] # will contain sub lists of start / end positions
for i in range(0, n, stride):
    if i + s > n:
        ranges.append([n-s, n]) #<-- if not evenly divisible include last ragged bit
        break
    else:
        ranges.append([i, i+s])

# k = len(ranges)

for a, b in ranges:
    submat = mat[a:b] # <--- produces submats of shape (s, m)
    # not necessarily where submats come from, just for 
    # simple example purpose, feel free to add random noise to each submat

How can these k overlapping submatrices be joined in numpy, averaging the overlapping regions?

The goal is then to take these submats and reconstruct the original mat, e.g. something like:

blank = np.zeros((n, m))
for i in range(len(submats)):
    a, b = ranges[i]
    blank[a:b] += submats[i] #<--- doesn't account for different amounts of overlapping regions
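A minimal fix for the loop above (a sketch using made-up small dimensions, not the question's actual data) is to also accumulate, per row, how many submats contributed, and divide at the end:

```python
import numpy as np

n, m = 10, 3
ranges = [[0, 6], [4, 10]]              # two overlapping row ranges (illustrative)
mat = np.arange(n * m, dtype=float).reshape(n, m)
submats = [mat[a:b] for a, b in ranges]

blank = np.zeros((n, m))
counts = np.zeros((n, 1))               # contributions per row, broadcast over columns
for (a, b), submat in zip(ranges, submats):
    blank[a:b] += submat
    counts[a:b] += 1
blank /= counts                         # average the overlapping regions
```

With unmodified slices this reproduces `mat` exactly, which doubles as a quick sanity check.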

Concrete numbers:

n = 693
m = 10
# so mat has shape (693, 10)

s = 500
stride = 50

ranges = [[0, 500], [50, 550], [100, 600], [150, 650], [193, 693]]
# notice that the range (0,50) doesn't need to be averaged

k = 5 # len(ranges)

# so we have k submats of shape (500, 10)

I am currently doing this:

def count_overlap(max_len, ranges): # from example 693, and [[0, 500], ...]
    tally = np.zeros(max_len)
    for i in range(max_len):
        for a, b in ranges:
            if a <= i and i < b:
                tally[i] += 1
    return tally


olap = count_overlap(693, ranges)
olap[:55]
# ([1., 1., ..., 1., 2., 2., 2., 2., 2.])
olap[-50:]
# ([2., 2., 2., 2., 2., 2., 2., 2., 1., 1., ..., 1., 1., 1.])

to know how much to divide each row index of the mat by.
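The O(n·k) double loop in count_overlap can be replaced by marking +1 at each start and -1 at each stop and taking a cumulative sum, which is the same idea the answers below exploit (a sketch; `count_overlap_fast` is a hypothetical name):

```python
import numpy as np

def count_overlap_fast(max_len, ranges):
    # +1 at every start, -1 at every stop; the cumulative sum then gives,
    # for each row index, the number of ranges covering it
    delta = np.zeros(max_len + 1)
    for a, b in ranges:
        delta[a] += 1
        delta[b] -= 1
    return delta.cumsum()[:max_len]

ranges = [[0, 500], [50, 550], [100, 600], [150, 650], [193, 693]]
olap = count_overlap_fast(693, ranges)
```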

Comments:

  • @Divakar In the post, yes, they are all (s, m)
  • @Divakar The shape is the same as the starting mat, (n, m)
  • @Divakar No. You want to recreate the (n, m) matrix from the k (s, m) matrices, where the k matrices overlap in the first dimension
  • How many ranges might you typically have? If it is a small number, then slicing and adding in a loop makes sense, as you already have.
  • @Divakar I updated the question to provide some concrete examples; k is typically

Tags: python numpy


Solution 1:

Although I came up with an alternative (and less elegant) approach to my own question, I am not trying to answer it myself. Rather, I am just wrapping up the relevant helper functions and solutions for others to use:

Helpers

def shard_rng(maxlen, sublen, stride):
    ranges = [] 
    for i in range(0, maxlen, stride):
        if i + sublen > maxlen:
            ranges.append([maxlen-sublen, maxlen])
            break
        else:
            ranges.append([i, i+sublen])
    return ranges
# for testing: since each submat gets +1, stitched_mat - mat should be all 1s
def split_mat(mat, ranges):
    submats = []
    for a,b in ranges:
        submats.append(mat[a:b] + 1)
    return submats
# part of solution 1
def weight_rngs(ranges):
    n = ranges[-1][-1]
    bins = map(np.bincount,np.array(ranges).T,(None,None),(n+1,n+1))
    vals = np.subtract(*bins).cumsum()
    weights = 1 / vals[:n,None]
    return weights
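To see what weight_rngs produces (an illustrative check, not part of the original answer), the example ranges from the question yield the reciprocal of the coverage count at each row: 1.0 where a single range covers a row, 0.2 where all five do. Note the helper assumes the last range ends at the full length n:

```python
import numpy as np

def weight_rngs(ranges):
    n = ranges[-1][-1]  # assumes the final range ends at the full length
    # bincount the starts and the stops separately, subtract, and cumsum:
    # this yields the number of ranges covering each row
    bins = map(np.bincount, np.array(ranges).T, (None, None), (n + 1, n + 1))
    vals = np.subtract(*bins).cumsum()
    return 1 / vals[:n, None]

ranges = [[0, 500], [50, 550], [100, 600], [150, 650], [193, 693]]
w = weight_rngs(ranges)
```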

Solutions

(provided by @Paul Panzer)
# solution 1
def stitch_mats(shape, submats, ranges):    
    stitched = np.zeros(shape)
    weights = weight_rngs(ranges)
    for submat, (start, stop) in zip(submats, ranges):
        stitched[start:stop] += weights[start:stop] * submat        
    return stitched
# solution 2
def stitch_mats2(shape, submats, ranges):
    ranges = np.array(ranges)
    ro = ranges.ravel().argsort(kind='stable')

    # put 1 for starting and -1 for ending, take cumsum
    cnts = (1-((ro&1)<<1)).cumsum()

    stitched = np.zeros(shape)  # use the shape argument, not the globals n, m
    # add slices
    for submat, (start, stop) in zip(submats,ranges):
        stitched[start:stop] += submat

    rs = ranges.ravel()[ro]
    # divide by overlap
    for start, stop, count in zip(rs[:-1],rs[1:],cnts[:-1]):
        stitched[start:stop] /= count
    return stitched
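The ro / cnts bookkeeping in stitch_mats2 is easiest to see on a tiny made-up example: ravelling the ranges puts starts at even flat indices and stops at odd ones, so after a stable argsort, 1-((ro&1)&lt;&lt;1) maps starts to +1 and stops to -1, and the cumsum is the coverage count on each segment between consecutive sorted boundaries:

```python
import numpy as np

ranges = np.array([[0, 4], [2, 6]])          # two overlapping ranges (illustrative)
ro = ranges.ravel().argsort(kind='stable')   # sort order of the boundary values
signs = 1 - ((ro & 1) << 1)                  # even flat index (start) -> +1, odd (stop) -> -1
cnts = signs.cumsum()                        # coverage count after each boundary
rs = ranges.ravel()[ro]                      # the boundary values, sorted
```

Here the segments [0, 2), [2, 4), [4, 6) are covered by 1, 2, and 1 ranges respectively, which is exactly what the second loop in stitch_mats2 divides by.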

Testing

n = 693
m = 10
s = 500 # sublen
stride = 50

mat = np.random.randint(0,10,(n,m))
ranges = shard_rng(n, s, stride)
submats = split_mat(mat, ranges)


stitched_1 = stitch_mats(mat.shape, submats, ranges)
stitched_2 = stitch_mats2(mat.shape, submats, ranges)

np.unique(stitched_1-mat-1.), np.unique(stitched_2-mat-1.)
# array([-8.8817842e-16,  0.0000000e+00,  4.4408921e-16,  8.8817842e-16]), array([0.])

Comments:

    Solution 2:

    Here is a way to compute the per-row overlap using bincount + cumsum:

    Update: added another approach that uses only slicing. I expect this to generally be faster.

    import numpy as np
    
    n = 693
    m = 10
    # so mat has shape (693, 10)
    
    s = 500
    stride = 50
    
    ranges = [[0, 500], [50, 550], [100, 600], [150, 650], [193, 693]]
    # notice that the range (0,50) doesn't need to be averaged
    
    k = 5 # len(ranges)
    
    mat = np.random.randint(0,10,(n,m))
    submats = []
    for a, b in ranges:
        submats.append(mat[a:b])
    
    
    ranges = np.asarray(ranges)
    out = np.zeros((n,m))
    # put a 1 at every start and a -1 at every stop
    # then take the cumsum this will assign to each row the
    # number of intervals it is in
    # finally, take the reciprocal
    weight = 1 / np.subtract(*map(np.bincount,ranges.T,(None,None),(n+1,n+1))).cumsum()[:n,None]
    for sm,(a,b) in zip(submats,ranges):
        out[a:b] += weight[a:b] * sm
    
    
    # method 2
    
    # sort range ends
    ro = ranges.ravel().argsort(kind='stable')
    # put 1 for starting and -1 for ending, take cumsum
    cnts = (1-((ro&1)<<1)).cumsum()
    out = np.zeros((n,m))
    # add slices
    for sm,(a,b) in zip(submats,ranges):
        out[a:b] += sm
    rs = ranges.ravel()[ro]
    # divide by overlap
    for a,b,c in zip(rs[:-1],rs[1:],cnts[:-1]):
        out[a:b] /= c
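As the comments below suggest, a quick sanity check for method 1 is that stitching unmodified slices back together should reproduce mat up to rounding; a self-contained version (seed and float cast added for reproducibility):

```python
import numpy as np

n, m = 693, 10
ranges = np.array([[0, 500], [50, 550], [100, 600], [150, 650], [193, 693]])
rng = np.random.default_rng(0)
mat = rng.integers(0, 10, (n, m)).astype(float)
submats = [mat[a:b] for a, b in ranges]

# reciprocal of the per-row coverage count, via bincount of starts/stops + cumsum
weight = 1 / np.subtract(*map(np.bincount, ranges.T, (None, None), (n + 1, n + 1))).cumsum()[:n, None]
out = np.zeros((n, m))
for sm, (a, b) in zip(submats, ranges):
    out[a:b] += weight[a:b] * sm
```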
    

    Comments:

    • I came up with an equivalent, though less elegant, solution :P Thanks for your help. Except our methods differ at just one index... -43 (you have to take (1 / overlap)). I am not too familiar with what you did, so I need to figure out which is correct
    • @SumNeuron Using the unmodified submatrix slices, the stitched version out should be identical to mat up to rounding. Maybe you can use that as a quick test.
    • Sorry, np.where(weight.flatten() != (1/olap)) yields (array([500, 550, 600, 650]),), so either I am double counting the bin edges, or the bincount method is. I do not quite understand what you mean by out "should be, up to rounding..."
    • Sorry, I think my method over-counted :P and the &lt;= b
    • @SumNeuron I meant that np.allclose(mat, out) should return True for the example. Also, I added another method which I think should be faster. Be sure to check it out.