Efficient processing of pixel + neighborhood in a numpy image: answers

【Question title】: Efficient processing of pixel + neighborhood in numpy image
【Posted】: 2012-11-10 12:05:59
【Question description】:

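I have a range image of a scene. I traverse the image and calculate the average change in depth under a detection window. The detection window changes size based on the average depth of the pixels surrounding the current location. I accumulate the averaged change to produce a simple response image.

Most of the time is spent in the for loop; it takes about 40+ seconds for a 512x52 image on my machine. I was hoping for some speed-up. Is there a more efficient/faster way to traverse the image? Is there a better pythonic/numpy/scipy way to visit each pixel? Or should I go and learn cython?

Edit: I have reduced the running time to about 18 seconds by using scipy.misc.imread() instead of skimage.io.imread(). I am not sure what the difference is; I will try to investigate.

Here is a simplified version of the code: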

import matplotlib.pylab as plt
import numpy as np
from skimage.io import imread
from skimage.transform import integral_image, integrate
import time

def intersect(a, b):
    '''Determine the intersection of two rectangles'''
    rect = (0,0,0,0)
    r0 = max(a[0],b[0])
    c0 = max(a[1],b[1])
    r1 = min(a[2],b[2])
    c1 = min(a[3],b[3])
    # Do we have a valid intersection?
    if r1 > r0 and  c1 > c0: 
         rect = (r0,c0,r1,c1)
    return rect

# Setup data
depth_src = imread("test.jpg", as_grey=True)
depth_intg = integral_image(depth_src)   # integrate to find sum depth in region
depth_pts = integral_image(depth_src > 0)  # integrate to find num points which have depth
boundary = (0,0,depth_src.shape[0]-1,depth_src.shape[1]-1) # rectangle to intersect with

# Image to accumulate response
out_img = np.zeros(depth_src.shape)

# Average dimensions of bbox/detection window per unit length of depth
model = (0.602,2.044)  # width, height

start_time = time.time()
for (r,c), junk in np.ndenumerate(depth_src):
    # Find points around current pixel      
    r0, c0, r1, c1 = intersect((r-1, c-1, r+1, c+1), boundary)

    # Calculate average of depth of points around current pixel
    scale =  integrate(depth_intg, r0, c0, r1, c1) * 255 / 9.0 

    # Based on average depth, create the detection window
    r0 = r - (model[0] * scale/2)
    c0 = c - (model[1] * scale/2)
    r1 = r + (model[0] * scale/2)
    c1 = c + (model[1] * scale/2)

    # Used scale optimised detection window to extract features
    r0, c0, r1, c1 = intersect((r0,c0,r1,c1), boundary)
    depth_count = integrate(depth_pts,r0,c0,r1,c1)
    if depth_count:
         depth_sum = integrate(depth_intg,r0,c0,r1,c1)
         avg_change = depth_sum / depth_count
         # Accumulate response
         out_img[r0:r1,c0:c1] += avg_change
print time.time() - start_time, " seconds"

plt.imshow(out_img)
plt.gray()
plt.show()

【Question comments】:

Tags: python image-processing numpy

【Solution 1】:

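Michael, interesting question. It seems that the main performance problem you have is that every pixel in the image has two integrate() calls computed on it, one over a 3x3 window and the other over a window whose size is not known in advance. Calculating individual integrals this way is extremely inefficient regardless of which numpy functions you use; it is an algorithmic issue, not an implementation issue. Consider an image of size N*N. You can calculate all integrals of any size K*K in that image using only about 4*N*N operations, not (as one might naively expect) N*N*K*K. The way to do it is to first calculate an image of sliding sums over a window of K in each row, and then sliding sums over that result in each column. Updating each sliding sum to move to the next pixel requires only adding the newest pixel in the current window and subtracting the oldest pixel in the previous window, so it costs two operations per pixel regardless of window size. We have to do this twice (for rows and for columns), hence 4 operations per pixel.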

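I am not sure whether there is a sliding window sum built into numpy, but this answer suggests a couple of ways to do it using stride tricks: https://stackoverflow.com/a/12713297/1828289. You can certainly accomplish the same with one loop over columns and one loop over rows (taking slices to extract a row/column).

Example: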

import numpy

# img is a 2D ndarray
# K is the size of the sums to calculate using a sliding window (K >= 1)

# Accumulate in float64 so that integer images (e.g. uint8) do not overflow
row_sums = numpy.zeros(img.shape, dtype=numpy.float64)
for i in range(img.shape[0]):
    if i >= K:
        # add the newest row and drop the row that just left the window
        row_sums[i,:] = row_sums[i-1,:] - img[i-K,:] + img[i,:]
    elif i > 0:
        # window not yet full: keep accumulating
        row_sums[i,:] = row_sums[i-1,:] + img[i,:]
    else: # i == 0
        row_sums[i,:] = img[i,:]

col_sums = numpy.zeros(img.shape, dtype=numpy.float64)
for j in range(img.shape[1]):
    if j >= K:
        col_sums[:,j] = col_sums[:,j-1] - row_sums[:,j-K] + row_sums[:,j]
    elif j > 0:
        col_sums[:,j] = col_sums[:,j-1] + row_sums[:,j]
    else: # j == 0
        col_sums[:,j] = row_sums[:,j]

# here col_sums[i,j] should be equal to numpy.sum(img[i-K+1:i+1, j-K+1:j+1]) if i >= K-1 and j >= K-1
# the first K-1 rows and columns in col_sums contain partial sums and can be ignored
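
For comparison, the same K x K window sums can also be obtained without explicit Python loops by differencing a cumulative-sum (integral) image. This is only a minimal sketch and not part of the original answer; the function name `window_sums` and its output layout (one entry per full window, indexed by the window's top-left corner) are assumptions made for illustration.

```python
import numpy as np

def window_sums(img, K):
    """Sum over every full K x K window of a 2D array.

    out[i, j] == img[i:i+K, j:j+K].sum(); out has shape (H-K+1, W-K+1).
    """
    img = np.asarray(img, dtype=np.float64)
    # Integral image with a leading row and column of zeros,
    # so that S[r, c] == img[:r, :c].sum().
    S = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    S[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    # Four-corner difference gives each window sum in one vectorized step.
    return S[K:, K:] - S[:-K, K:] - S[K:, :-K] + S[:-K, :-K]

img = np.arange(20).reshape(4, 5)
sums = window_sums(img, 2)   # shape (3, 4)
```

The cost is still a constant number of operations per pixel, independent of K, which is the property the sliding-sum approach relies on.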

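How best to apply this to your case? I think you might want to pre-compute the integrals for the 3x3 size (average depth) and for several larger sizes, and use the value of the 3x3 to select one of the larger sizes for the detection window (assuming I understand the intent of your algorithm). The range of larger sizes you need might be limited, or limiting it artificially might still work acceptably well: just pick the nearest size. Calculating all integrals together using sliding sums is so much more efficient that I am almost certain it is worth calculating them for a lot of sizes you would never use at a particular pixel, especially if some of the sizes are large.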
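
One way the "pre-compute several sizes, then pick the nearest" idea might look is sketched below. It is an illustration under simplifying assumptions, not the asker's algorithm: the windows are square with odd side lengths and are centred on each pixel, the image is zero-padded at the borders, and the helper names `centered_box_sums` and `select_nearest_size` as well as the `wanted` array of per-pixel desired sizes are hypothetical. Accumulating the response over each window, as the question's code does, is not covered here.

```python
import numpy as np

def centered_box_sums(img, K):
    """Sum of the K x K window centred on each pixel (K odd), zero-padded at borders."""
    pad = K // 2
    padded = np.pad(np.asarray(img, dtype=np.float64), pad, mode="constant")
    # Integral image of the padded array, with a leading zero row/column.
    S = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    S[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    # Result has the same shape as img.
    return S[K:, K:] - S[:-K, K:] - S[K:, :-K] + S[:-K, :-K]

def select_nearest_size(img, sizes, wanted):
    """Per pixel, pick the precomputed sum whose window size is closest to wanted[i, j]."""
    sizes = np.asarray(sizes)
    stack = np.stack([centered_box_sums(img, K) for K in sizes])   # (n_sizes, H, W)
    nearest = np.abs(sizes[:, None, None] - wanted[None, :, :]).argmin(axis=0)
    return np.take_along_axis(stack, nearest[None, :, :], axis=0)[0]

img = np.arange(36, dtype=float).reshape(6, 6)
mean3 = centered_box_sums(img, 3) / 9.0     # 3x3 average, as in the question
wanted = np.full(img.shape, 3.2)            # hypothetical per-pixel window sizes
wanted[3, 3] = 6.5
picked = select_nearest_size(img, [3, 5, 7], wanted)
```

Each candidate size costs one pass over the image, so adding sizes that are rarely selected stays cheap compared with integrating per pixel in a Python loop.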

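P.S. This is a minor addition, but you may want to avoid calling intersect() for every pixel: either (a) only process pixels that are farther from the edge than the maximum integral size, or (b) add margins of the maximum integral size on all sides of the image, filling the margins with zeros or NaNs, or (c) (best approach) use slices to take care of this automatically: a slice index beyond the boundary of an ndarray is automatically clipped to the boundary, except, of course, that negative indices wrap around.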
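
The slicing behaviour in option (c) can be checked with a small sketch. The helper name `window` and its `half` parameter are made up for this illustration; only the upper bounds are left to numpy's clipping, while the lower bounds are clamped by hand because a negative start would be interpreted as counting from the end of the axis.

```python
import numpy as np

def window(img, r, c, half):
    """Return the (2*half+1)-square neighbourhood of (r, c), clipped to the image."""
    # Upper bounds may exceed the shape: slicing clips them silently.
    # Lower bounds are clamped explicitly, since a negative start would
    # wrap around to the end of the axis instead of being clipped.
    r0 = max(r - half, 0)
    c0 = max(c - half, 0)
    return img[r0:r + half + 1, c0:c + half + 1]

img = np.arange(25).reshape(5, 5)
corner = window(img, 4, 4, 1)   # 2x2 block: the slice stops at the image edge
origin = window(img, 0, 0, 1)   # 2x2 block: the start was clamped to 0
wrapped = img[-1:2, :]          # empty: -1 means "last row", which lies after row 2
```

With this pattern a single `max()` per axis replaces the full rectangle intersection for windows near the top or left edge.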

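Edit: added an example of sliding window sums.

【Comments】:

Thanks. Calculating the 3x3 window before the loop seems obvious now; thanks for the suggestion. I will investigate sliding sums. I am still learning python/numpy etc. and have not used strides, so this gives me a good reason to. I will do some timing and report back. Thanks again.
@Michael, I added an example of sliding window sums; take a look and give it a try.
@Michael: if this worked for you, please remember to accept the answer - thanks!
Oops... forgot to accept (now fixed). I could not get my head around stride tricks at first. Your example helped. Thanks.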