Numpy Histogram - Python

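【Question title】: Numpy Histogram - Python
【Posted】: 2014-01-07 11:40:14
【Question】:

I have a problem in which I have a bunch of images for which I have to generate histograms. But I have to generate a histogram for each pixel. That is, for a collection of n images, I have to count the values that pixel 0,0 assumed and generate a histogram, and the same for 0,1, 0,2 and so on. I wrote the following method to do this: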

import math

class ImageData:
    def generate_pixel_histogram(self, images, bins):
        """
        Generate a histogram of the image for each pixel, counting
        the values assumed for each pixel in a specified bins
        """
        max_value = 0.0
        min_value = 0.0
        for i in range(len(images)):
            image = images[i]
            max_entry = max(max(p[1:]) for p in image.data)
            min_entry = min(min(p[1:]) for p in image.data)
            if max_entry > max_value:
                max_value = max_entry
            if min_entry < min_value:
                min_value = min_entry

        interval_size = (math.fabs(min_value) + math.fabs(max_value))/bins

        for x in range(self.width):
            for y in range(self.height):
                pixel_histogram = {}
                for i in range(bins+1):
                    key = round(min_value+(i*interval_size), 2)
                    pixel_histogram[key] = 0.0
                for i in range(len(images)):
                    image = images[i]
                    value = round(Utils.get_bin(image.data[x][y], interval_size), 2)
                    pixel_histogram[value] += 1.0/len(images)
                self.data[x][y] = pixel_histogram

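Each position of the matrix stores a dictionary representing a histogram. However, because I do this for every pixel, the computation takes a considerable amount of time, and this looks to me like a good problem to parallelize. But I have no experience with that and I don't know how to do it.

Edit:

I tried what @Eelco Hoogendoorn told me and it works perfectly. But when I apply it to my code, where the data are a large number of images generated with this constructor (after the values are calculated they are no longer just 0), I only get an array of zeros [0 0 0]. What I pass to the histogram method is an array of ImageData.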

import numpy as np

class ImageData(object):

    def __init__(self, width=5, height=5, range_min=-1, range_max=1):
        """
        The ImageData constructor
        """
        self.width = width
        self.height = height
        #The values range each pixel can assume
        self.range_min = range_min
        self.range_max = range_max
        self.data = np.arange(width*height).reshape(height, width)

#Another class, just the method here
def generate_pixel_histogram(realizations, bins):
    """
    Generate a histogram of the image for each pixel, counting
    the values assumed for each pixel in a specified bins
    """
    data = np.array([image.data for image in realizations])
    min_max_range = data.min(), data.max()+1

    bin_boundaries = np.empty(bins+1)

    # Function to wrap np.histogram, passing on only the first return value
    def hist(pixel):
        h, b = np.histogram(pixel, bins=bins, range=min_max_range)
        bin_boundaries[:] = b
        return h

    # Apply this for each pixel
    hist_data = np.apply_along_axis(hist, 0, data)
    print(hist_data)
    print(bin_boundaries)

Now I get this:

  hist_data = np.apply_along_axis(hist, 0, data)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/lib/shape_base.py", line 104, in apply_along_axis
  outshape[axis] = len(res)
  TypeError: object of type 'NoneType' has no len()

Any help would be appreciated. Thanks in advance.

【Comments】:

Tags: python class numpy histogram

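【Solution 1】:

As john said, the most obvious solution is to look for library functionality that will do this for you. It exists, and it will be orders of magnitude more efficient than what you are doing here.

Standard numpy has a histogram function that can be used for this purpose. If you have only a few values per pixel, it will be relatively inefficient, and it creates a dense histogram vector rather than the sparse one you produce here. Still, chances are good that the code below solves your problem efficiently.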

import numpy as np
#some example data; 128 images of 4x4 pixels
voxeldata = np.random.randint(0,100, (128, 4,4))
#we need to apply the same binning range to each pixel to get sensible output
globalminmax = voxeldata.min(), voxeldata.max()+1
#number of output bins
bins = 20
bin_boundaries = np.empty(bins+1)
#function to wrap np.histogram, passing on only the first return value
def hist(pixel):
    h, b = np.histogram(pixel, bins=bins, range=globalminmax)
    bin_boundaries[:] = b  #simply overwrite; result should be identical each time
    return h
#apply this for each pixel
histdata = np.apply_along_axis(hist, 0, voxeldata)
print(bin_boundaries)
print(histdata[:,0,0])  #print the histogram of an arbitrary pixel
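
The question's original loop stored relative frequencies (it added 1.0/len(images) per sample) rather than raw counts. A minimal sketch of getting the same normalization out of the numpy approach, assuming the same kind of example data as above; the names `rng`, `n_images` and `freqdata` are illustrative and not from the original post:

```python
import numpy as np

# Hypothetical example data: 128 images of 4x4 pixels, seeded for repeatability
rng = np.random.RandomState(0)
voxeldata = rng.randint(0, 100, (128, 4, 4))

bins = 20
globalminmax = voxeldata.min(), voxeldata.max() + 1

def hist(pixel):
    # Keep only the counts; the bin edges are identical for every pixel
    counts, _ = np.histogram(pixel, bins=bins, range=globalminmax)
    return counts

# Result has shape (bins, height, width): one histogram per pixel along axis 0
histdata = np.apply_along_axis(hist, 0, voxeldata)

# Divide by the number of images to turn counts into relative frequencies
n_images = voxeldata.shape[0]
freqdata = histdata / float(n_images)

print(freqdata[:, 0, 0])  # frequencies for one arbitrary pixel
```

Because every sample falls inside the shared range, the frequencies of each pixel should sum to 1.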

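But the more general message I'd like to get across, looking at your code sample and the type of problem you are working on: do yourself a favor and learn numpy.

【Comments】:

I edited the code so that the bin boundaries returned by np.histogram are stored as well; this may make it clearer what is going on.
Also, note that you can specify the bin boundaries explicitly, rather than just a number of regularly spaced bins. That would also relieve you of having to capture this output.
But as for a minimally invasive way to make your current setup play nicely with numpy: voxeldata = np.array([image.data for image in realizations]) should do the trick. This should copy your triple-nested list into a 3d array suitable for use with numpy. But again, the real question is why your data is not in such a 3d array to begin with, since I doubt this will be either the first or the last time you want to do any efficient processing on it.
If your third axis is very dynamic, it may indeed be better to keep the data as a list of 2d arrays. The cost of converting that to a 3d array when needed is also negligible compared to the high cost of converting a triple-nested list. The latter requires unboxing every x, y, z Python object, while the former is mostly just a memcopy of a few contiguous blocks of memory.
The latest version is 1.8, but I doubt that is the problem. Upgrading can't hurt, though. Did you actually propagate my "return h" edit to the code you are running? If that error is still occurring, it doesn't sound like it.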
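
Two points from these comments can be sketched together: stacking a list of 2d arrays into one 3d array, and passing explicit bin edges so the edges no longer have to be captured from np.histogram. This is only an illustration under assumed data; `images`, `edges` and the value range -1 to 1 (taken from the constructor defaults in the question) are stand-ins, not the asker's real data. Note that the wrapper has to return the counts: in the traceback above, apply_along_axis fails on `len(res)` because it was handed None.

```python
import numpy as np

# Hypothetical stand-in for the asker's data: a list of 2-D arrays
rng = np.random.RandomState(1)
images = [rng.uniform(-1.0, 1.0, (5, 5)) for _ in range(50)]

# Stack the list into one array of shape (n_images, height, width)
data = np.array(images)

# Explicit bin edges: 4 equal-width bins covering -1 to 1
edges = np.linspace(-1.0, 1.0, 5)

def hist(pixel):
    # With explicit edges the second return value is already known
    counts, _ = np.histogram(pixel, bins=edges)
    return counts  # the wrapper must return the counts

# One histogram per pixel, computed along the image axis
hist_data = np.apply_along_axis(hist, 0, data)
print(hist_data.shape)  # (4, 5, 5)
```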

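【Solution 2】:

Parallelization certainly would not be my first port of call for optimizing this kind of thing. Your main problem is that you are doing a lot of looping at the Python level, and Python is inherently slow at that. One option would be to learn how to write Cython extensions and write the histogramming part in Cython. That might take you a while. Actually, taking a histogram of pixel values is a very common task in computer vision, and it has already been implemented efficiently in OpenCV (which has Python wrappers). There are also several functions for taking histograms in the numpy Python package (though they are slower than the OpenCV implementations).

【Comments】: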
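
To illustrate the point about Python-level looping, here is a rough sketch that loops only over the bins and lets numpy handle all pixels and images at once. It is not taken from either answer: the function name `per_pixel_histogram` and the choice of np.searchsorted are assumptions made for illustration, and the binning follows np.histogram's convention (half-open bins, last bin closed).

```python
import numpy as np

def per_pixel_histogram(stack, bins):
    """Histogram every pixel of a (n_images, height, width) stack at once."""
    lo, hi = float(stack.min()), float(stack.max())
    edges = np.linspace(lo, hi, bins + 1)
    # Index of the bin each sample falls into: edges[i] <= value < edges[i + 1]
    idx = np.searchsorted(edges, stack, side="right") - 1
    # Samples equal to the maximum belong to the last bin
    idx = np.clip(idx, 0, bins - 1)
    counts = np.empty((bins,) + stack.shape[1:], dtype=np.int64)
    for b in range(bins):  # loop over bins only, never over pixels
        counts[b] = (idx == b).sum(axis=0)
    return counts, edges

# Hypothetical example data: 128 images of 4x4 pixels
rng = np.random.RandomState(2)
stack = rng.randint(0, 100, (128, 4, 4))
counts, edges = per_pixel_histogram(stack, bins=20)
print(counts[:, 0, 0])  # histogram of one arbitrary pixel
```

The number of Python-level iterations is then the number of bins rather than width times height times the number of images, which is usually the cheaper direction when the images are large.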