How to calculate the total volume of a 2D histogram? Answers

【Question title】: How to calculate the total volume of a 2D Histogram?
【Posted】: 2021-05-25 03:29:05
【Question description】:

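Sorry, I'm relatively new to Python, and especially new to using it for statistics. I have two columns of data read from Excel. I created a 1D histogram for each column and verified that the area under each one equals 1, as follows: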

n, bins, _=plt.hist(thickness, 15, range=[0,8], density=True)
Area_T= sum(numpy.diff(bins)*n)

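Now I want to show that the volume under the 2D histogram equals 1. I have already made the 2D histogram, but I don't know how to integrate it, because it returns a 2D array.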

h, xedges, yedges, _=plt.hist2d(thickness_data, height_data, bins=(20,20), density=True)

【Comments】:

Tags: python numpy matplotlib histogram

【Solution 1】:

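You can compute the total volume by multiplying each value in h by the width and height of its corresponding bin, then summing over all bins: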

import matplotlib.pyplot as plt
import numpy as np

h, xedges, yedges, _ = plt.hist2d(np.random.randn(1000).cumsum(), np.random.randn(1000).cumsum(), 
                                  bins=(20, 30), density=True)
total_volume = np.sum(h * np.diff(xedges).reshape(-1, 1) * np.diff(yedges).reshape(1, -1))
print("total_volume =", total_volume)  # prints 1.0, up to floating-point rounding
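
If no plot is needed, the same check can be sketched with np.histogram2d, which returns the same h, xedges and yedges without drawing anything. The fixed seed and the name bin_areas below are assumptions made for illustration only; np.outer is simply another way to write the two reshape calls above.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
x = rng.normal(size=1000)
y = rng.normal(size=1000)

# Same binning as plt.hist2d, but without creating a figure
h, xedges, yedges = np.histogram2d(x, y, bins=(20, 30), density=True)

# Area of every bin: outer product of the bin widths (x) and bin heights (y)
bin_areas = np.outer(np.diff(xedges), np.diff(yedges))  # shape (20, 30), like h

# Volume = sum over all bins of (density value * bin area)
total_volume = np.sum(h * bin_areas)
print("total_volume =", total_volume)
```

Because bin_areas is computed per bin, this form should also work when the bin edges are not equally spaced.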

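Without density=True, the volume of the histogram is the area of one bin multiplied by the number of samples. This holds here because all bins have the same size and every sample falls into some bin. The total width of all bins is xedges[-1]-xedges[0], and the total height is yedges[-1]-yedges[0]. The area of a single bin is the total area divided by the number of bins (20*30=600 in the example).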

import matplotlib.pyplot as plt
import numpy as np

h, xedges, yedges, _ = plt.hist2d(np.random.randn(1000).cumsum(), np.random.randn(1000).cumsum(),
                                  bins=(20, 30), density=False)
total_volume = np.sum(h * np.diff(xedges).reshape(-1, 1) * np.diff(yedges).reshape(1, -1))
print("total volume :", total_volume)
print("   predicted :", (xedges[-1] - xedges[0]) * (yedges[-1] - yedges[0]) / 600 * 1000)

This prints, for example:

total volume : 4057.2494712526022
   predicted : 4057.2494712526036

So the two values differ only by a tiny rounding error.
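
The relationship between the two modes can also be checked directly. As a minimal sketch (the seed and the names counts, density, bin_areas and rescaled are illustrative assumptions), dividing the raw counts by the number of samples times the bin area should reproduce the density=True histogram:

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed, chosen only for reproducibility
x = rng.normal(size=1000).cumsum()
y = rng.normal(size=1000).cumsum()

# Raw counts and normalized densities over the same automatically chosen bins
counts, xedges, yedges = np.histogram2d(x, y, bins=(20, 30), density=False)
density, _, _ = np.histogram2d(x, y, bins=(20, 30), density=True)

bin_areas = np.outer(np.diff(xedges), np.diff(yedges))
n_samples = counts.sum()  # every sample lies inside the default range

# Density value of a bin = count / (number of samples * bin area)
rescaled = counts / (n_samples * bin_areas)
print(np.allclose(rescaled, density))
```

This is why the density=True volume sums to 1: each bin contributes count / n_samples, and those fractions add up to 1.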

【Comments】: