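The "bell curve" you are plotting is a probability density function (PDF). This means that the probability of a random variable with this distribution falling in any interval [a, b] is the area under the curve between a and b. The height of the curve is a density, not a probability, so it is not bounded by 1. What must equal 1 is the whole area under the curve (from -infinity to +infinity). So it is not surprising that when the standard deviation is small, the maximum of the PDF can well be greater than 1.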
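As a quick check of both points, here is a minimal sketch using `scipy.stats.norm`. The mean of 5, the standard deviation of 0.25, and the interval [4.75, 5.25] are values assumed purely for illustration.

```python
import numpy as np
from scipy.stats import norm

# Illustrative parameters (assumed, not prescribed by the question)
dist = norm(loc=5, scale=0.25)

# Height of the density at the mean: 1 / (std * sqrt(2 * pi))
peak = dist.pdf(5)

# Probability of landing in [a, b] is the area under the PDF,
# i.e. the difference of the CDF at the two endpoints
a, b = 4.75, 5.25
prob_ab = dist.cdf(b) - dist.cdf(a)

# Area under the whole curve
total = dist.cdf(np.inf) - dist.cdf(-np.inf)

print(peak, prob_ab, total)
```

With these values the peak is about 1.6, while the probability of any interval, including the whole real line, never exceeds 1.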
Follow-up question: Is the area under the curve in the first plot really 1?
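Yes, it is. One way to confirm this is to approximate the area under the curve by summing the areas of a series of rectangles whose heights are given by the curve: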
import numpy as np
from matplotlib import pyplot as plt
from scipy.stats import norm
import matplotlib.patches as patches
mean = 5
std = 0.25
x = np.linspace(4, 6, 1000)
y = norm(loc=mean, scale=std).pdf(x)
fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_aspect('equal')
ax.set_xlim([4, 6])
ax.set_ylim([0, 1.7])
# Approximate area under the curve by summing over rectangles:
xlim_approx = [4, 6] # locations of left- and rightmost rectangle
n_approx = 17 # number of rectangles
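# width of one rectangle
# (note: np.linspace with n points has spacing (b - a) / (n - 1), so this
# width is slightly narrower than the spacing between rectangle centers and
# the total comes out low by a factor of (n - 1) / n):
width_approx = (xlim_approx[1] - xlim_approx[0]) / n_approx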
# x-locations of rectangles:
x_approx = np.linspace(xlim_approx[0], xlim_approx[1], n_approx)
# heights of rectangles:
y_approx = norm(loc=mean, scale=std).pdf(x_approx)
# plot approximation rectangles:
for i, xi in enumerate(x_approx):
    ax.add_patch(patches.Rectangle((xi - width_approx/2, 0), width_approx,
                                   y_approx[i], facecolor='gray', alpha=.3))
# areas of the rectangles:
areas = y_approx * width_approx
# total area of the rectangles:
print(sum(areas))
0.9411599204607589
OK, that is not 1, but let's get a better approximation by extending the x-limits and increasing the number of rectangles:
xlim_approx = [0, 10]
n_approx = 100_000
width_approx = (xlim_approx[1] - xlim_approx[0]) / n_approx
x_approx = np.linspace(xlim_approx[0], xlim_approx[1], n_approx)
y_approx = norm(loc=mean, scale=std).pdf(x_approx)
areas = y_approx * width_approx
print(sum(areas))
0.9999899999999875
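A note on the two results above: the rectangle width is computed as (b - a) / n, whereas `np.linspace` with n points spaces them (b - a) / (n - 1) apart. That mismatch appears to account for most of the shortfall, since 16/17 ≈ 0.9412 and 99999/100000 = 0.99999. The following is a sketch of one way to avoid it, using a midpoint Riemann sum in which the width equals the actual spacing. The helper name `midpoint_area` is hypothetical and not part of the original answer.

```python
import numpy as np
from scipy.stats import norm

mean, std = 5, 0.25
dist = norm(loc=mean, scale=std)

def midpoint_area(lo, hi, n):
    """Midpoint Riemann sum of the PDF over [lo, hi] using n rectangles."""
    edges = np.linspace(lo, hi, n + 1)   # n + 1 edges bound n rectangles
    width = edges[1] - edges[0]          # width matches the real spacing
    centers = edges[:-1] + width / 2     # midpoint of each rectangle
    return float(np.sum(dist.pdf(centers)) * width)

area_narrow = midpoint_area(4, 6, 17)        # same limits and count as the first attempt
area_wide = midpoint_area(0, 10, 100_000)    # same limits and count as the second attempt
exact_narrow = dist.cdf(6) - dist.cdf(4)     # exact area between 4 and 6

print(area_narrow, exact_narrow, area_wide)
```

With matched widths, even the 17-rectangle sum should land very close to the exact area between 4 and 6, and the remaining gap to 1 is just the tail probability outside that interval.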