用 scipy 拟合具有固定均值的伽马分布？答案

【问题标题】：Fit gamma distribution with fixed mean with scipy?用 scipy 拟合具有固定均值的伽马分布？
【发布时间】：2021-02-15 17:48:54
【问题描述】：

scipy.stats.rv_continuous.fit，允许您在拟合分布时修复参数，但这取决于 scipy 对参数化的选择。对于 gamma 分布，它使用 k、theta（形状、比例）参数化，因此例如在保持 theta 常数的情况下很容易拟合。我想拟合一个我知道平均值的数据集，但观察到的平均值可能会因抽样误差而有所不同。如果 scipy 使用使用 mu = k*theta 而不是 theta 的参数化，这将很容易。有没有办法让 scipy 做到这一点？如果没有，还有其他图书馆可以吗？

以下是一些示例代码，其中数据集的观测平均值为 9.952，但我知道基础分布的实际平均值为 11：

from scipy.stats import gamma
observations = [17.6, 24.9, 3.9, 17.6, 11.8, 10.4, 4.1, 11.7, 5.7, 1.6, 
                8.6, 12.9, 5.7, 8.0, 7.4, 1.2, 11.3, 10.4, 1.0, 1.9, 
                6.0, 9.3, 13.3, 5.4, 9.1, 4.0, 12.8, 11.1, 23.1, 4.2, 
                7.9, 11.1, 10.0, 3.4, 27.8, 7.2, 14.9, 2.9, 5.5, 7.0, 
                3.9, 12.3, 10.6, 22.1, 5.0, 4.1, 21.3, 15.9, 34.5, 8.1, 
                19.6, 10.8, 13.4, 22.8, 27.6, 6.8, 5.9, 9.0, 7.1, 21.2, 
                1.0, 14.6, 16.9, 1.0, 6.5, 2.9, 7.1, 14.1, 15.2, 7.8, 
                9.0, 4.9, 2.1, 9.5, 5.6, 11.1, 7.7, 18.3, 3.8, 11.0, 
                4.2, 12.5, 8.4, 3.2, 4.0, 3.8, 2.0, 24.7, 24.6, 3.4, 
                4.3, 3.2, 7.6, 8.3, 14.5, 8.3, 8.4, 14.0, 1.0, 9.0]
shape, _, scale = gamma.fit(observations, floc = 0)
print(shape*scale)

这给了

9.952

但我想要一个适合shape*scale = 11.0

【问题讨论】：

method of moments 可以提供帮助。实现起来很简单，但如果您需要帮助，请告诉我。

标签： python scipy scipy.stats

【解决方案1】：

SciPy 分布的fit 方法提供了参数的最大似然估计。你是对的，它只提供了适合形状、位置和比例的功能。（其实你说的是形状和尺度，但是SciPy中还包含了一个位置参数，有时也叫三参数伽马分布。）

对于 SciPy 中的大多数分布，fit 方法使用数值优化器来最小化负对数似然，如nnlf 方法中所定义。无需使用fit 方法，您只需几行代码就可以自己完成。这允许您创建一个只有一个参数的目标函数，例如形状k，并在该函数内设置theta = mean/k，其中mean 是所需的平均值，并调用gamma.nnlf 来评估负对数-可能性。这是您可以做到的一种方法：

import numpy as np
from scipy.stats import gamma
from scipy.optimize import fmin


def nll(k, mean, x):
    return gamma.nnlf(np.array([k[0], 0, mean/k[0]]), x)


observations = [17.6, 24.9, 3.9, 17.6, 11.8, 10.4, 4.1, 11.7, 5.7, 1.6,
                8.6, 12.9, 5.7, 8.0, 7.4, 1.2, 11.3, 10.4, 1.0, 1.9,
                6.0, 9.3, 13.3, 5.4, 9.1, 4.0, 12.8, 11.1, 23.1, 4.2,
                7.9, 11.1, 10.0, 3.4, 27.8, 7.2, 14.9, 2.9, 5.5, 7.0,
                3.9, 12.3, 10.6, 22.1, 5.0, 4.1, 21.3, 15.9, 34.5, 8.1,
                19.6, 10.8, 13.4, 22.8, 27.6, 6.8, 5.9, 9.0, 7.1, 21.2,
                1.0, 14.6, 16.9, 1.0, 6.5, 2.9, 7.1, 14.1, 15.2, 7.8,
                9.0, 4.9, 2.1, 9.5, 5.6, 11.1, 7.7, 18.3, 3.8, 11.0,
                4.2, 12.5, 8.4, 3.2, 4.0, 3.8, 2.0, 24.7, 24.6, 3.4,
                4.3, 3.2, 7.6, 8.3, 14.5, 8.3, 8.4, 14.0, 1.0, 9.0]


# This is the desired mean of the distribution.
mean = 11

# Initial guess for the shape parameter.
k0 = 3.0
opt = fmin(nll, k0, args=(mean, np.array(observations)),
           xtol=1e-11, disp=False)

k_opt = opt[0]
theta_opt = mean / k_opt
print(f"k_opt:     {k_opt:9.7f}")
print(f"theta_opt: {theta_opt:9.7f}")

这个脚本打印

k_opt:     1.9712604
theta_opt: 5.5801861

或者，可以修改wikipedia 中显示的对数似然极值的一阶条件，以便只有一个参数k。然后可以将极值的条件实现为一个标量方程，其根可以通过scipy.optimize.fsolve 找到。以下是使用此技术的上述脚本的变体。

import numpy as np
from scipy.special import digamma
from scipy.optimize import fsolve


def first_order_eq(k, mean, x):
    mean_logx = np.mean(np.log(x))
    return (np.log(k) - digamma(k) + mean_logx - np.mean(x)/mean
            - np.log(mean) + 1)


observations = [17.6, 24.9, 3.9, 17.6, 11.8, 10.4, 4.1, 11.7, 5.7, 1.6,
                8.6, 12.9, 5.7, 8.0, 7.4, 1.2, 11.3, 10.4, 1.0, 1.9,
                6.0, 9.3, 13.3, 5.4, 9.1, 4.0, 12.8, 11.1, 23.1, 4.2,
                7.9, 11.1, 10.0, 3.4, 27.8, 7.2, 14.9, 2.9, 5.5, 7.0,
                3.9, 12.3, 10.6, 22.1, 5.0, 4.1, 21.3, 15.9, 34.5, 8.1,
                19.6, 10.8, 13.4, 22.8, 27.6, 6.8, 5.9, 9.0, 7.1, 21.2,
                1.0, 14.6, 16.9, 1.0, 6.5, 2.9, 7.1, 14.1, 15.2, 7.8,
                9.0, 4.9, 2.1, 9.5, 5.6, 11.1, 7.7, 18.3, 3.8, 11.0,
                4.2, 12.5, 8.4, 3.2, 4.0, 3.8, 2.0, 24.7, 24.6, 3.4,
                4.3, 3.2, 7.6, 8.3, 14.5, 8.3, 8.4, 14.0, 1.0, 9.0]


# This is the desired mean of the distribution.
mean = 11

# Initial guess for the shape parameter.
k0 = 3
sol = fsolve(first_order_eq, k0, args=(mean, observations),
             xtol=1e-11)

k_opt = sol[0]
theta_opt = mean / k_opt
print(f"k_opt:     {k_opt:9.7f}")
print(f"theta_opt: {theta_opt:9.7f}")

输出：

k_opt:     1.9712604
theta_opt: 5.5801861

【讨论】：