【问题标题】:Scipy - How to fit this beta distribution using Python Scipy Curve FitScipy - 如何使用 Python Scipy Curve Fit 拟合这个 beta 分布
【发布时间】:2020-10-19 23:25:43
【问题描述】:

我对@9​​87654323@ 和scipy 很陌生。我有很多看起来像 y 的发行版,看起来不像 y。大多数看起来像 y 的发行版都是 beta 发行版。我的方法是,如果我可以在所有具有不同分布的唯一 ID 上拟合 beta 函数,我可以从 beta 函数中找到系数,然后查看大小接近的系数,然后我可以有效地过滤掉所有分布看起来像y

y 看起来像这样(下面的示例代码中的数据相同):

但是,我在开始时遇到了一些麻烦。

y = array([[ 0.50423378,  0.50423378,  0.50423378,  0.50254455,  0.50423378, 0.50254455,  0.50423378,  0.50507627,  0.50507627,  0.50423378,0.50507627,  0.50507627,  0.50423378,  0.50423378,  0.50423378, 0.50423378,  0.50423378,  0.50423378,  0.50254455,  0.50254455, 0.50254455,  0.50423378,  0.50423378,  0.50507627,  0.50507627,0.50507627,  0.50507627,  0.50507627,  0.50423378,  0.50423378, 0.50423378,  0.50507627,  0.50507627,  0.50423378,  0.50507627, 0.50507627,  0.50507627,  0.50423378,  0.50423378,  0.50423378,0.50423378,  0.50423378,  0.50254455,  0.50254455,  0.5, 0.50254455,  0.50254455,  0.50254455,  0.50423378,  0.50423378,0.50423378,  0.50423378,  0.50423378,  0.50254455,  0.50423378, 0.50254455,  0.50254455,  0.50423378,  0.50423378,  0.50254455,0.5       ,  0.5       ,  0.50254455,  0.50254455,  0.5       ,0.49658699,  0.49228746,  0.49228746,  0.48707792,  0.48092881,0.48707792,  0.48092881,  0.48092881,  0.48092881,  0.48092881,0.48092881,  0.48092881,  0.47380354,  0.47380354,  0.48092881,0.48707792,  0.48707792,  0.48092881,  0.48092881,  0.48092881,0.48092881,  0.48092881,  0.48092881,  0.47380354,  0.48092881,0.48092881,  0.48092881,  0.48707792,  0.48707792,  0.48707792,0.49228746,  0.49228746,  0.49228746,  0.49228746,  0.48707792,0.48707792,  0.48707792,  0.49228746,  0.48707792,  0.48707792,0.48707792,  0.48707792,  0.48707792,  0.49228746,  0.49228746,0.48707792,  0.48707792,  0.49228746,  0.49658699,  0.49658699,0.49658699,  0.49228746,  0.49228746,  0.49658699,  0.49228746,0.49658699,  0.5       ,  0.50254455,  0.50423378,  0.50423378,0.50254455,  0.50423378,  0.50423378,  0.50254455,  0.5       ,0.5       ,  0.5       ,  0.5       ,  0.5       ,  0.50254455,0.50254455,  0.5       ,  0.50254455,  0.5       ,  0.5       ,0.5       ,  0.5       ,  0.5       ,  0.5       ,  0.49658699,0.49228746,  0.48707792,  0.48707792,  0.48707792,  0.49228746,0.49228746,  0.48707792,  0.48707792,  0.49228746,  0.48707792,0.48707792,  0.48707792,  0.48092881,  0.48092881,  0.48707792,0.48707792,  0.48092881,  0.47380354,  0.48092881,  0.48092881,0.48707792,  0.49228746,  0.48707792,  0.49228746,  0.48707792,0.48092881,  0.47380354,  0.46565731,  0.46565731,  0.46565731,0.45643546,  0.45643546,  0.45643546,  0.45643546,  0.45643546,0.45643546,  0.45643546,  0.46565731,  0.45643546,  0.45643546,0.45643546,  0.44607129,  0.45643546,  0.45643546,  0.45643546,0.44607129,  0.44607129,  0.43448304,  0.43448304,  0.43448304,0.44607129,  0.45643546,  0.45643546,  0.45643546,  0.46565731,0.47380354,  0.48092881,  0.48092881, 29.38186886, 29.38186886,29.38186886, 29.37898909, 29.45299206, 29.52449116, 29.74083063,29.73771398, 29.73771398, 29.74083063, 29.74083063, 29.74083063,29.74083063, 29.73771398, 29.74083063, 29.73771398, 29.73771398,29.73771398, 29.73771398, 29.74083063, 29.74083063, 29.74083063,30.12527698, 30.48367189, 30.8169243 , 30.8169243 , 30.8169243 ,30.8169243 , 30.82153203, 30.8169243 , 30.81230208, 30.81230208,30.80766536, 30.81230208, 30.81230208, 30.80766536, 30.80301414,30.80301414, 30.80301414, 30.80301414, 30.80301414, 30.80766536,30.81230208, 30.81230208, 30.81230208, 30.81230208, 30.8169243 ,30.82153203, 30.82612528, 10.51949923, 10.51949923, 10.51436497,10.51436497, 10.22456193,  9.91464422,  9.36922158,  9.37416663,9.36922158,  9.36922158,  9.36922158,  9.37416663,  9.37906375,9.383913  ,  9.383913  ,  9.38871446,  9.383913  ,  9.37906375,9.37416663,  9.36922158,  9.36422851,  9.35918734,  7.72711675,5.53121937,  0.5       ,  0.50254455,  0.50254455,  0.50254455,0.50254455,  0.50254455,  0.5       ,  0.5       ,  0.49658699,0.5       ,  0.5       ,  0.5       ,  0.49658699,  0.49658699,0.5       ,  0.50254455,  0.50423378,  0.50423378,  0.50423378,0.50507627,  0.50507627,  0.50423378,  0.50423378,  0.50423378,0.50423378,  0.50423378,  0.50254455,  0.50254455,  0.5       ,0.5       ,  0.5       ,  0.49658699,  0.5       ,  0.49658699,0.49658699,  0.49658699,  0.49658699,  0.49658699,  0.49658699,0.49658699,  0.49658699,  0.49228746,  0.48707792,  0.48707792,0.48092881,  0.47380354,  0.47380354,  0.46565731,  0.46565731,0.47380354,  0.46565731,  0.47380354,  0.47380354,  0.47380354, 0.47380354,  0.48092881]])

使用 scipy 中的这个示例,我如何获取 x 数组并将其插入以获取我的系数,然后在我的分布上绘制 curve_fit

import numpy as np
from scipy.optimize import curve_fit
from scipy.special import gamma as gamma

def betafunc(x,a,b,cst):
    return cst*gamma(a+b) * (x**(a-1)) * ((1-x)**(b-1))  / ( gamma(a)*gamma(b) )

x = np.array( [0.1, 0.3, 0.5, 0.7, 0.9, 1.1])
y = np.array( [0.45112234, 0.56934313, 0.3996803 , 0.28982859, 0.19682153, 0.] )

popt2,pcov2 = curve_fit(betafunc,x[:-1],y[:-1],p0=(0.5,1.5,0.5))

print(popt2)
print(pcov2)

【问题讨论】:

    标签: python numpy scipy


    【解决方案1】:

    对于您问题的第一部分: 如果您有一组观察结果,您可以使用 numpy.histogram 来获取直方图。要获得每个垃圾箱的中心,请按照下面的代码进行。您可以将这些值用于拟合过程。根据您提供的数据,谁不能适合 betafunc,因为它根本不适合。

    import numpy as np
    from matplotlib import pyplot as plt
    from scipy.optimize import curve_fit
    from scipy.special import gamma as gamma
    
    
    def betafunc(x,a,b,cst):
        return cst*gamma(a+b) * (x**(a-1)) * ((1-x)**(b-1))  / ( gamma(a)*gamma(b) )
    
    y_data=np.array([[ 0.50423378,  0.50423378,  0.50423378,  0.50254455,  0.50423378, 0.50254455,  0.50423378,  0.50507627,  0.50507627,  0.50423378,0.50507627,  0.50507627,  0.50423378,  0.50423378,  0.50423378, 0.50423378,  0.50423378,  0.50423378,  0.50254455,  0.50254455, 0.50254455,  0.50423378,  0.50423378,  0.50507627,  0.50507627,0.50507627,  0.50507627,  0.50507627,  0.50423378,  0.50423378, 0.50423378,  0.50507627,  0.50507627,  0.50423378,  0.50507627, 0.50507627,  0.50507627,  0.50423378,  0.50423378,  0.50423378,0.50423378,  0.50423378,  0.50254455,  0.50254455,  0.5, 0.50254455,  0.50254455,  0.50254455,  0.50423378,  0.50423378,0.50423378,  0.50423378,  0.50423378,  0.50254455,  0.50423378, 0.50254455,  0.50254455,  0.50423378,  0.50423378,  0.50254455,0.5       ,  0.5       ,  0.50254455,  0.50254455,  0.5       ,0.49658699,  0.49228746,  0.49228746,  0.48707792,  0.48092881,0.48707792,  0.48092881,  0.48092881,  0.48092881,  0.48092881,0.48092881,  0.48092881,  0.47380354,  0.47380354,  0.48092881,0.48707792,  0.48707792,  0.48092881,  0.48092881,  0.48092881,0.48092881,  0.48092881,  0.48092881,  0.47380354,  0.48092881,0.48092881,  0.48092881,  0.48707792,  0.48707792,  0.48707792,0.49228746,  0.49228746,  0.49228746,  0.49228746,  0.48707792,0.48707792,  0.48707792,  0.49228746,  0.48707792,  0.48707792,0.48707792,  0.48707792,  0.48707792,  0.49228746,  0.49228746,0.48707792,  0.48707792,  0.49228746,  0.49658699,  0.49658699,0.49658699,  0.49228746,  0.49228746,  0.49658699,  0.49228746,0.49658699,  0.5       ,  0.50254455,  0.50423378,  0.50423378,0.50254455,  0.50423378,  0.50423378,  0.50254455,  0.5       ,0.5       ,  0.5       ,  0.5       ,  0.5       ,  0.50254455,0.50254455,  0.5       ,  0.50254455,  0.5       ,  0.5       ,0.5       ,  0.5       ,  0.5       ,  0.5       ,  0.49658699,0.49228746,  0.48707792,  0.48707792,  0.48707792,  0.49228746,0.49228746,  0.48707792,  0.48707792,  0.49228746,  0.48707792,0.48707792,  0.48707792,  0.48092881,  0.48092881,  0.48707792,0.48707792,  0.48092881,  0.47380354,  0.48092881,  0.48092881,0.48707792,  0.49228746,  0.48707792,  0.49228746,  0.48707792,0.48092881,  0.47380354,  0.46565731,  0.46565731,  0.46565731,0.45643546,  0.45643546,  0.45643546,  0.45643546,  0.45643546,0.45643546,  0.45643546,  0.46565731,  0.45643546,  0.45643546,0.45643546,  0.44607129,  0.45643546,  0.45643546,  0.45643546,0.44607129,  0.44607129,  0.43448304,  0.43448304,  0.43448304,0.44607129,  0.45643546,  0.45643546,  0.45643546,  0.46565731,0.47380354,  0.48092881,  0.48092881, 29.38186886, 29.38186886,29.38186886, 29.37898909, 29.45299206, 29.52449116, 29.74083063,29.73771398, 29.73771398, 29.74083063, 29.74083063, 29.74083063,29.74083063, 29.73771398, 29.74083063, 29.73771398, 29.73771398,29.73771398, 29.73771398, 29.74083063, 29.74083063, 29.74083063,30.12527698, 30.48367189, 30.8169243 , 30.8169243 , 30.8169243 ,30.8169243 , 30.82153203, 30.8169243 , 30.81230208, 30.81230208,30.80766536, 30.81230208, 30.81230208, 30.80766536, 30.80301414,30.80301414, 30.80301414, 30.80301414, 30.80301414, 30.80766536,30.81230208, 30.81230208, 30.81230208, 30.81230208, 30.8169243 ,30.82153203, 30.82612528, 10.51949923, 10.51949923, 10.51436497,10.51436497, 10.22456193,  9.91464422,  9.36922158,  9.37416663,9.36922158,  9.36922158,  9.36922158,  9.37416663,  9.37906375,9.383913  ,  9.383913  ,  9.38871446,  9.383913  ,  9.37906375,9.37416663,  9.36922158,  9.36422851,  9.35918734,  7.72711675,5.53121937,  0.5       ,  0.50254455,  0.50254455,  0.50254455,0.50254455,  0.50254455,  0.5       ,  0.5       ,  0.49658699,0.5       ,  0.5       ,  0.5       ,  0.49658699,  0.49658699,0.5       ,  0.50254455,  0.50423378,  0.50423378,  0.50423378,0.50507627,  0.50507627,  0.50423378,  0.50423378,  0.50423378,0.50423378,  0.50423378,  0.50254455,  0.50254455,  0.5       ,0.5       ,  0.5       ,  0.49658699,  0.5       ,  0.49658699,0.49658699,  0.49658699,  0.49658699,  0.49658699,  0.49658699,0.49658699,  0.49658699,  0.49228746,  0.48707792,  0.48707792,0.48092881,  0.47380354,  0.47380354,  0.46565731,  0.46565731,0.47380354,  0.46565731,  0.47380354,  0.47380354,  0.47380354, 0.47380354,  0.48092881]])
    
    
    hist=np.histogram(y_data[0],bins=20)
    x=(hist[1][1:]+hist[1][:-1])/2
    y=hist[0]
    
    print(x,y)
    
    plt.step(x,y,label='Manual calculation of the center of the bins')
    plt.hist(y_data[0],bins=20,histtype='bar',label='Automatic plot with plt.hist')
    plt.legend()
    plt.show()
    
    popt2,pcov2 = curve_fit(betafunc,x[:-1],y[:-1],p0=(0.5,1.5,0.5))
    

    关于你问题的第二部分: 要绘制具有最佳拟合参数的函数,您只需添加我在末尾添加的最后四行代码。

    import numpy as np
    from scipy.optimize import curve_fit
    from scipy.special import gamma as gamma
    
    
    def betafunc(x,a,b,cst):
        return cst*gamma(a+b) * (x**(a-1)) * ((1-x)**(b-1))  / ( gamma(a)*gamma(b) )
    
    
    
    x = np.array( [0.1, 0.3, 0.5, 0.7, 0.9, 1.1])
    y = np.array( [0.45112234, 0.56934313, 0.3996803 , 0.28982859, 0.19682153, 0.] )
    
    popt2,pcov2 = curve_fit(betafunc,x[:-1],y[:-1],p0=(0.5,1.5,0.5))
    
    print(popt2)
    print(pcov2)
    
    from matplotlib import pyplot as plt
    plt.plot(x,betafunc(x,*popt2))
    plt.plot(x,y)
    plt.show()
    

    【讨论】:

    • 我不确定合并数据是否正确:任何聚合都包含对数据的任意修改。使用两种不同数量的垃圾箱,您可能会得到两种不同的配合。这是一种为已解释变量拟合模型的方法,但 distribution 拟合的方法不同:我喜欢 overview in Wikipedia
    【解决方案2】:

    如果您不受限制使用curve_fit。我建议你看看scipy.stats.beta。一种可能的解决方案是:

    from scipy.stats import beta
    
    y = array([[ 0.50423378,  0.50423378,  0.50423378,  0.50254455,  0.50423378, 0.50254455,  0.50423378,  0.50507627,  0.50507627,  0.50423378,0.50507627,  0.50507627,  0.50423378,  0.50423378,  0.50423378, 0.50423378,  0.50423378,  0.50423378,  0.50254455,  0.50254455, 0.50254455,  0.50423378,  0.50423378,  0.50507627,  0.50507627,0.50507627,  0.50507627,  0.50507627,  0.50423378,  0.50423378, 0.50423378,  0.50507627,  0.50507627,  0.50423378,  0.50507627, 0.50507627,  0.50507627,  0.50423378,  0.50423378,  0.50423378,0.50423378,  0.50423378,  0.50254455,  0.50254455,  0.5, 0.50254455,  0.50254455,  0.50254455,  0.50423378,  0.50423378,0.50423378,  0.50423378,  0.50423378,  0.50254455,  0.50423378, 0.50254455,  0.50254455,  0.50423378,  0.50423378,  0.50254455,0.5       ,  0.5       ,  0.50254455,  0.50254455,  0.5       ,0.49658699,  0.49228746,  0.49228746,  0.48707792,  0.48092881,0.48707792,  0.48092881,  0.48092881,  0.48092881,  0.48092881,0.48092881,  0.48092881,  0.47380354,  0.47380354,  0.48092881,0.48707792,  0.48707792,  0.48092881,  0.48092881,  0.48092881,0.48092881,  0.48092881,  0.48092881,  0.47380354,  0.48092881,0.48092881,  0.48092881,  0.48707792,  0.48707792,  0.48707792,0.49228746,  0.49228746,  0.49228746,  0.49228746,  0.48707792,0.48707792,  0.48707792,  0.49228746,  0.48707792,  0.48707792,0.48707792,  0.48707792,  0.48707792,  0.49228746,  0.49228746,0.48707792,  0.48707792,  0.49228746,  0.49658699,  0.49658699,0.49658699,  0.49228746,  0.49228746,  0.49658699,  0.49228746,0.49658699,  0.5       ,  0.50254455,  0.50423378,  0.50423378,0.50254455,  0.50423378,  0.50423378,  0.50254455,  0.5       ,0.5       ,  0.5       ,  0.5       ,  0.5       ,  0.50254455,0.50254455,  0.5       ,  0.50254455,  0.5       ,  0.5       ,0.5       ,  0.5       ,  0.5       ,  0.5       ,  0.49658699,0.49228746,  0.48707792,  0.48707792,  0.48707792,  0.49228746,0.49228746,  0.48707792,  0.48707792,  0.49228746,  0.48707792,0.48707792,  0.48707792,  0.48092881,  0.48092881,  0.48707792,0.48707792,  0.48092881,  0.47380354,  0.48092881,  0.48092881,0.48707792,  0.49228746,  0.48707792,  0.49228746,  0.48707792,0.48092881,  0.47380354,  0.46565731,  0.46565731,  0.46565731,0.45643546,  0.45643546,  0.45643546,  0.45643546,  0.45643546,0.45643546,  0.45643546,  0.46565731,  0.45643546,  0.45643546,0.45643546,  0.44607129,  0.45643546,  0.45643546,  0.45643546,0.44607129,  0.44607129,  0.43448304,  0.43448304,  0.43448304,0.44607129,  0.45643546,  0.45643546,  0.45643546,  0.46565731,0.47380354,  0.48092881,  0.48092881, 29.38186886, 29.38186886,29.38186886, 29.37898909, 29.45299206, 29.52449116, 29.74083063,29.73771398, 29.73771398, 29.74083063, 29.74083063, 29.74083063,29.74083063, 29.73771398, 29.74083063, 29.73771398, 29.73771398,29.73771398, 29.73771398, 29.74083063, 29.74083063, 29.74083063,30.12527698, 30.48367189, 30.8169243 , 30.8169243 , 30.8169243 ,30.8169243 , 30.82153203, 30.8169243 , 30.81230208, 30.81230208,30.80766536, 30.81230208, 30.81230208, 30.80766536, 30.80301414,30.80301414, 30.80301414, 30.80301414, 30.80301414, 30.80766536,30.81230208, 30.81230208, 30.81230208, 30.81230208, 30.8169243 ,30.82153203, 30.82612528, 10.51949923, 10.51949923, 10.51436497,10.51436497, 10.22456193,  9.91464422,  9.36922158,  9.37416663,9.36922158,  9.36922158,  9.36922158,  9.37416663,  9.37906375,9.383913  ,  9.383913  ,  9.38871446,  9.383913  ,  9.37906375,9.37416663,  9.36922158,  9.36422851,  9.35918734,  7.72711675,5.53121937,  0.5       ,  0.50254455,  0.50254455,  0.50254455,0.50254455,  0.50254455,  0.5       ,  0.5       ,  0.49658699,0.5       ,  0.5       ,  0.5       ,  0.49658699,  0.49658699,0.5       ,  0.50254455,  0.50423378,  0.50423378,  0.50423378,0.50507627,  0.50507627,  0.50423378,  0.50423378,  0.50423378,0.50423378,  0.50423378,  0.50254455,  0.50254455,  0.5       ,0.5       ,  0.5       ,  0.49658699,  0.5       ,  0.49658699,0.49658699,  0.49658699,  0.49658699,  0.49658699,  0.49658699,0.49658699,  0.49658699,  0.49228746,  0.48707792,  0.48707792,0.48092881,  0.47380354,  0.47380354,  0.46565731,  0.46565731,0.47380354,  0.46565731,  0.47380354,  0.47380354,  0.47380354, 0.47380354,  0.48092881]]) 
    
    params = beta.fit(y)
    
    y2 = np.loadtxt("other_data_file.dat")   # other distribution file
    params2 = beta.fit(y2)
    

    然后您可以通过比较paramsparams2 来单独比较参数。请注意scipy.stats.beta在定义概率密度函数时使用了标准化形式。

    【讨论】:

      猜你喜欢
      • 2016-12-25
      • 2011-02-23
      • 2013-07-03
      • 2011-03-15
      • 2019-07-01
      • 2021-05-13
      • 1970-01-01
      • 1970-01-01
      • 2011-10-01
      相关资源
      最近更新 更多