【Question title】: Prediction Intervals for Hyperbolic Curve_Fit - SciPy
【Posted】: 2020-01-24 01:51:28
【Question】:

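How can I get prediction intervals / prediction bands using SciPy's curve fitting function?

More specifically, how can I get these prediction bands for the hyperbolic curve commonly used in decline curve analysis?

Any help would be greatly appreciated.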

import pandas as pd
import numpy as np
from datetime import timedelta
from scipy.optimize import curve_fit

def hyperbolic_equation(t, qi, b, di):
    return qi/((1.0+b*di*t)**(1.0/b))


df1 = pd.DataFrame({ 'cumsum_days': [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15],
        'prod': [800, 900, 1200, 700, 600, 
                 550, 500, 650, 625, 600,
                 550, 525, 500, 400, 350]})

qi = max(df1['prod'])

#Hyperbolic curve fit the data to get best fit equation
popt_hyp, pcov_hyp = curve_fit(hyperbolic_equation, df1['cumsum_days'], df1['prod'],bounds=(0, [qi,1,20]))

#Passing t to estimate the coefficients:

def fitted_hyperbolic_equation(t):
    return popt_hyp[0]/((1.0+popt_hyp[1]*popt_hyp[2]*t)**(1.0/popt_hyp[1]))

#Creating future time to predict on:
df2 = pd.DataFrame({ 'future_days': [16,17,18,19,20]})

fitted_hyperbolic_equation(df2.future_days)

16    388.259631
17    368.389649
18    349.754534
19    332.264306
20    315.836485

I have my future values, but how do I generate 95% confidence/prediction bands with SciPy? Any help would be greatly appreciated.

【Question discussion】:

    Tags: python scipy curve-fitting confidence-interval scipy-optimize


    【Answer 1】:

    I'm not sure I fully understand, but I think you are asking about the uncertainty in the values predicted by a curve-fitting model.

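    I'd recommend using lmfit for this (disclaimer: I am the author), as it provides methods for doing such calculations. I'm afraid your model and your data do not match very well, so the uncertainties are large.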
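In the usual first-order approach, such calculations propagate the parameter covariance through the model. A sketch in the question's notation, with $C$ the fitted covariance matrix (`pcov` from curve_fit), $n = 15$ data points and $p = 3$ parameters; the symbols $\mathbf g$, $s^2$ and $\tau$ are introduced here for illustration and $\tau$ is the $1-\alpha/2$ quantile of Student's t with $n-p$ degrees of freedom:

$$
q(t;\boldsymbol\theta) = \frac{q_i}{(1 + b\,d_i\,t)^{1/b}}, \qquad \boldsymbol\theta = (q_i,\ b,\ d_i)
$$

$$
\mathbf g(t) = \left.\frac{\partial q(t;\boldsymbol\theta)}{\partial \boldsymbol\theta}\right|_{\hat{\boldsymbol\theta}}, \qquad
\sigma_{\hat q}^2(t) \approx \mathbf g(t)^\top C\,\mathbf g(t), \qquad
s^2 = \frac{1}{n-p}\sum_{k=1}^{n}\bigl(y_k - q(t_k;\hat{\boldsymbol\theta})\bigr)^2
$$

$$
\text{confidence band: } \hat q(t) \pm \tau\,\sigma_{\hat q}(t), \qquad
\text{prediction band: } \hat q(t) \pm \tau\sqrt{\sigma_{\hat q}^2(t) + s^2}
$$

The confidence band describes where the fitted curve itself is likely to lie; the prediction band, which is what the question asks for, also includes the scatter $s^2$ of individual observations and is therefore wider.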

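    Using lmfit, and using plain numpy arrays instead of pandas DataFrames (those can be used, but here they are a distraction; the fit really needs numpy arrays), your analysis might look like this: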

    import numpy as np
    from lmfit import Model
    import matplotlib.pyplot as plt
    
    cumsum_days = np.array([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15])
    prod = np.array([800, 900, 1200, 700, 600, 550, 500, 650, 625, 600, 550,
                     525, 500, 400, 350])
    
    # plot data
    plt.plot(cumsum_days, prod, 'bo', label='data')
    
    def hyperbolic_equation(t, qi, b, di):
        return qi/((1.0+b*di*t)**(1.0/max(b, 1.e-50)))
    
    # build Model
    hmodel = Model(hyperbolic_equation)
    
    # create lmfit Parameters, named from the arguments of `hyperbolic_equation`
    # note that you really must provide initial values.
    params = hmodel.make_params(qi=1000, b=0.5, di=0.1)
    
    # set bounds on parameters
    params['qi'].min=0
    params['b'].min=0
    params['di'].min=0
    
    # do fit, print resulting parameters
    result = hmodel.fit(prod, params, t=cumsum_days)
    print(result.fit_report())
    
    # plot best fit: not that great of fit, really
    plt.plot(cumsum_days, result.best_fit, 'r--', label='fit')
    
    # calculate the (1 sigma) uncertainty in the predicted model
    # and plot that as a confidence band
    dprod = result.eval_uncertainty(result.params, sigma=1)   
    plt.fill_between(cumsum_days,
                     result.best_fit-dprod,
                     result.best_fit+dprod,
                     color="#AB8888",
                     label='uncertainty band of fit')
    
    # now evaluate the model for other values, predicting future values
    future_days = np.array([16,17,18,19,20])
    future_prod = result.eval(t=future_days)
    
    plt.plot(future_days, future_prod, 'k--', label='prediction')
    
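    # ...and calculate the 1-sigma uncertainty in the future prediction.
    # Note that this reflects the uncertainty of the model curve (from the
    # parameter uncertainties); it does not add the scatter of individual
    # new observations. For roughly 95% confidence, use `sigma=2` here:
    future_dprod = result.eval_uncertainty(t=future_days, sigma=1)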
    
    print("### Prediction\n# Day  Prod     Uncertainty")
    
    # use a new loop variable so the `prod` data array is not overwritten
    for day, q, eps in zip(future_days, future_prod, future_dprod):
        print(" {:.1f}   {:.3f} +/- {:.3f}".format(day, q, eps))
    
    plt.fill_between(future_days,
                     future_prod-future_dprod,
                     future_prod+future_dprod,
                     color="#ABABAB",
                     label='uncertainty band of prediction')
    
    plt.legend(loc='lower left')
    plt.show()
    

    This will print out the resulting fit statistics and parameter values:

    [[Model]]
        Model(hyperbolic_equation)
    [[Fit Statistics]]
        # fitting method   = leastsq
        # function evals   = 21
        # data points      = 15
        # variables        = 3
        chi-square         = 238946.482
        reduced chi-square = 19912.2068
        Akaike info crit   = 151.139170
        Bayesian info crit = 153.263321
    [[Variables]]
        qi:  993.608482 +/- 163.710950 (16.48%) (init = 1000)
        b:   0.22855837 +/- 2.07615175 (908.37%) (init = 0.5)
        di:  0.06551315 +/- 0.06250023 (95.40%) (init = 0.1)
    [[Correlations]] (unreported correlations are < 0.100)
        C(b, di)  =  0.963
        C(qi, di) =  0.888
        C(qi, b)  =  0.771
    ### Prediction
    # Day  Prod     Uncertainty
     16.0   388.258 +/- 1080.106
     17.0   368.387 +/- 1106.336
     18.0   349.752 +/- 1130.091
     19.0   332.261 +/- 1151.634
     20.0   315.833 +/- 1171.196
    

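    and produce a plot like this:

    [Plot: the data, the best fit with its uncertainty band, and the prediction for days 16 to 20 with its uncertainty band]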
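On the `sigma=2` comment in the code: under a normal approximation, ±1σ and ±2σ cover about 68.3% and 95.4% respectively, exactly 95% corresponds to about ±1.96σ, and with only n - p = 12 degrees of freedom a Student-t multiplier is larger still. A quick check with scipy.stats (the helper `two_sided_coverage` is a hypothetical name introduced here):

```python
from scipy import stats


def two_sided_coverage(k):
    """Probability mass within +/- k standard deviations of a normal distribution."""
    return stats.norm.cdf(k) - stats.norm.cdf(-k)


for k in (1, 2):
    print(f"sigma={k}: coverage = {two_sided_coverage(k):.4f}")

# Multiplier giving exactly 95% two-sided coverage for a normal distribution
z95 = stats.norm.ppf(0.975)
# With only 15 points and 3 fitted parameters (as in the example above),
# a Student-t multiplier with n - p = 12 degrees of freedom is more appropriate
t95 = stats.t.ppf(0.975, df=15 - 3)
print(f"normal multiplier: {z95:.3f}, t multiplier (12 dof): {t95:.3f}")
```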

    In your question, you did not check the quality of the fit, either statistically or graphically. You really should.
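A minimal sketch of such a statistical check using only SciPy and numpy, assuming the question's model and data; the starting values and bounds are illustrative choices, not taken from the question. It reports the quantities lmfit prints in its fit report (parameter errors, correlations, reduced chi-square) plus R² and the raw residuals:

```python
import numpy as np
from scipy.optimize import curve_fit


def hyperbolic_equation(t, qi, b, di):
    return qi / ((1.0 + b * di * t) ** (1.0 / b))


t = np.arange(1, 16, dtype=float)
y = np.array([800, 900, 1200, 700, 600, 550, 500, 650, 625, 600,
              550, 525, 500, 400, 350], dtype=float)

# Illustrative starting values and bounds (b kept strictly positive)
popt, pcov = curve_fit(hyperbolic_equation, t, y, p0=[1000.0, 0.5, 0.1],
                       bounds=([0.0, 1e-6, 0.0], [np.inf, 2.0, 20.0]))

residuals = y - hyperbolic_equation(t, *popt)
n, p = len(y), len(popt)
red_chisq = np.sum(residuals**2) / (n - p)       # unweighted reduced chi-square
r_squared = 1.0 - np.sum(residuals**2) / np.sum((y - y.mean())**2)

perr = np.sqrt(np.diag(pcov))                    # 1-sigma parameter errors
corr = pcov / np.outer(perr, perr)               # parameter correlation matrix

for name, val, err in zip(["qi", "b", "di"], popt, perr):
    print(f"{name:>3} = {val:10.4g} +/- {err:.3g} ({100 * err / abs(val):.0f}%)")
print(f"reduced chi-square = {red_chisq:.1f}, R^2 = {r_squared:.3f}")
print("correlations:\n", np.round(corr, 3))
print("residuals:", np.round(residuals, 1))
```

Large relative errors, strong correlations between parameters, or a visible pattern in the residuals are all signs that the model does not describe the data well.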

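    You also used curve_fit without providing initial values. None of the underlying fitting routines support this, and all of them require explicit initial values, yet curve_fit allows it without warning or justification and simply sets every starting value to 1.0. You really must provide initial values.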
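Since the question asks specifically about SciPy, here is a sketch that stays with curve_fit, passes explicit starting values, and builds 95% confidence and prediction bands by first-order (delta-method) propagation of `pcov`. The helper names `param_jacobian` and `bands`, the central-difference step size, and the bounds are assumptions made for this illustration; this is not an lmfit or SciPy API:

```python
import numpy as np
from scipy import stats
from scipy.optimize import curve_fit


def hyperbolic_equation(t, qi, b, di):
    """Hyperbolic decline curve, same form as in the question."""
    return qi / ((1.0 + b * di * t) ** (1.0 / b))


def param_jacobian(f, t, params, rel_step=1e-6):
    """Central-difference derivatives of f(t, *params) with respect to each
    parameter; returns an array of shape (len(t), len(params))."""
    params = np.asarray(params, dtype=float)
    jac = np.empty((len(t), len(params)))
    for j in range(len(params)):
        h = rel_step * max(abs(params[j]), 1e-8)
        up, down = params.copy(), params.copy()
        up[j] += h
        down[j] -= h
        jac[:, j] = (f(t, *up) - f(t, *down)) / (2.0 * h)
    return jac


def bands(f, t_data, y_data, popt, pcov, t_new, level=0.95):
    """Best-fit values plus confidence and prediction half-widths at t_new."""
    n, p = len(y_data), len(popt)
    resid = y_data - f(t_data, *popt)
    s2 = np.sum(resid**2) / (n - p)                 # residual variance estimate
    g = param_jacobian(f, t_new, popt)
    var_fit = np.einsum("ij,jk,ik->i", g, pcov, g)  # g^T C g at each t_new
    tau = stats.t.ppf(0.5 + level / 2.0, n - p)     # Student-t multiplier
    return f(t_new, *popt), tau * np.sqrt(var_fit), tau * np.sqrt(var_fit + s2)


t_data = np.arange(1, 16, dtype=float)
y_data = np.array([800, 900, 1200, 700, 600, 550, 500, 650, 625, 600,
                   550, 525, 500, 400, 350], dtype=float)

# Explicit starting values (the same ones used in the lmfit example) and bounds;
# b is kept strictly positive so that 1/b stays finite.
popt, pcov = curve_fit(hyperbolic_equation, t_data, y_data,
                       p0=[1000.0, 0.5, 0.1],
                       bounds=([0.0, 1e-6, 0.0], [np.inf, 2.0, 20.0]))

future_days = np.array([16, 17, 18, 19, 20], dtype=float)
pred, conf_hw, pred_hw = bands(hyperbolic_equation, t_data, y_data,
                               popt, pcov, future_days)

for day, q, c, w in zip(future_days, pred, conf_hw, pred_hw):
    print(f"day {day:.0f}: {q:7.1f}  95% CI +/- {c:7.1f}  95% PI +/- {w:7.1f}")
```

Because curve_fit scales `pcov` by the residual variance by default (`absolute_sigma=False`), `s2` here is the same residual variance estimate; with this data, expect both bands to be very wide, consistent with the lmfit results above.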

    【Discussion】:
