[Title]: Python lmfit gives very small chi-square; how to scale residuals
[Posted]: 2022-01-11 05:11:37
[Question]:

My NIR spectrum (x, y) file provides no uncertainty information. I am fitting a blackbody plus a power law with the code below; judging from the resulting parameter values and the corresponding plot, it appears to work correctly. However, the chi-square value is extremely small, as shown in the example below. The documentation says the residuals should be scaled properly. What are the exact steps for doing that? Thanks for the help.

import numpy as np
from scipy.constants import h, k, c
from lmfit import Model

def bb(x, T, const):
    """Planck blackbody curve; x is wavelength in microns."""
    x_m = 1e-6 * x  # microns -> meters
    return const * 2*h*c**2 / (x_m**5 * (np.exp(h*c / (x_m*k*T)) - 1))

def powerlaw(x, A, p):
    return A * x**p


mod = Model(bb) + Model(powerlaw)
pars = mod.make_params(T=2000, const=2e-21, A=2*np.average(y), p=-1.0)
result = mod.fit(y, pars, x=x)
print(result.fit_report())

# Best-fit parameter values
T     = result.params['T'].value
const = result.params['const'].value
A     = result.params['A'].value
p     = result.params['p'].value
---------------------------------
[[Model]]
    (Model(bb) + Model(powerlaw))
[[Fit Statistics]]
    # fitting method   = leastsq
    # function evals   = 69
    # data points      = 5362
    # variables        = 4
    chi-square         = 5.6981e-29
    reduced chi-square = 1.0635e-32
    Akaike info crit   = -394752.758
    Bayesian info crit = -394726.409
[[Variables]]
    T:      1403.30461 +/- 4.19860373 (0.30%) (init = 2000)
    const:  6.9272e-26 +/- 8.9056e-28 (1.29%) (init = 2e-21)
    A:      2.1975e-15 +/- 7.9268e-18 (0.36%) (init = 4.309166e-15)
    p:     -2.57314708 +/- 0.01807976 (0.70%) (init = -1)
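
The tiny chi-square in the report above follows directly from the absolute scale of the data: without weights, lmfit's chi-square is just the sum of squared residuals, so flux-like values of order 1e-15 give residual squares of order 1e-32. A minimal NumPy sketch with made-up numbers (not the poster's data):

```python
import numpy as np

# Hypothetical data and model values at the same x, in physical flux units
y_data  = np.array([1.0e-15, 2.0e-15, 3.0e-15])
y_model = np.array([1.1e-15, 1.9e-15, 3.05e-15])

resid = y_data - y_model
chisqr = np.sum(resid**2)          # unweighted: ~2e-32, "very small"

# Rescaling y (e.g. to units where values are O(1)) rescales chi-square by
# the square of the factor, without changing where the best fit lies:
scale = 1e15
chisqr_scaled = np.sum((scale * resid)**2)  # ~0.02
```

The fit itself is not wrong; only the scale of chi-square reflects the (missing) scale of the data uncertainties.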

[Question discussion]:

    Tags: python python-3.x chi-squared lmfit


    [Solution 1]:

    Well, the one place where the lmfit documentation does discuss the scale of the uncertainties states exactly how this is done.

    见:https://lmfit.github.io/lmfit-py/fitting.html#uncertainties-in-variable-parameters-and-their-correlations

    Quoting:

    In principle, the scale of the uncertainties in the Parameters is closely
    tied to the goodness-of-fit statistics chi-square and reduced chi-square
    (chisqr and redchi). The standard errors or  uncertainties are those that
    increase chi-square by 1. Since a “good fit” should have redchi of around 1,
    this requires that the data uncertainties (and to some extent the sampling
    of the N data points) is correct. Unfortunately, it is often not the case
    that one has high-quality estimates of the data uncertainties (getting the
    data is hard enough!). Because of this common situation, the uncertainties
    reported and held in stderr are not those that increase chi-square by 1, but
    those that increase chi-square by reduced chi-square. This is equivalent to
    rescaling the uncertainty in the data such that reduced chi-square would be
    1. To be clear, this rescaling is done by default because if reduced 
    chi-square is far from 1, this rescaling often makes the reported uncertainties 
    sensible, and if reduced chi-square is near 1 it does little harm. If you 
    have good scaling of the data uncertainty and believe the scale of the
    residual array is correct, this automatic rescaling can be turned off using
    scale_covar=False.
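
To make the quoted rescaling concrete: with the default scale_covar=True, the reported standard errors are the raise-chi-square-by-1 errors multiplied by the square root of reduced chi-square, which is equivalent to pretending every data point carried that implied uncertainty. A sketch using the redchi from the report above (everything else is illustrative, not lmfit internals):

```python
import numpy as np

redchi = 1.0635e-32  # reduced chi-square from the fit report above

# With scale_covar=True (the default) the covariance matrix is multiplied by
# redchi, so each reported stderr is the raw stderr times sqrt(redchi).
# Equivalently, lmfit behaves as if every data point had an uncertainty of:
eps_implied = np.sqrt(redchi)   # ~1.03e-16, in the same (flux) units as y
```

That implied per-point uncertainty is plausible for data of order 1e-15, which is why the reported parameter uncertainties look sensible even though chi-square itself is tiny.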
    

    [Discussion]:

    • I don't know much about statistics. As shown, my fit-statistics printout gives chi-square = 5.6981e-29 and reduced chi-square = 1.0635e-32. I have read the quoted text several times, but I do not understand what I am supposed to do. It would help most if you could point out what I have to change or add in my code.
    • For a properly scaled chi-square, you need to scale the residual (data - model) by the uncertainty in the data. If you have an uncertainty value or array, use model.fit(..., weights=1.0/eps_data), where eps_data is the estimated uncertainty in the data. You have not given enough of an example to give a more specific answer.
    • @rbaer Using weights=1./0.05 says that the absolute uncertainty of the y values is 0.05, not "5%". As in, what do you mean by 5%? Do you mean 5% of the point-by-point value of y? If so, state that explicitly: weights=20./y. You might want to check for division by zero there...
    • I tried result = mod.fit(y, pars, x=x, scale_covar=False, weights=1.0/0.05) as well as scale_covar=True, assuming a 5% uncertainty; the output did not change. If you give me your email address, I can send you the full code (compared to what I posted) and the data file.
    • I cannot guess why you think scale_covar=True would be associated with a "5% uncertainty". You need to include a minimal, complete example that shows the problem you are having. If you do not want to use SO, read the actual docs, including how to get help. The answer is no different from "for chi-square to have the correct scale, you need to include the uncertainties in the data". Not a percentage or fractional uncertainty - the actual uncertainty.
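
Pulling the comments together: lmfit's Model.fit takes per-point uncertainties through its weights argument, which multiplies the residual before squaring, so chisqr = sum(((data - model) * weights)**2). When the weights match the true noise level, reduced chi-square comes out near 1. A self-contained sketch with synthetic data (the noise level eps and the toy power law are assumptions, not the poster's spectrum):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(1.0, 2.5, 200)          # wavelength grid (microns, toy values)
eps = 0.05e-15                          # assumed absolute 1-sigma noise per point
y_true = 2e-15 * x**-2.5                # toy power-law "spectrum"
y = y_true + rng.normal(0.0, eps, x.size)

# lmfit computes chi-square from the weighted residual:
#     chisqr = sum(((data - model) * weights)**2)
# so passing weights=1.0/eps to mod.fit(...) is what scales the residuals.
weights = 1.0 / eps
chisqr = np.sum(((y - y_true) * weights) ** 2)
redchi = chisqr / (x.size - 4)          # 4 varied parameters, as in the report
```

With a correct eps, redchi lands near 1, and the reported parameter uncertainties then need no rescaling (scale_covar=False would give essentially the same stderr values).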