Python：使用 scipy curve_fit 将曲线拟合到屏蔽数据答案

【问题标题】：Python: fitting curve to masked data with scipy curve_fitPython：使用 scipy curve_fit 将曲线拟合到屏蔽数据
【发布时间】：2020-07-07 16:54:47
【问题描述】：

我正在尝试使用 python/numpy/scipy 编写一个脚本，用于数据处理、拟合和绘制与角度相关的磁阻测量值。我是 Python 新手，从我的博士生导师那里得到了框架代码，并设法在框架中添加了几百行代码。过了一会儿，我注意到一些测量有多个错误，并且由于脚本应该自动完成所有操作，我试图掩盖这些点并将曲线拟合到未掩盖的点（曲线是叠加在线性函数上的正弦平方，所以 numpy.ma.polyfit 并不是一个真正的选择）。但是，在屏蔽了问题点的 x 和 y 坐标后，拟合仍会考虑它们，即使它们不会显示在图中。这个例子被简化了，但同样的事情正在发生；

import numpy.ma as ma
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit



def Funk(x, k, y0):
 return k*x + y0   

fig,ax= plt.subplots()

x=ma.masked_array([1,2,3,4,5,6,7,8,9,10],mask=[0,0,0,0,0,0,1,1,1,1])
y=ma.masked_array([1,2,3,4,5,30,35,40,45,50], mask=[0,0,0,0,0,1,1,1,1,1])


fitParamsFunk, fitCovariancesFunk = curve_fit(Funk, x, y)

ax.plot(x, Funk(x, fitParamsFunk[0], fitParamsFunk[1]))
ax.errorbar(x, y, yerr = None, ms=3, fmt='-o')
plt.show()

The second half of the points is masked and not shown in the plot, but still taken into consideration.

在写这篇文章时，我发现我可以做到这一点：

def Funk(x, k, y0):
    return k*x + y0   

fig,ax= plt.subplots()

x=np.array([1,2,3,4,5,6,7,8,9,10])
y=np.array([1,2,3,4,5,30,35,40,45,50])
mask=np.array([0,0,0,0,0,1,1,1,1,1])

fitParamsFunk, fitCovariancesFunk = curve_fit(Funk, x[mask], y[mask])

ax.plot(x, Funk(x, fitParamsFunk[0], fitParamsFunk[1]))
ax.errorbar(x, y, yerr = None, ms=3, fmt='-o')
plt.show()

What I actually wanted

我猜 scipy curve_fit 并不是要处理掩码数组，但我仍然想知道是否有任何解决方法（我需要使用掩码数组，因为数据点的数量>10e6，但我一次只绘制 100 个，所以我需要获取要绘制的数组部分的掩码并将其分配给另一个数组，同时将数组的值复制到另一个数组或设置原始掩码为假）？感谢您的任何建议

【问题讨论】：

标签： python numpy scipy curve-fitting masked-array

【解决方案1】：

如果只想考虑有效条目，可以使用掩码的倒数作为索引：

x = ma.masked_array([1,2,3,4,5,6,7,8,9,10], mask=[0,0,0,0,0,1,1,1,1,1])  # changed mask
y = ma.masked_array([1,2,3,4,5,30,35,40,45,50], mask=[0,0,0,0,0,1,1,1,1,1])

fitParamsFunk, fitCovariancesFunk = curve_fit(Funk, x[~x.mask], y[~y.mask])

PS：请注意，两个数组需要有相同数量的有效条目。

【讨论】：

【解决方案2】：

在数值微积分中使用掩码相当于在解析微积分中使用 Heaviside 阶跃函数。例如，通过应用分段线性回归，这变得非常简单：

它们是论文中分段线性回归的几个例子：https://fr.scribd.com/document/380941024/Regression-par-morceaux-Piecewise-Regression-pdf

使用本文所示的方法，下面非常简单的演算得出预期的结果形式：

注意：在大量点的情况下，如果在过渡区域中存在多个横坐标略有不同的点，则应用上面引用的论文第 29-31 页的案例会更准确。

【讨论】：

【解决方案3】：

我认为您要做的是定义一个掩码，列出“良好数据点”的索引，然后将其用作拟合（和/或绘图）的点。

作为 lmfit 的主要作者，我建议使用该库进行曲线拟合：它比curve_fit 具有许多有用的功能。这样，您的示例可能如下所示：

import numpy as np
import matplotlib.pyplot as plt
from lmfit import Model

def Funk(x, k, y0, good_points=None):  # note: add keyword argument
    f = k*x + y0
    if good_points is not None:
        f = f[good_points]       # apply mask of good data points
    return f

x = np.array([1,2,3,4,5, 6,7,8.,9,10])
y = np.array([1,2,3,4,5,30,35.,40,45,50]) 
y += np.random.normal(size=len(x), scale=0.19) # add some noise to make it fun

# make an array of the indices of the "good data points"
# does not need to be contiguous.
good_points=np.array([0,1,2,3,4])

# turn your model function Funk into an lmfit Model
mymodel = Model(Funk)

# create parameters, giving initial values. Note that parameters are
# named using the names of your function's argument and that keyword 
# arguments with non-numeric defaults like 'good points' are seen to
#  *not* be parameters. Like the independent variable `x`, you'll 
# need to pass that in when you do the fit.
# also: parameters can be fixed, or given `min` and `max` attributes

params = mymodel.make_params(k=1.4,  y0=0.2)
params['k'].min = 0

# do the fit to the 'good data', passing in the parameters, the 
# independent variable `x` and the `good_points` mask.
result  = mymodel.fit(y[good_points], params, x=x, good_points=good_points)

# print out a report of best fit values, uncertainties, correlations, etc.
print(result.fit_report())

# plot the results, again using the good_points array as needed.
plt.plot(x, y, 'o', label='all data')
plt.plot(x[good_points], result.best_fit[good_points], label='fit to good data')
plt.legend()
plt.show()

这将打印出来

[[Model]]
    Model(Funk)
[[Fit Statistics]]
    # fitting method   = leastsq
    # function evals   = 7
    # data points      = 5
    # variables        = 2
    chi-square         = 0.02302999
    reduced chi-square = 0.00767666
    Akaike info crit   = -22.9019787
    Bayesian info crit = -23.6831029
[[Variables]]
    k:   1.02460577 +/- 0.02770680 (2.70%) (init = 1.4)
    y0: -0.04135096 +/- 0.09189305 (222.23%) (init = 0.2)
[[Correlations]] (unreported correlations are < 0.100)
    C(k, y0) = -0.905

并产生一个情节

希望能帮助您入门。

【讨论】：