【问题标题】:Python: fitting curve to masked data with scipy curve_fitPython:使用 scipy curve_fit 将曲线拟合到屏蔽数据
【发布时间】:2020-07-07 16:54:47
【问题描述】:

我正在尝试使用 python/numpy/scipy 编写一个脚本,用于数据处理、拟合和绘制与角度相关的磁阻测量值。我是 Python 新手,从我的博士生导师那里得到了框架代码,并设法在框架中添加了几百行代码。过了一会儿,我注意到一些测量有多个错误,并且由于脚本应该自动完成所有操作,我试图掩盖这些点并将曲线拟合到未掩盖的点(曲线是叠加在线性函数上的正弦平方,所以 numpy.ma.polyfit 并不是一个真正的选择)。 但是,在屏蔽了问题点的 x 和 y 坐标后,拟合仍会考虑它们,即使它们不会显示在图中。这个例子被简化了,但同样的事情正在发生;

import numpy.ma as ma
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit



def Funk(x, k, y0):
 return k*x + y0   

fig,ax= plt.subplots()

x=ma.masked_array([1,2,3,4,5,6,7,8,9,10],mask=[0,0,0,0,0,0,1,1,1,1])
y=ma.masked_array([1,2,3,4,5,30,35,40,45,50], mask=[0,0,0,0,0,1,1,1,1,1])


fitParamsFunk, fitCovariancesFunk = curve_fit(Funk, x, y)

ax.plot(x, Funk(x, fitParamsFunk[0], fitParamsFunk[1]))
ax.errorbar(x, y, yerr = None, ms=3, fmt='-o')
plt.show()

The second half of the points is masked and not shown in the plot, but still taken into consideration.

在写这篇文章时,我发现我可以做到这一点:

def Funk(x, k, y0):
    return k*x + y0   

fig,ax= plt.subplots()

x=np.array([1,2,3,4,5,6,7,8,9,10])
y=np.array([1,2,3,4,5,30,35,40,45,50])
mask=np.array([0,0,0,0,0,1,1,1,1,1])

fitParamsFunk, fitCovariancesFunk = curve_fit(Funk, x[mask], y[mask])

ax.plot(x, Funk(x, fitParamsFunk[0], fitParamsFunk[1]))
ax.errorbar(x, y, yerr = None, ms=3, fmt='-o')
plt.show()

What I actually wanted

我猜 scipy curve_fit 并不是要处理掩码数组,但我仍然想知道是否有任何解决方法(我需要使用掩码数组,因为数据点的数量>10e6,但我一次只绘制 100 个,所以我需要获取要绘制的数组部分的掩码并将其分配给另一个数组,同时将数组的值复制到另一个数组或设置原始掩码为假)?感谢您的任何建议

【问题讨论】:

    标签: python numpy scipy curve-fitting masked-array


    【解决方案1】:

    如果只想考虑有效条目,可以使用掩码的倒数作为索引:

    x = ma.masked_array([1,2,3,4,5,6,7,8,9,10], mask=[0,0,0,0,0,1,1,1,1,1])  # changed mask
    y = ma.masked_array([1,2,3,4,5,30,35,40,45,50], mask=[0,0,0,0,0,1,1,1,1,1])
    
    fitParamsFunk, fitCovariancesFunk = curve_fit(Funk, x[~x.mask], y[~y.mask])
    

    PS:请注意,两个数组需要有相同数量的有效条目。

    【讨论】:

      【解决方案2】:

      在数值微积分中使用掩码相当于在解析微积分中使用 Heaviside 阶跃函数。例如,通过应用分段线性回归,这变得非常简单:

      它们是论文中分段线性回归的几个例子:https://fr.scribd.com/document/380941024/Regression-par-morceaux-Piecewise-Regression-pdf

      使用本文所示的方法,下面非常简单的演算得出预期的结果形式:

      注意:在大量点的情况下,如果在过渡区域中存在多个横坐标略有不同的点,则应用上面引用的论文第 29-31 页的案例会更准确。

      【讨论】:

        【解决方案3】:

        我认为您要做的是定义一个掩码,列出“良好数据点”的索引,然后将其用作拟合(和/或绘图)的点。

        作为 lmfit 的主要作者,我建议使用该库进行曲线拟合:它比curve_fit 具有许多有用的功能。这样,您的示例可能如下所示:

        import numpy as np
        import matplotlib.pyplot as plt
        from lmfit import Model
        
        def Funk(x, k, y0, good_points=None):  # note: add keyword argument
            f = k*x + y0
            if good_points is not None:
                f = f[good_points]       # apply mask of good data points
            return f
        
        x = np.array([1,2,3,4,5, 6,7,8.,9,10])
        y = np.array([1,2,3,4,5,30,35.,40,45,50]) 
        y += np.random.normal(size=len(x), scale=0.19) # add some noise to make it fun
        
        # make an array of the indices of the "good data points"
        # does not need to be contiguous.
        good_points=np.array([0,1,2,3,4])
        
        # turn your model function Funk into an lmfit Model
        mymodel = Model(Funk)
        
        # create parameters, giving initial values. Note that parameters are
        # named using the names of your function's argument and that keyword 
        # arguments with non-numeric defaults like 'good points' are seen to
        #  *not* be parameters. Like the independent variable `x`, you'll 
        # need to pass that in when you do the fit.
        # also: parameters can be fixed, or given `min` and `max` attributes
        
        params = mymodel.make_params(k=1.4,  y0=0.2)
        params['k'].min = 0
        
        # do the fit to the 'good data', passing in the parameters, the 
        # independent variable `x` and the `good_points` mask.
        result  = mymodel.fit(y[good_points], params, x=x, good_points=good_points)
        
        # print out a report of best fit values, uncertainties, correlations, etc.
        print(result.fit_report())
        
        # plot the results, again using the good_points array as needed.
        plt.plot(x, y, 'o', label='all data')
        plt.plot(x[good_points], result.best_fit[good_points], label='fit to good data')
        plt.legend()
        plt.show()
        

        这将打印出来

        [[Model]]
            Model(Funk)
        [[Fit Statistics]]
            # fitting method   = leastsq
            # function evals   = 7
            # data points      = 5
            # variables        = 2
            chi-square         = 0.02302999
            reduced chi-square = 0.00767666
            Akaike info crit   = -22.9019787
            Bayesian info crit = -23.6831029
        [[Variables]]
            k:   1.02460577 +/- 0.02770680 (2.70%) (init = 1.4)
            y0: -0.04135096 +/- 0.09189305 (222.23%) (init = 0.2)
        [[Correlations]] (unreported correlations are < 0.100)
            C(k, y0) = -0.905
        

        并产生一个情节

        希望能帮助您入门。

        【讨论】:

          猜你喜欢
          • 2020-06-12
          • 1970-01-01
          • 2016-11-29
          • 2020-05-05
          • 2020-07-12
          • 2021-09-25
          • 1970-01-01
          • 2014-09-08
          • 1970-01-01
          相关资源
          最近更新 更多