如何将曲线拟合到阶跃函数？答案

【问题标题】：How Can fit a curve to step function?如何将曲线拟合到阶跃函数？
【发布时间】：2019-06-15 04:22:01
【问题描述】：

我正在尝试将曲线拟合到阶跃函数。我试过没有。使用 sigmoid 函数、使用多项式比率、将高斯函数拟合到阶跃导数等方法，但没有一个看起来不错。现在，我想出了创建一个完美步长并计算完美步长到高斯函数的卷积并使用非线性回归找到最佳拟合参数的想法。但这也不好看。我在这里为 Sigmoid 和卷积方法编写代码。首先使用 Sigmoid 函数拟合：

功能：

function d=fit_sig(param,x,y)
a=param(1);
b=param(2);
d=(a./(1+exp(-b*x)))-y;
end

主要代码：

a=1, b=0.09;
p0=[a,b];
sig_coff=lsqnonlin(@fit_sig,p0,[],[],[],xavg_40s1,havg_40s1);
% Plot the original and experimental data.
sig_new = sig_coff(1)./(1+exp(-sig_coff(2)*xavg_40s1));
d= havg_40s1-step_new;
figure;
plot(xavg_40s1,havg_40s1,'.-r',xavg_40s1,sig_new,'.-b');
xlabel('x-pixel'); ylabel('dz/dx (mm/pixel)'); axis square;

这根本不起作用。我认为我最初的猜测是错误的。我尝试了多个数字，但无法正确。我也尝试过使用曲线拟合工具，但这也不起作用。

创建完美步骤的代码：

h=ones(1,numel(havg_40s1)); %height=1mm
h(1:81)=-0.038;
h(82:end)=1.002; %or 1.0143
figure; 
plot(xavg_40s1,havg_40s1,'k.-', 'linewidth',1.5, 'markersize',16);
hold on
plot(xavg_40s1,h,'.-r','linewidth',1.5,'markersize',12);

使用卷积方法的代码：

功能：

function d=fit_step(param,h,x,y)
A=param(1);
mu=param(2);
sigma=param(3);
d=conv(h,A*exp(-((x-mu)/sigma).^2),'same')-y;
end

主要代码：

param1=[0.2247    8.1884    0.0802];
step_coff=lsqnonlin(@fit_step,param1,[],[],[],h,dx_40s1,havg_40s1);
% Plot the original and experimental data.
step_new = conv(h,step_coff(1)*exp(-((dx_40s1-step_coff(2))/step_coff(3)).^2),'same');

figure;
plot(xavg_40s1,havg_40s1,'.-r',xavg_40s1,step_new,'.-b');

这很接近，但台阶的边缘已经移动，而且拐角看起来比测量的台阶更锐利。

有人可以帮我找出适合阶梯函数的最佳方法或任何改进代码的建议吗？？

X 数据：

12.6400 12.6720 12.7040 12.7360 12.7680 12.8000 12.8320 12.8640 12.8960 12.9280 12.9600 12.9920 13.0240 13.0560 13.0880 13.1200 13.1520 13.1840 13.2160 13.2480 13.2800 13.3120 13.3440 13.3760 13.4080 13.4400 13.4720 13.5040 13.5360 13.5680 13.6000 13.6320 13.6640 13.6960 13.7280 13.7600 13.7920 13.8240 13.8560 13.8880 13.9200 13.9520 13.9840 14.0160 14.0480 14.0800 14.1120 14.1440 14.1760 14.2080 14.2400 14.272 14.3040 14.3360 14.3680 14.4000 14.4320 14.4640 14.4960 14.5280 14.5600 14.5920 14.6240 14.6560 14.6880 14.7200 14.7520 14.7840 14.8160 14.8480 14.8800 14.9120 14.9440 14.9760 15.0080 15.0400 15.0720 15.1040 15.1360 15.1680 15.2000 15.2320 15.2640 15.2960 15.3280 15.3600 15.3920 15.4240 15.4560 15.4880 15.5200 15.5520 15.5840 15.6160 15.6480 15.6800 15.7120 15.7440 15.7760 15.8080 15.8400 15.8720 15.9040 15.9360 15.9680 16.0000 16.0320 16.0640 16.0960 16.1280 16.1600 16.1920 16.2240 16.2560 16.2880 16.3200 16.3520 16.3840 16.4160 16.4480 16.4800 16.5120 16.5440 16.5760 16.6080 16.6400 16.6720 16.7040 16.7360 16.7680 16.8000 16.8320 16.8640 16.8960 16.9280 16.9600 16.9920 17.0240 17.0560 17.0880 17.1200 17.1520 17.1840 17.2160 17.2480 17.2800 17.3120 17.3440 17.3760 17.4080 17.4400 17.4720 17.5040 17.5360 17.5680 17.6000 17.6320 17.6640 17.6960 17.7280 17.7600

Y 数据：

-0.0404 -0.0405 -0.0350 -0.0406 -0.0412 -0.0407 -0.0378 -0.0405 -0.0337 -0.0417 -0.0413 -0.0387 -0.0352 -0.0373 -0.0369 -0.0388 -0.0384 -0.0351 -0.0401 -0.0314 -0.0375 -0.0390 -0.0330 - 0.0343 -0.0341 -0.0369 -0.0424 -0.0369 -0.0309 -0.0387 -0.0346 -0.0433 -0.0410 -0.0355 -0.0343 -0.0396 -0.0369 -0.0400 -0.0377 -0.0330 -0.0416 -0.0348 -0.0380 -0.0338 -0.0349 -0.0359 -0.0418 -0.0336 - 0.0375 -0.0309 -0.0362 -0.0422 -0.0437 -0.0352 -0.0303 -0.0335 -0.0358 -0.0467 -0.0341 -0.0306 -0.0322 -0.0338 -0.0418 -0.0417 -0.0299 -0.0264 -0.0308 -0.0352 -0.0330 -0.0261 -0.0088 -0.0071 0.0013 0.0012 0.0151 0.0352 0.0475 0.0764 0.1423 0.2617 0.4057 0.6241 0.8076 0.8872 0.9248 0.9340 0.9395 0.9514 0.9650 0.9708 0.9875 0.9852 0.9955 0.9971 0.9966 0.9981 0.9983 0.9932 1.0013 1.0011 0.9961 1.0044 0.9994 1.0028 1.0028 0.9996 1.0009 1.0024 1.0027 1.0075 1.0017 1.0001 1.0033 1.0062 1.0071 1.0032 1.0026 1.0027 1.0062 1.0063 0.9981 1.0025 0.9994 1.0075 1.0026 1.0035 1.0018 0.9999 1.0045 1.0067 0.9980 1.0044 0.9976 0.9976 1.0087 1.0026 1.0010 0.9997 1.0025 0.9943 1.0098 0.9964 0.9994 0.9973 0.9997 1.0084 1.0035 0.9974 0.9967 0.9967 1.0013 1.0060 1.0026 0.9960 0.9970 0.9987 1.0054 1.0048 0.9952 0.9937 0.9972

附上测量步骤和拟合曲线的图像。

【问题讨论】：

当然！我已对我的问题进行了修改。
你到底想知道什么？ ...跳跃的位置或水平，或两者，甚至跳跃前的曲率或全部？

标签： matlab curve-fitting non-linear-regression

【解决方案1】：

为什么不采取简单的方法呢？平滑您的数据，计算它的导数，然后找到该导数的最大值。您可以通过与高斯的导数进行卷积来完成前两个步骤，这很容易生成。

最大值的位置是您尝试拟合的阶跃函数的位移。左侧值的平均值和右侧值的平均值是阶跃函数的低值和高值。

根据第一原则（工具箱将使所有这些步骤变得更简单），高斯梯度的计算方式如下：

x = [12.6400 12.6720 12.7040 12.7360 12.7680 12.8000 12.8320 12.8640 12.8960 12.9280 12.9600 12.9920 13.0240 13.0560 13.0880 13.1200 13.1520 13.1840 13.2160 13.2480 13.2800 13.3120 13.3440 13.3760 13.4080 13.4400 13.4720 13.5040 13.5360 13.5680 13.6000 13.6320 13.6640 13.6960 13.7280 13.7600 13.7920 13.8240 13.8560 13.8880 13.9200 13.9520 13.9840 14.0160 14.0480 14.0800 14.1120 14.1440 14.1760 14.2080 14.2400 14.272 14.3040 14.3360 14.3680 14.4000 14.4320 14.4640 14.4960 14.5280 14.5600 14.5920 14.6240 14.6560 14.6880 14.7200 14.7520 14.7840 14.8160 14.8480 14.8800 14.9120 14.9440 14.9760 15.0080 15.0400 15.0720 15.1040 15.1360 15.1680 15.2000 15.2320 15.2640 15.2960 15.3280 15.3600 15.3920 15.4240 15.4560 15.4880 15.5200 15.5520 15.5840 15.6160 15.6480 15.6800 15.7120 15.7440 15.7760 15.8080 15.8400 15.8720 15.9040 15.9360 15.9680 16.0000 16.0320 16.0640 16.0960 16.1280 16.1600 16.1920 16.2240 16.2560 16.2880 16.3200 16.3520 16.3840 16.4160 16.4480 16.4800 16.5120 16.5440 16.5760 16.6080 16.6400 16.6720 16.7040 16.7360 16.7680 16.8000 16.8320 16.8640 16.8960 16.9280 16.9600 16.9920 17.0240 17.0560 17.0880 17.1200 17.1520 17.1840 17.2160 17.2480 17.2800 17.3120 17.3440 17.3760 17.4080 17.4400 17.4720 17.5040 17.5360 17.5680 17.6000 17.6320 17.6640 17.6960 17.7280 17.7600];
y = [-0.0404 -0.0405 -0.0350 -0.0406 -0.0412 -0.0407 -0.0378 -0.0405 -0.0337 -0.0417 -0.0413 -0.0387 -0.0352 -0.0373 -0.0369 -0.0388 -0.0384 -0.0351 -0.0401 -0.0314 -0.0375 -0.0390 -0.0330 -0.0343 -0.0341 -0.0369 -0.0424 -0.0369 -0.0309 -0.0387 -0.0346 -0.0433 -0.0410 -0.0355 -0.0343 -0.0396 -0.0369 -0.0400 -0.0377 -0.0330 -0.0416 -0.0348 -0.0380 -0.0338 -0.0349 -0.0359 -0.0418 -0.0336 -0.0375 -0.0309 -0.0362 -0.0422 -0.0437 -0.0352 -0.0303 -0.0335 -0.0358 -0.0467 -0.0341 -0.0306 -0.0322 -0.0338 -0.0418 -0.0417 -0.0299 -0.0264 -0.0308 -0.0352 -0.0330 -0.0261 -0.0088 -0.0071 0.0013 0.0012 0.0151 0.0352 0.0475 0.0764 0.1423 0.2617 0.4057 0.6241 0.8076 0.8872 0.9248 0.9340 0.9395 0.9514 0.9650 0.9708 0.9875 0.9852 0.9955 0.9971 0.9966 0.9981 0.9983 0.9932 1.0013 1.0011 0.9961 1.0044 0.9994 1.0028 1.0028 0.9996 1.0009 1.0024 1.0027 1.0075 1.0017 1.0001 1.0033 1.0062 1.0071 1.0032 1.0026 1.0027 1.0062 1.0063 0.9981 1.0025 0.9994 1.0075 1.0026 1.0035 1.0018 0.9999 1.0045 1.0067 0.9980 1.0044 0.9976 0.9976 1.0087 1.0026 1.0010 0.9997 1.0025 0.9943 1.0098 0.9964 0.9994 0.9973 0.9997 1.0084 1.0035 0.9974 0.9967 0.9967 1.0013 1.0060 1.0026 0.9960 0.9970 0.9987 1.0054 1.0048 0.9952 0.9937 0.9972];

sigma = 3;
cutoff = ceil(4*sigma);
kernel = -cutoff:cutoff;
kernel = -kernel .* exp(-0.5 * kernel.^2 / sigma.^2);
grad = conv(y,kernel,'same');

我们可以用max找到最大样本：

[~,ii] = max(grad);

这是最接近过渡点中间的样本。我们可以通过将抛物线拟合到峰值周围的 3 个样本来细化这个位置：

px = x(ii-1:ii+1).';
py = grad(ii-1:ii+1).';
% solve the equation: py = [px.*px, px, ones(3,1)] * params;
params = [px.*px, px, ones(3,1)] \ py;
x_max = -params(2)/(2*params(1));

最后，我们可能希望将过渡前后的 y 值包含在拟合中：

left = median(y(x<x_max));
right = median(y(x>x_max));

（尽管我们可能想假设left=0 和right=1。）

绘图：

plot(x,y)
hold on
plot([x(1),x_max,x_max,x(end)],[left,left,right,right])

为了拟合一个完整的误差函数（这是高斯的积分），我们只需要多一步：上面我们将抛物线拟合到最大值附近的三个样本，现在我们将抛物线拟合到 y 的对数值（有关说明，请参阅this other Q&A），并选择所有高于此拟合峰值 0.2 倍的 y 值以避免拟合噪声。这大约是 2 sigma，应该足以获得对高斯峰值的准确估计。高斯峰的参数也是平滑误差函数的参数，我们可以为这个额外的平滑校正估计的 sigma：

% using grad from the code above (as well as x and y)
[m,ii] = max(grad);
w = sum(grad > m * 0.2) / 2;
px = x(ii-w:ii+w).';
py = log(grad(ii-w:ii+w)).';
% solve the equation: py = [px.*px, px, ones(3,1)] * params;
params = [px.*px, px, ones(size(px))] \ py;
% obtain Gaussian parameters 
fitted_mu = -params(2)/(2*params(1));
fitted_sigma = sqrt(-0.5/params(1));
% correct for smoothing applied
fitted_sigma = sqrt(fitted_sigma^2 - (sigma*mean(diff(x)))^2);
% evaluated fitted function
fitted_y = (erf((x-fitted_mu)/fitted_sigma) + 1) / 2 * (right-left) + left;

clf
plot(x,y)
hold on
plot(x,fitted_y)

【讨论】：

感谢您的回复！这看起来更像是生成一个完美的步骤，但我正在寻找的是适合我的数据的曲线。你是想说，如果我找到这个完美步骤的卷积，我会得到最适合我的步骤吗？我尝试过圆角看起来很接近，但边缘又与应有的不同。
Hs=[左*(ones(1,65)) 右*(ones(1,64))]; xs1=min(x):0.16/4:x_max; xs2=x_max:0.16/4:max(x); xs=[xs1 x_max xs2];数字; plot(x,y,'.-') 坚持情节(xs,Hs,'.-') grad=grad(7:151);毕业=毕业/总和（毕业）； hfit=conv(Hs,grad,'same');数字;情节（xs，hfit，'.-k'）；坚持情节（x，y，'.-b'）；

【解决方案2】：

这是一个示例图形拟合器，它使用您的数据和我的方程搜索中的 sigmoid。此示例使用标准 scipy 差分进化遗传算法模块来确定曲线拟合的初始参数估计，并且该模块使用拉丁超立方算法来确保彻底搜索需要搜索范围的参数空间。在此示例中，搜索范围源自数据。请注意，估计初始参数估计值的范围要比估计具体值容易得多。

import numpy, scipy, matplotlib
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from scipy.optimize import differential_evolution
import warnings

xData = numpy.array([12.6400, 12.6720, 12.7040, 12.7360, 12.7680, 12.8000, 12.8320, 12.8640, 12.8960, 12.9280, 12.9600, 12.9920, 13.0240, 13.0560, 13.0880, 13.1200, 13.1520, 13.1840, 13.2160, 13.2480, 13.2800, 13.3120, 13.3440, 13.3760, 13.4080, 13.4400, 13.4720, 13.5040, 13.5360, 13.5680, 13.6000, 13.6320, 13.6640, 13.6960, 13.7280, 13.7600, 13.7920, 13.8240, 13.8560, 13.8880, 13.9200, 13.9520, 13.9840, 14.0160, 14.0480, 14.0800, 14.1120, 14.1440, 14.1760, 14.2080, 14.2400, 14.272, 14.3040, 14.3360, 14.3680, 14.4000, 14.4320, 14.4640, 14.4960, 14.5280, 14.5600, 14.5920, 14.6240, 14.6560, 14.6880, 14.7200, 14.7520, 14.7840, 14.8160, 14.8480, 14.8800, 14.9120, 14.9440, 14.9760, 15.0080, 15.0400, 15.0720, 15.1040, 15.1360, 15.1680, 15.2000, 15.2320, 15.2640, 15.2960, 15.3280, 15.3600, 15.3920, 15.4240, 15.4560, 15.4880, 15.5200, 15.5520, 15.5840, 15.6160, 15.6480, 15.6800, 15.7120, 15.7440, 15.7760, 15.8080, 15.8400, 15.8720, 15.9040, 15.9360, 15.9680, 16.0000, 16.0320, 16.0640, 16.0960, 16.1280, 16.1600, 16.1920, 16.2240, 16.2560, 16.2880, 16.3200, 16.3520, 16.3840, 16.4160, 16.4480, 16.4800, 16.5120, 16.5440, 16.5760, 16.6080, 16.6400, 16.6720, 16.7040, 16.7360, 16.7680, 16.8000, 16.8320, 16.8640, 16.8960, 16.9280, 16.9600, 16.9920, 17.0240, 17.0560, 17.0880, 17.1200, 17.1520, 17.1840, 17.2160, 17.2480, 17.2800, 17.3120, 17.3440, 17.3760, 17.4080, 17.4400, 17.4720, 17.5040, 17.5360, 17.5680, 17.6000, 17.6320, 17.6640, 17.6960, 17.7280, 17.7600])

yData = numpy.array([-0.0404, -0.0405, -0.0350, -0.0406, -0.0412, -0.0407, -0.0378, -0.0405, -0.0337, -0.0417, -0.0413, -0.0387, -0.0352, -0.0373, -0.0369, -0.0388, -0.0384, -0.0351, -0.0401, -0.0314, -0.0375, -0.0390, -0.0330, -0.0343, -0.0341, -0.0369, -0.0424, -0.0369, -0.0309, -0.0387, -0.0346, -0.0433, -0.0410, -0.0355, -0.0343, -0.0396, -0.0369, -0.0400, -0.0377, -0.0330, -0.0416, -0.0348, -0.0380, -0.0338, -0.0349, -0.0359, -0.0418, -0.0336, -0.0375, -0.0309, -0.0362, -0.0422, -0.0437, -0.0352, -0.0303, -0.0335, -0.0358, -0.0467, -0.0341, -0.0306, -0.0322, -0.0338, -0.0418, -0.0417, -0.0299, -0.0264, -0.0308, -0.0352, -0.0330, -0.0261, -0.0088, -0.0071, 0.0013, 0.0012, 0.0151, 0.0352, 0.0475, 0.0764, 0.1423, 0.2617, 0.4057, 0.6241, 0.8076, 0.8872, 0.9248, 0.9340, 0.9395, 0.9514, 0.9650, 0.9708, 0.9875, 0.9852, 0.9955, 0.9971, 0.9966, 0.9981, 0.9983, 0.9932, 1.0013, 1.0011, 0.9961, 1.0044, 0.9994, 1.0028, 1.0028, 0.9996, 1.0009, 1.0024, 1.0027, 1.0075, 1.0017, 1.0001, 1.0033, 1.0062, 1.0071, 1.0032, 1.0026, 1.0027, 1.0062, 1.0063, 0.9981, 1.0025, 0.9994, 1.0075, 1.0026, 1.0035, 1.0018, 0.9999, 1.0045, 1.0067, 0.9980, 1.0044, 0.9976, 0.9976, 1.0087, 1.0026, 1.0010, 0.9997, 1.0025, 0.9943, 1.0098, 0.9964, 0.9994, 0.9973, 0.9997, 1.0084, 1.0035, 0.9974, 0.9967, 0.9967, 1.0013, 1.0060, 1.0026, 0.9960, 0.9970, 0.9987, 1.0054, 1.0048, 0.9952, 0.9937, 0.9972])

def func(x, a, b, Offset): # Sigmoid A with Offset from zunzun.com
    return 1.0 / (1.0 + numpy.exp(-1.0 * a * (x - b))) + Offset


# function for genetic algorithm to minimize (sum of squared error)
def sumOfSquaredError(parameterTuple):
    warnings.filterwarnings("ignore") # do not print warnings by genetic algorithm
    val = func(xData, *parameterTuple)
    return numpy.sum((yData - val) ** 2.0)


def generate_Initial_Parameters():
    # min and max used for bounds
    maxX = max(xData)
    minX = min(xData)
    maxY = max(yData)
    minY = min(yData)

    parameterBounds = []
    parameterBounds.append([minX, maxX]) # search bounds for a
    parameterBounds.append([minX, maxX]) # search bounds for b
    parameterBounds.append([minY, maxY]) # search bounds for Offset

    # "seed" the numpy random number generator for repeatable results
    result = differential_evolution(sumOfSquaredError, parameterBounds, seed=3)
    return result.x

# by default, differential_evolution completes by calling curve_fit() using parameter bounds
geneticParameters = generate_Initial_Parameters()

# now call curve_fit without passing bounds from the genetic algorithm,
# just in case the best fit parameters are aoutside those bounds
fittedParameters, pcov = curve_fit(func, xData, yData, geneticParameters)
print('Fitted parameters:', fittedParameters)
print()

modelPredictions = func(xData, *fittedParameters) 

absError = modelPredictions - yData

SE = numpy.square(absError) # squared errors
MSE = numpy.mean(SE) # mean squared errors
RMSE = numpy.sqrt(MSE) # Root Mean Squared Error, RMSE
Rsquared = 1.0 - (numpy.var(absError) / numpy.var(yData))

print()
print('RMSE:', RMSE)
print('R-squared:', Rsquared)

print()


##########################################################
# graphics output section
def ModelAndScatterPlot(graphWidth, graphHeight):
    f = plt.figure(figsize=(graphWidth/100.0, graphHeight/100.0), dpi=100)
    axes = f.add_subplot(111)

    # first the raw data as a scatter plot
    axes.plot(xData, yData,  'D')

    # create data for the fitted equation plot
    xModel = numpy.linspace(min(xData), max(xData))
    yModel = func(xModel, *fittedParameters)

    # now the model as a line plot
    axes.plot(xModel, yModel)

    axes.set_xlabel('X Data') # X axis data label
    axes.set_ylabel('Y Data') # Y axis data label

    plt.show()
    plt.close('all') # clean up after using pyplot

graphWidth = 800
graphHeight = 600
ModelAndScatterPlot(graphWidth, graphHeight)

【讨论】：

飞利浦感谢您的回复！你能告诉我你用的是什么软件吗？
我也可以在 MATLAB 中使用这个带有偏移的 Sigmoid 方程来拟合。您能否解释一下，如何将绑定添加到 a 和 b？我可以看到您在 x 的最小值和最大值内为“a”和“b”设置了一些界限，并在 y 的最小值和最大值内设置了偏移量。与您的图像相比，我的拟合曲线的 y 值有点偏离。
@SwatiJain 对于方程搜索，我使用 zunzun.com 开源 Python 曲线和曲面拟合网站上的“函数查找器”来查找带偏移的 Sigmoid。至于我的示例代码中的遗传算法搜索范围，范围可能很大，所以我总是首先尝试从数据中得出的范围 - 并且再次运行良好。
我对 Python 一点也不熟悉。如果你能告诉我 bound 背后的数学原理，我可以将它应用到我的 MATLAB 代码上。我的意思是这些条件是这样工作的：min(x)
这些是遗传算法的搜索范围。如果你想尝试拟合这个方程的直接链接是zunzun.com/Equation/2/Sigmoidal/Sigmoid%20A