How to do a polynomial fit with fixed points in 3D: answers

【Question title】: How to do a polynomial fit with fixed points in 3D
【Posted】: 2019-01-19 16:24:39
【Question】:

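I have a set of x, y, z points in 3D space and another variable called charge, which represents the amount of charge deposited at a specific x, y, z coordinate. I would like to do a weighted fit to this data (weighted by the amount of charge deposited in the detector, so more charge simply means a higher weight) such that the fit passes through a given point, the vertex.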

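When I did this in 2D, I tried all sorts of methods (moving the vertex to the origin, applying the same transformation to all other points and forcing the fit through the origin; giving the vertex a really high weight), but none of them were as good as the answer given by Jaime here: How to do a polynomial fit with fixed points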

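It uses the method of Lagrange multipliers, which I am vaguely familiar with from an undergraduate advanced multivariable course, but not much beyond that, and adapting that code does not seem to be as easy as just adding a z coordinate. (Note that even though the code does not take the amount of deposited charge into account, it still gave me the best results.) I was wondering whether there is a version of the same algorithm for 3D. I also contacted the author of the answer via Gmail, but did not hear back from him.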
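For reference, the Lagrange-multiplier idea can be sketched for a weighted fit as follows. This is not the code from the linked answer; it is a minimal sketch under stated assumptions. The function names `constrained_polyfit` and `constrained_polyfit_3d` are hypothetical, and the 3D variant assumes the track can be parametrized by x, i.e. written as two polynomials y(x) and z(x), which breaks down for tracks running nearly perpendicular to the x axis. Each fit minimizes the weighted sum of squared residuals subject to the curve passing through the fixed point, by solving the resulting linear (KKT) system.

```python
import numpy as np

def constrained_polyfit(x, y, order, x_fix, y_fix, weights=None):
    """Weighted least-squares polynomial forced through (x_fix, y_fix).

    Returns the coefficients c[0], ..., c[order], lowest power first.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.ones_like(x) if weights is None else np.asarray(weights, dtype=float)
    x_fix = np.atleast_1d(np.asarray(x_fix, dtype=float))
    y_fix = np.atleast_1d(np.asarray(y_fix, dtype=float))

    A = np.vander(x, order + 1, increasing=True)      # design matrix
    C = np.vander(x_fix, order + 1, increasing=True)  # one row per fixed point
    n, k = order + 1, len(x_fix)

    # Stationarity of the Lagrangian
    #   sum_i w_i * (A c - y)_i**2 + lam^T (C c - y_fix)
    # gives the linear system
    #   [2 A^T W A  C^T] [c  ]   [2 A^T W y]
    #   [C          0  ] [lam] = [y_fix    ]
    kkt = np.zeros((n + k, n + k))
    kkt[:n, :n] = 2 * A.T @ (w[:, None] * A)
    kkt[:n, n:] = C.T
    kkt[n:, :n] = C
    rhs = np.concatenate([2 * A.T @ (w * y), y_fix])
    return np.linalg.solve(kkt, rhs)[:n]

def constrained_polyfit_3d(x, y, z, order, vertex, weights=None):
    """Assumed 3D extension: fit y(x) and z(x) separately, both through the vertex."""
    vx, vy, vz = vertex
    coeffs_y = constrained_polyfit(x, y, order, vx, vy, weights)
    coeffs_z = constrained_polyfit(x, z, order, vx, vz, weights)
    return coeffs_y, coeffs_z
```

The fitted curves can be evaluated with `np.polynomial.polynomial.polyval(x_new, coeffs_y)`, which also expects the lowest power first. Passing the charge as `weights` covers the weighted case; leaving it as `None` gives the unweighted fit.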

Here is some more information about my data and what I am trying to do in 2D: How to weigh the points in a scatter plot for a fit?

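Here is my code for this, where I force the vertex to be at the origin and then fit the data with fit_intercept=False. I am currently using this approach for 2D data because I am not sure whether a 3D version of the Lagrange multiplier method exists, but there are linear regression approaches for 3D, for instance here: Fitting a line in 3D: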

import math  # needed for math.atan and math.pi below

import numpy as np
import sklearn.linear_model

def plot_best_fit(image_array, vertexX, vertexY):
    weights = np.array(image_array)
    x = np.where(weights>0)[1]
    y = np.where(weights>0)[0]
    size = len(image_array) * len(image_array[0])
    y = np.zeros((len(image_array), len(image_array[0])))
    for i in range(len(np.where(weights>0)[0])):
        y[np.where(weights>0)[0][i]][np.where(weights>0)[1][i]] = np.where(weights>0)[0][i]
    y = y.reshape(size)
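    # np.tile reproduces the Python 2 list repetition range(n) * m, which raises a TypeError in Python 3
    x = np.tile(np.arange(len(image_array)), len(image_array[0]))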
    weights = weights.reshape((size))
    for i in range(len(x)):
        x[i] -= vertexX
        y[i] -= vertexY
    model = sklearn.linear_model.LinearRegression(fit_intercept=False)
    model.fit(x.reshape((-1, 1)),y,sample_weight=weights)
    line_x = np.linspace(0, 512, 100).reshape((-1,1))
    pred = model.predict(line_x)
    m, b = np.polyfit(np.linspace(0, 512, 100), np.array(pred), 1)
    angle = math.atan(m) * 180/math.pi
    return line_x, pred, angle, b, m

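image_array is a numpy array, and vertexX and vertexY are the x and y coordinates of the vertex, respectively. Here is my data: https://uploadfiles.io/bbhxo. I cannot create toy data because there is no simple way of replicating this data; it was produced by a Geant4 simulation of a neutrino interacting with an argon nucleus. I do not want to strip out the complexity of the data. This specific event happens to be one for which my code does not work, and I am not sure I could deliberately generate data on which my code fails.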

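【Question discussion】:

Are you trying to give more weight to points with more charge? Or are you trying to force the fitted line through a few key points? These are two different problems. The first was already answered in your earlier question. Lagrange multipliers would help with the second one (i.e. fitting a curve subject to constraints).
Ideally both (in my case there is only one point that the fit has to pass through). But as I said in the question, if I could get the Lagrange multiplier method for 3D without the weights (since it worked best in 2D), that would be enough.
So you have i) a single point that must lie on the best-fit line, and ii) you want to apply weights to all the other points? I would re-centre your data around your constraint point and then fit a polynomial regression with Scikit-learn using the appropriate weights, setting fit_intercept=False.
When you say re-centre my data around the constraint point, do you mean moving the constraint point to the origin and shifting all other points by the same amount?
Also, no, that does not work for my data.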
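To make the re-centring suggestion concrete, here is a minimal sketch of the same idea in plain NumPy rather than Scikit-learn. The function name `fit_through_point` and its parameters are hypothetical. The data are shifted so that the fixed point sits at the origin, and the constant term is left out of the design matrix, which is what fit_intercept=False does; scaling each row by the square root of its weight turns the weighted problem into an ordinary least-squares one.

```python
import numpy as np

def fit_through_point(x, y, weights, x0, y0, degree=1):
    """Weighted polynomial fit of y(x) that passes through (x0, y0)."""
    dx = np.asarray(x, dtype=float) - x0
    dy = np.asarray(y, dtype=float) - y0
    sw = np.sqrt(np.asarray(weights, dtype=float))

    # Columns dx, dx**2, ..., dx**degree. There is no column of ones,
    # so the fitted curve is forced through the shifted origin.
    A = np.column_stack([dx ** p for p in range(1, degree + 1)])
    coeffs, *_ = np.linalg.lstsq(A * sw[:, None], dy * sw, rcond=None)

    def predict(x_new):
        # Evaluate the fit in the original (unshifted) coordinates.
        d = np.asarray(x_new, dtype=float) - x0
        return y0 + sum(c * d ** (p + 1) for p, c in enumerate(coeffs))

    return coeffs, predict
```

Like any regression of y on x, this minimizes vertical residuals only, so it can behave poorly for steep tracks, which may be one reason the approach did not work on the data in question.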

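Tags: python curve-fitting data-fitting

【Solution 1】:

This is more of a hand-made solution using basic optimization, and it is straightforward. One measures the distance from each point to the line to be fitted and minimizes the weighted distances with the basic optimize.leastsq. The line is parametrized by two angles, theta and phi, and always passes through the fixed point. The code is as follows: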

# -*- coding: utf-8 -*
from matplotlib import pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.cm as cm
from scipy import optimize
import numpy as np

def rnd( a ):
    return  a * ( 1 - 2 * np.random.random() ) 

def affine_line( s, theta, phi, x0, y0, z0 ):
    a = np.sin( theta) * np.cos( phi )
    b = np.sin( theta) * np.sin( phi )
    c = np.cos( theta )
    return np.array( [ s * a + x0, s * b + y0, s * c + z0 ] )

def point_to_line_distance( x , y, z , theta, phi, x0, y0, z0 ):
    xx = x - x0
    yy = y - y0
    zz = z - z0
    a = np.sin( theta) * np.cos( phi )
    b = np.sin( theta) * np.sin( phi )
    c = np.cos( theta )
    r = np.array( [ xx, yy, zz ] )
    t = np.array( [ a, b, c ] )
    return np.linalg.norm( r - np.dot( r, t) * t )

def residuals( parameters, fixpoint, data, weights=None ):
    theta, phi = parameters
    x0, y0, z0 = fixpoint
    if weights is None:
        w = np.ones( len( data ) )
    else:
        w = np.array( weights )
    diff = np.array( [ point_to_line_distance( x , y, z , theta, phi , *fixpoint ) for x, y, z in data ] )
    diff = diff * w
    return diff

### some test data
fixpoint = [ 1, 2 , -.3 ]
trueline = np.array( [ affine_line( s, .7, 1.7, *fixpoint ) for s in np.linspace( -1, 2, 50 ) ] )
rndData = np.array( [ np.array( [ a + rnd( .6), b + rnd( .35 ), c + rnd( .45 ) ] ) for a, b, c in trueline ] )
zData = [ 20 * point_to_line_distance( x , y, z , .7, 1.7, *fixpoint ) for x, y, z in rndData ]

### unweighted
bestFitValuesUW, ier= optimize.leastsq(residuals, [ 0, 0],args=( fixpoint, rndData ) )
print(bestFitValuesUW)
uwLine = np.array( [ affine_line( s, bestFitValuesUW[0], bestFitValuesUW[1], *fixpoint ) for s in np.linspace( -2, 2, 50 ) ] )

### weighted ( chose inverse distance as weight....would be charge in OP's case )
bestFitValuesW, ier= optimize.leastsq(residuals, [ 0, 0],args=( fixpoint, rndData, [ 1./s for s in zData ] ) )
print(bestFitValuesW)
wLine = np.array( [ affine_line( s, bestFitValuesW[0], bestFitValuesW[1], *fixpoint ) for s in np.linspace( -2, 2, 50 ) ] )

### plotting
fig = plt.figure()
ax = fig.add_subplot( 1, 1, 1, projection='3d' )
ax.plot( *np.transpose(trueline ) ) 
ax.scatter( *fixpoint, color='k' )
ax.scatter( rndData[::,0], rndData[::,1], rndData[::,2] , c=zData, cmap=cm.jet )

ax.plot( *np.transpose( uwLine ) ) 
ax.plot( *np.transpose( wLine ) ) 

ax.set_xlim( [ 0, 2.5 ] )
ax.set_ylim( [ 1, 3.5 ] )
ax.set_zlim( [ -1.25, 1.25 ] )

plt.show()

>> [-0.68236386 -1.3057938 ]
>> [-0.70928735 -1.4617517 ]

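The fixed point is shown in black and the original line in blue. The unweighted and weighted fits are shown in orange and green, respectively. The data points are colored according to their distance to the line.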
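For a straight line through a fixed point, the same objective can also be minimized without an iterative optimizer. The following is a sketch of that alternative, not part of the answer above; the function name `fit_line_through_point` is hypothetical. Minimizing the weighted sum of squared orthogonal distances to a line through the fixed point is equivalent to picking the eigenvector with the largest eigenvalue of the weighted scatter matrix taken about that point.

```python
import numpy as np

def fit_line_through_point(points, fixpoint, weights=None):
    """Direction of the line through `fixpoint` that minimizes
    sum_i w_i * d_i**2, with d_i the orthogonal distance of point i."""
    r = np.asarray(points, dtype=float) - np.asarray(fixpoint, dtype=float)
    w = np.ones(len(r)) if weights is None else np.asarray(weights, dtype=float)

    # Weighted scatter matrix about the fixed point (not about the centroid).
    scatter = (r * w[:, None]).T @ r

    # eigh returns eigenvalues in ascending order, so the last column
    # is the direction along which the weighted spread is largest.
    _, eigvecs = np.linalg.eigh(scatter)
    t = eigvecs[:, -1]

    # Same angle convention as affine_line in the answer.
    theta = np.arccos(np.clip(t[2], -1.0, 1.0))
    phi = np.arctan2(t[1], t[0])
    return t, theta, phi
```

Note that `residuals` in the answer multiplies each distance by its weight before leastsq squares it, so the quantity minimized there is the sum of (w_i * d_i)**2. To compare the two approaches on the same data, the squared weights would have to be passed to this function. The sign of the returned direction is arbitrary, since t and -t describe the same line.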

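【Discussion】:

Quick question: what exactly are the two values that are returned? [-0.68236386 -1.3057938 ], are they theta and phi?
Also, what do you mean by the "original line"?
@AlwaysLearningForever Yes, the returned parameters are the best-fit values of theta and phi. Since this is generic data, I generated it by taking a given line and adding random numbers. That is the "original line", so the fit should come out close to this line.
@AlwaysLearningForever In this case I/we have the advantage of knowing what the fit should look like. Playing around with it might help. One can see, for instance, that depending on how the random points are distributed, the weights have more or less impact.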
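The fitted angles printed above look quite different from the (0.7, 1.7) used to generate the data, because several (theta, phi) pairs describe the same line. A small sketch for comparing two results by direction rather than by raw angles follows; the helper names `direction` and `angle_between_lines` are hypothetical, and the angle convention is the one used in affine_line.

```python
import numpy as np

def direction(theta, phi):
    """Unit vector for polar angle theta and azimuth phi."""
    return np.array([
        np.sin(theta) * np.cos(phi),
        np.sin(theta) * np.sin(phi),
        np.cos(theta),
    ])

def angle_between_lines(angles_a, angles_b):
    """Angle in degrees between two lines, ignoring their orientation."""
    cos_angle = abs(np.dot(direction(*angles_a), direction(*angles_b)))
    return np.degrees(np.arccos(np.clip(cos_angle, 0.0, 1.0)))

true_angles = (0.7, 1.7)                    # used to generate the test data
fitted_angles = (-0.68236386, -1.3057938)   # unweighted result printed above
print(angle_between_lines(true_angles, fitted_angles))
```

For these values the two directions differ by only a few degrees, so the unweighted fit is close to the original line even though the signs of theta and phi are flipped.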