[Posted]: 2020-12-10 02:22:00
[Question]:
I have a regression problem: estimate the slope of y = a*x + b. I tried two different ways of computing a. Method 1 estimates the means of the two data clusters as two points and computes a from them. Method 2 uses the standard regression (normal) equations.
import numpy as np
import statistics
# find the slope a of y = a*x + b
x = "28.693756 28.850006 28.662506 28.693756 28.756256 28.662506 28.787506 \
28.818756 28.818756 28.787506 28.787506 28.787506 28.693756 28.787506 \
28.818756 28.725006 28.725006 28.850006 28.756256 28.725006 28.881256 \
28.818756 28.756256 28.693756 28.756256 28.787506 28.693756 28.662506 \
28.662506 28.787506 28.850006 28.756256 28.725006 28.818756 28.600006 \
28.725006 28.725006 28.850006 28.881256 28.881256 28.818756 28.756256 \
28.756256 28.787506 28.787506 28.787506 28.756256 28.787506 28.725006 \
28.725006 28.725006 28.756256 28.818756 28.756256 28.693756 28.818756 \
28.756256 28.756256 28.693756 28.850006 28.631256 28.693756 28.693756 \
28.850006 28.756256 28.725006 28.693756 28.756256 28.850006 28.787506 \
28.600006 28.631256"
x = [float(t) for t in x.split()]
y = [33.8]*36 + [38.7]*36
print(" ")
print("Method 1 ")
# Method 1: slope from the two cluster means
x1, x2 = statistics.mean(x[:36]), statistics.mean(x[36:])
y1, y2 = statistics.mean(y[:36]), statistics.mean(y[36:])
slope = (y1-y2)/(x1-x2)
print(f"a = {slope}")
print(" ")
print('Method 2')
x = np.array(x)
y = np.array(y)
# Method 2: normal equations, beta = (X^T X)^{-1} X^T y
X = np.c_[np.ones(x.shape), x]
XXinv = np.linalg.inv(X.transpose().dot(X)).dot(X.transpose())
_beta = XXinv.dot(y)
print(f"a = {_beta[1]}")
# singular values of X^T X, to check its conditioning
xx = X.transpose().dot(X)
svd = np.linalg.svd(xx)[1]
print(f"SVD(XX) = {svd}")
The code prints:
Method 1
a = 1128.9599999997959
Method 2
a = 1.2136744782028899
SVD(XX) = [5.96125150e+04 3.80959618e-04]
From a plot of the data, the fitted line should be nearly vertical, so Method 1's result makes far more sense than Method 2's. Moreover, even the shallowest plausible line through the data (as seen in the plot) has a slope of about 17.5. In normal cases Method 2 works fine, but here it gives a slope of only 1.21, which makes no sense.
The only cause I can think of is the near-singularity visible in the SVD values. But why does it happen, and is there a fix?
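One way to test whether the near-singular X^T X is actually to blame (this sketch is not from the original post) is to compare the normal-equation solve against `np.linalg.lstsq`, which never forms X^T X, and to see how much centering x improves the conditioning. The sketch below uses synthetic data shaped like the question's; the noise level 0.07 is an assumption eyeballed from the spread of the listed x values.

```python
import numpy as np

# Synthetic data shaped like the question's: x clustered tightly around 28.7,
# y split into two groups of 36 (group values taken from the post; the 0.07
# noise scale is an assumption).
rng = np.random.default_rng(0)
x = 28.7 + 0.07 * rng.standard_normal(72)
y = np.array([33.8] * 36 + [38.7] * 36)

X = np.c_[np.ones_like(x), x]
cond_raw = np.linalg.cond(X.T @ X)

# Centering x removes the large common offset (~28.7), which is what makes
# X^T X nearly singular; the slope estimate itself is unchanged by centering.
Xc = np.c_[np.ones_like(x), x - x.mean()]
cond_centered = np.linalg.cond(Xc.T @ Xc)
print(f"cond(X^T X) raw:      {cond_raw:.3e}")
print(f"cond(X^T X) centered: {cond_centered:.3e}")

# Compare the explicit normal-equation solve against np.linalg.lstsq.
beta_ne = np.linalg.inv(X.T @ X) @ X.T @ y
beta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"slope, normal equations: {beta_ne[1]}")
print(f"slope, lstsq:            {beta_ls[1]}")
```

In double precision the two slopes typically agree to several digits even at this condition number; if they agree on the real data too, the small slope is the genuine least-squares answer rather than a floating-point artifact, and the disagreement with Method 1 would have to be explained by the model rather than the arithmetic.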
[Discussion]:
Tags: linear-regression linear-algebra svd coefficients singular