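【Posted】: 2021-10-06 02:12:52
【Question】:
The predictions of scikit-learn's LinearRegression appear to depend on the column order of X_train (and X_test), even though, as I understand it, the OLS linear regression solution should be independent of column order: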
import pandas as pd
from sklearn.linear_model import LinearRegression
X_train = pd.DataFrame({
'x2': [0.41881871483604843, 0.41881871483604843, 0.41881871483604843, -2.2128066838437888, 0.41881871483604843],
'x1': [0.3226465587013849, 0.3226465587013849, 0.3226465587013849, -2.1432281979935226, 0.3226465587013849],
'x3': [0.41881871483604843, 0.41881871483604843, 0.41881871483604843, -2.2128066838437888, 0.41881871483604843]
})
y_train = pd.Series([0.00208714705719199, 0.0, 0.0373802794439473, 0.4751917903756102, 0.01156975729482886])
X_test = pd.DataFrame({
'x2': [0.6718361093920282, 0.39636690075505104, 0.4225844259460428, 0.4225844259460428, 0.6991034460436102],
'x1': [1.417088758155678, 0.25726707774120766, 0.25726707774120766, 0.25726707774120766, 1.417088758155678],
'x3': [0.6718361093920282, 0.39636690075505104, 0.4225844259460428, 0.4225844259460428, 0.6991034460436102]
})
y_test = pd.Series([0.21970766666406633, 0.1452871258871291, 0.08888275135771367, 0.08914350635018843, 0.04924794822392303])
model = LinearRegression().fit(X_train, y_train)
yhat_train = model.predict(X_train)
yhat_test = model.predict(X_test)
# Sort columns.
cols = sorted(X_train.columns)
sorted_X_train = X_train[cols].copy()
sorted_X_test = X_test[cols].copy()
sorted_model = LinearRegression()
sorted_model = sorted_model.fit(sorted_X_train, y_train)
sorted_yhat_train = sorted_model.predict(sorted_X_train)
sorted_yhat_test = sorted_model.predict(sorted_X_test)
print(f'yhat_test : {yhat_test}')
print(f'sorted_yhat_test: {sorted_yhat_test}')
Output:
yhat_test : [-8.13124851e+12 4.20539351e+11 6.53526629e+11 6.53526629e+11
-7.88893187e+12]
sorted_yhat_test: [-0.08075183 0.0192414 0.01603989 0.01603989 -0.08408154]
The coefficients differ as well (in value, not just in order). What am I doing wrong here?
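The expectation that OLS does not depend on column order holds when the columns of the design matrix are linearly independent, because the least-squares coefficients are then unique. In the data above, x3 has exactly the same values as x2, so that condition may not be met here. The following is a minimal diagnostic sketch, not part of the original post: it rebuilds the training features with NumPy and checks for a duplicated column and for rank deficiency. The ones column is an assumption that stands in for the intercept LinearRegression fits by default, and the names A, rank and has_duplicate_column are illustrative only.

```python
import numpy as np

# Training features copied from the question, columns in the order x2, x1, x3.
x2 = np.array([0.41881871483604843] * 3 + [-2.2128066838437888, 0.41881871483604843])
x1 = np.array([0.3226465587013849] * 3 + [-2.1432281979935226, 0.3226465587013849])
x3 = x2.copy()  # in the question, x3 has exactly the same values as x2
X = np.column_stack([x2, x1, x3])

# Assumption: prepend a column of ones to stand in for the intercept
# that LinearRegression fits by default.
A = np.column_stack([np.ones(len(X)), X])

# Exact element-wise comparison of the x2 and x3 columns.
has_duplicate_column = np.array_equal(X[:, 0], X[:, 2])

# Numerical rank via SVD; compare it with the number of columns.
rank = np.linalg.matrix_rank(A)

print(f"x2 and x3 identical: {has_duplicate_column}")
print(f"rank of design matrix: {rank} (columns: {A.shape[1]})")
```

If the reported rank is lower than the number of columns, infinitely many coefficient vectors give the same least-squares fit on the training data, and which one a solver returns can depend on details such as column order and floating-point rounding. Such solutions can still disagree on test rows that do not follow the same linear dependence as the training rows.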
【Discussion】:
Tags: python scikit-learn linear-regression