Python Linear Regression Combination Problem

【Question Title】: Python Linear Regression Combination Problem
【Posted】: 2021-11-20 07:01:57
【Question】:

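I need to fit a linear regression and compute its MSE for every group of two variables in my data frame. The problem is that I can't fit an xtrain with two variables against a ytrain with one variable, yet my ytrain has only one column.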

Code:

from sklearn.datasets import make_regression
X, y = make_regression(n_samples=100, n_features=4, n_informative=3, n_targets=1, noise=0.01)
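The question mentions a data frame. As a minimal sketch, assuming the make_regression output is wrapped in a pandas DataFrame with hypothetical column names x0 to x3 and a target column y (random_state=0 is an arbitrary choice for reproducibility), selecting two columns by name already yields a 2-D block with one row per sample:

```python
from itertools import combinations

import pandas as pd
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=100, n_features=4, n_informative=3,
                       n_targets=1, noise=0.01, random_state=0)

# Hypothetical column names; the asker's real data frame is not shown.
features = ["x0", "x1", "x2", "x3"]
df = pd.DataFrame(X, columns=features)
df["y"] = y

pairs = list(combinations(features, 2))  # 6 pairs for 4 features

for pair in pairs:
    X_pair = df[list(pair)]  # DataFrame of shape (100, 2): already 2-D
    target = df["y"]         # Series of shape (100,): one value per row
    print(pair, X_pair.shape, target.shape)
```

Each X_pair has the same number of rows as the target, which is exactly what LinearRegression.fit expects.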

The problematic loop:

import numpy as np
from itertools import combinations
from sklearn.linear_model import LinearRegression

for c in combinations(range(4), 2):
    lr = LinearRegression()
    lr.fit(Xtrain[:, c].reshape(-1, 1), ytrain)
    yp = lr.predict(Xtest[:, c].reshape(-1, 1))
    print('MSE', np.sum((ytest - yp) ** 2) / len(ytest))
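To see why the reshape breaks the fit, compare the array shapes. This is a minimal NumPy sketch that assumes a hypothetical 67-row training matrix filled with placeholder values, since the question does not show how Xtrain and ytrain were produced:

```python
import numpy as np

# Hypothetical stand-in for the asker's training data: 67 rows, 4 features
# (67 is an assumption; the question does not show its train/test split).
Xtrain = np.arange(67 * 4, dtype=float).reshape(67, 4)
ytrain = np.zeros(67)

c = (0, 1)                       # one pair produced by combinations(range(4), 2)
pair = Xtrain[:, c]              # tuple of column indices -> still a 2-D slice
flattened = pair.reshape(-1, 1)  # what the question's loop passes to fit()

print(pair.shape)       # (67, 2): one row per sample, one column per feature
print(flattened.shape)  # (134, 1): both columns stacked into one
print(ytrain.shape)     # (67,)
```

Because reshape(-1, 1) stacks both feature columns into a single column, X ends up with twice as many rows as ytrain, and scikit-learn rejects inputs whose sample counts differ.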

Error:

【Discussion】:

Tags: python scikit-learn linear-regression

【Answer 1】:

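There is no need to call reshape on the feature matrices, because they are already two-dimensional: Xtrain[:, c] selects two columns and has shape (n_samples, 2). Calling reshape(-1, 1) on it stacks both columns into a single column of shape (2 * n_samples, 1), so X ends up with twice as many rows as y. If you remove the reshape, your code works, as shown below.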

from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from itertools import combinations
import numpy as np

X, y = make_regression(n_samples=100, n_features=4, n_informative=3, n_targets=1, noise=0.01, random_state=42)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

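for c in combinations(range(4), 2):

    # X_train[:, c] is already 2-D with shape (n_samples, 2), so no reshape is needed
    lr = LinearRegression()
    lr.fit(X_train[:, c], y_train)
    yp = lr.predict(X_test[:, c])

    # Mean squared error on the test set
    print('MSE', np.sum((y_test - yp) ** 2) / len(y_test))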

# MSE 591.707619290734
# MSE 33.613143724590564
# MSE 634.3248475857874
# MSE 1646.9447686107499
# MSE 2293.2878076807942
# MSE 1700.2559702871085
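If the goal is to compare the pairs, it may help to keep the scores rather than only print them. The sketch below reuses the answer's make_regression and train_test_split setup, swaps the hand-written formula for sklearn.metrics.mean_squared_error (which computes the same mean of squared residuals), and collects the results in a dict; the names mse_by_pair and best_pair are illustrative only:

```python
from itertools import combinations

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=100, n_features=4, n_informative=3,
                       n_targets=1, noise=0.01, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

mse_by_pair = {}
for c in combinations(range(X.shape[1]), 2):
    cols = list(c)  # list of column indices; a tuple works the same way
    lr = LinearRegression().fit(X_train[:, cols], y_train)
    yp = lr.predict(X_test[:, cols])
    mse = mean_squared_error(y_test, yp)
    # Sanity check: identical to the manual formula used in the loop above
    assert np.isclose(mse, np.sum((y_test - yp) ** 2) / len(y_test))
    mse_by_pair[c] = mse

# Pair of feature indices with the lowest test MSE
best_pair = min(mse_by_pair, key=mse_by_pair.get)
for pair, mse in sorted(mse_by_pair.items(), key=lambda kv: kv[1]):
    print(pair, round(mse, 3))
print("lowest test MSE:", best_pair)
```

A lower test MSE means the two-feature model predicts the held-out data better. With n_informative=3, one of the four generated features carries no signal, so pairs that include it tend to score worse.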

【Discussion】: