【Question title】:Python Linear Regression Combination Problem
【Posted】:2021-11-20 07:01:57
【Question】:

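I need to compute a linear regression and its MSE for every group of two variables in my data frame. The problem is that I cannot fit an Xtrain with two variables against a ytrain with one variable, even though my ytrain has only one column.

Code: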

from sklearn.datasets import make_regression
X, y = make_regression(n_samples=100, n_features=4, n_informative=3, n_targets=1, noise=0.01)

Problem:

from itertools import combinations
for c in combinations(range(4), 2):
    lr=LinearRegression()
    lr.fit(Xtrain[:,c].reshape(-1,1),ytrain)
    yp=lr.predict(Xtest[:,c].reshape(-1,1))
    print('MSE', np.sum((ytest - yp)**2) / len(ytest))

Error:

【Comments】:

    Tags: python scikit-learn linear-regression


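    【Answer 1】:

    There is no need to call reshape on the feature matrices, because they are already two-dimensional. Selecting two columns gives an array of shape (n_samples, 2), and reshape(-1, 1) flattens it to (2 * n_samples, 1), so the number of rows no longer matches ytrain. If you remove the reshape, your code works; see the full example below.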
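To make the shape mismatch concrete, here is a minimal sketch using only NumPy. The toy array and the names `selected` and `reshaped` are illustrative assumptions, not part of the original question.

```python
import numpy as np

# Toy feature matrix: 6 samples, 4 features (values are arbitrary).
X = np.arange(24, dtype=float).reshape(6, 4)
y = np.arange(6, dtype=float)

cols = (0, 1)                        # one pair, as produced by combinations(range(4), 2)
selected = X[:, cols]                # already 2-D: one row per sample, one column per feature
reshaped = selected.reshape(-1, 1)   # stacks everything into a single column

print(selected.shape)  # (6, 2)  -> 6 rows, matches len(y)
print(reshaped.shape)  # (12, 1) -> 12 rows, no longer matches len(y)
```

reshape(-1, 1) is only needed when a single feature is passed as a 1-D array; a selection of two columns already has the 2-D layout that fit and predict expect.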

    from sklearn.datasets import make_regression
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split
    from itertools import combinations
    import numpy as np
    
    X, y = make_regression(n_samples=100, n_features=4, n_informative=3, n_targets=1, noise=0.01, random_state=42)
    
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
    
    for c in combinations(range(4), 2):
    
        lr = LinearRegression()
        lr.fit(X_train[:, c], y_train)
        yp = lr.predict(X_test[:, c])
    
        print('MSE', np.sum((y_test - yp) ** 2) / len(y_test))
    
    # MSE 591.707619290734
    # MSE 33.613143724590564
    # MSE 634.3248475857874
    # MSE 1646.9447686107499
    # MSE 2293.2878076807942
    # MSE 1700.2559702871085
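
As a possible variation on the loop above, the sketch below stores each MSE together with the column pair that produced it, so the results can be compared afterwards. It uses `mean_squared_error` from `sklearn.metrics` in place of the manual formula; the `results` dictionary and the `best` variable are illustrative names, not something from the original answer.

```python
from itertools import combinations

from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=100, n_features=4, n_informative=3,
                       n_targets=1, noise=0.01, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)

results = {}  # maps a column pair such as (0, 1) to its test MSE
for cols in combinations(range(X.shape[1]), 2):
    idx = list(cols)  # a list of column indices selects a 2-D sub-matrix
    lr = LinearRegression().fit(X_train[:, idx], y_train)
    results[cols] = mean_squared_error(y_test, lr.predict(X_test[:, idx]))

# Pair of features with the lowest test error.
best = min(results, key=results.get)
for cols, mse in results.items():
    print(cols, 'MSE', mse)
print('best pair:', best)
```

Keeping the pair next to its score avoids having to match the printed MSE lines back to the order in which combinations yields them.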
    

    【Comments】: