Univariate regression in Python: answers

[Question title]: Univariate regression in Python
[Posted]: 2019-07-08 07:25:05
[Question description]:

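I need to run multiple single-factor (univariate) regression models in Python, each between one column of a DataFrame and one of several other columns in the same DataFrame.

So, based on the image, I want to run regression models between x1 & dep, x2 & dep, and so on.

Desired output: beta, intercept, R-sq, p-value, SSE, AIC, BIC, a normality test of the residuals, etc.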

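[Question comments]:

Please show what you have tried and where you ran into problems.
I completed this exercise in SAS Base 9.4. However, I am now trying to do it in Python. You can see what my DataFrame looks like in the image I uploaded in the post. Can you tell me your solution?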

Tags: python-3.x pandas jupyter-lab

[Solution 1]:

There are two options you can use here. One is the popular scikit-learn library. Use it as follows:

from sklearn.linear_model import LinearRegression

reg = LinearRegression()
reg.fit(X, y)      # X is your feature data (2-D), y is your target
reg.score(X, y)    # R^2 value
>>> 0.87
reg.coef_          # slope coefficients
>>> array([1.45, -9.2])
reg.intercept_     # intercept
>>> 6.1723...

scikit-learn does not expose many statistics beyond these.
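Applied to the question, the scikit-learn route means one fit per predictor column, with anything beyond the slope, intercept, and R^2 computed by hand. A minimal sketch, assuming a DataFrame with predictor columns `x1`, `x2` and a target column `dep` (the data here is synthetic and the column names are placeholders, not the asker's real data):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the asker's DataFrame; column names are assumptions.
rng = np.random.default_rng(0)
df = pd.DataFrame({"x1": rng.normal(size=50), "x2": rng.normal(size=50)})
df["dep"] = 2.0 + 3.0 * df["x1"] + rng.normal(scale=0.5, size=50)

y = df["dep"].to_numpy()
rows = []
for col in ["x1", "x2"]:
    X = df[[col]].to_numpy()            # scikit-learn expects a 2-D array, shape (n, 1)
    reg = LinearRegression().fit(X, y)  # one univariate model per predictor
    resid = y - reg.predict(X)          # residuals, needed for SSE
    rows.append({
        "predictor": col,
        "beta": reg.coef_[0],
        "intercept": reg.intercept_,
        "r_squared": reg.score(X, y),
        "sse": float(np.sum(resid ** 2)),  # sum of squared errors, computed manually
    })

sklearn_summary = pd.DataFrame(rows)
print(sklearn_summary)
```

p-values, AIC, and BIC are not available from `LinearRegression`, which is the main reason to prefer statsmodels for this task.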

The other option is statsmodels, which reports far more detail about a model's statistics:

import numpy as np
import statsmodels.api as sm

# generate some synthetic data
nsample = 100
x = np.linspace(0, 10, 100)
X = np.column_stack((x, x**2))
beta = np.array([1, 0.1, 10])
e = np.random.normal(size=nsample)

X = sm.add_constant(X)
y = np.dot(X, beta) + e

# fit the model and get a summary of the statistics
model = sm.OLS(y, X)
results = model.fit()
print(results.summary())

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       1.000
Model:                            OLS   Adj. R-squared:                  1.000
Method:                 Least Squares   F-statistic:                 4.020e+06
Date:                Mon, 08 Jul 2019   Prob (F-statistic):          2.83e-239
Time:                        02:07:22   Log-Likelihood:                -146.51
No. Observations:                 100   AIC:                             299.0
Df Residuals:                      97   BIC:                             306.8
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          1.3423      0.313      4.292      0.000       0.722       1.963
x1            -0.0402      0.145     -0.278      0.781      -0.327       0.247
x2            10.0103      0.014    715.745      0.000       9.982      10.038
==============================================================================
Omnibus:                        2.042   Durbin-Watson:                   2.274
Prob(Omnibus):                  0.360   Jarque-Bera (JB):                1.875
Skew:                           0.234   Prob(JB):                        0.392
Kurtosis:                       2.519   Cond. No.                         144.
==============================================================================

As you can see, statsmodels provides much more detail, such as AIC, BIC, t-statistics, and so on.

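[Comments]:

My DataFrame basically looks like the image I embedded in the post. My final output should contain a list of the models I ran, with beta in a separate column, R-sq in a separate column, and so on.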
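Yes, my answer shows how to do that. Preprocessing and structuring the data is up to you. To pull specific columns out of the DataFrame as NumPy arrays for fitting, you can use X = df[['x1','x2']].values and y = df['Dependent'].values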