【Posted】: 2018-05-21 00:56:06
【Question】:
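Given some randomly generated data with
- 2 columns,
- 50 rows, and
- integers in the range 0-100,
a Poisson GLM and its diagnostic plots can be produced in R like this: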
> library(boot)  # glm.diag.plots() comes from the boot package
> col=2
> row=50
> range=0:100
> df <- data.frame(replicate(col, sample(range, row, replace=TRUE)))
> model <- glm(X2 ~ X1, data = df, family = poisson)
> glm.diag.plots(model)
In Python, I fit the same model like this:
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.genmod.families import Poisson
import seaborn as sns
import matplotlib.pyplot as plt
# 50 rows x 2 columns of integers in 0..100 (randint's upper bound is exclusive)
df = pd.DataFrame(np.random.randint(0, 101, size=(50, 2)), columns=['X1', 'X2'])
gee = smf.gee  # GEE formula interface (not a plain GLM)
model = gee("X2 ~ X1", groups=None, data=df, family=Poisson())
results = model.fit()
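For comparison with R's glm, the model here is a Poisson regression with the canonical log link; with an independence working correlation, GEE point estimates coincide with those of an ordinary GLM. Below is a minimal NumPy sketch of the iteratively reweighted least squares (IRLS) loop such a fit uses. It is not statsmodels' internal code; the helper name fit_poisson_irls is hypothetical, and it assumes the first column of X is the intercept.

```python
import numpy as np


def fit_poisson_irls(X, y, max_iter=100, tol=1e-10):
    """Fit a Poisson GLM with log link by IRLS (hypothetical helper).

    X: (n, p) design matrix whose first column is the intercept.
    y: (n,) non-negative counts with a positive mean.
    Returns the coefficient vector beta.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Start at log(mean count) for the intercept and 0 for the slopes.
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())
    for _ in range(max_iter):
        eta = X @ beta            # linear predictor
        mu = np.exp(eta)          # inverse of the log link
        w = mu                    # IRLS weight: (dmu/deta)^2 / Var(y) = mu^2 / mu
        z = eta + (y - mu) / mu   # working response
        XtW = X.T * w             # X^T W without building diag(w)
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta


# Usage mirroring the question: 50 rows of integers in 0..100.
rng = np.random.default_rng(0)
data = rng.integers(0, 101, size=(50, 2))
X = np.column_stack([np.ones(len(data)), data[:, 0]])  # intercept + X1
y = data[:, 1]                                          # X2
beta = fit_poisson_irls(X, y)
mu = np.exp(X @ beta)                                   # fitted means
```

At the maximum-likelihood estimate the score equations X^T (y - mu) = 0 hold, so with an intercept the fitted means sum to the observed counts.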
Then I plot the diagnostics in Python:
import statsmodels.api as sm  # for the lowess smoother
model_fitted_y = results.fittedvalues  # fitted means (response scale)
model_residuals = results.resid  # response residuals, y - fitted
model_abs_resid = np.abs(model_residuals)  # absolute residuals
model_lin_pred = np.log(model_fitted_y)  # linear predictor for the log link
fig, ax = plt.subplots(figsize=(12, 8))
ax.scatter(model_lin_pred, model_residuals, alpha=0.5)
smoothed = sm.nonparametric.lowess(model_residuals, model_lin_pred)  # sorted (x, y) pairs
ax.plot(smoothed[:, 0], smoothed[:, 1], color='red', lw=1, alpha=0.8)
ax.set_xlabel('Linear Predictor')
ax.set_ylabel('Residuals')
plt.show()
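The first panel of R's glm.diag.plots puts the linear predictor eta = log(mu) on the x-axis (not the fitted mean) and uses deviance-based residuals rather than raw response residuals; as far as I can tell, boot::glm.diag additionally adjusts them for leverage. A small NumPy sketch of the unadjusted Poisson deviance residuals, assuming y are the observed counts and mu the fitted means (for example results.fittedvalues); the helper name is hypothetical:

```python
import numpy as np


def poisson_deviance_residuals(y, mu):
    """Signed Poisson deviance residuals (hypothetical helper).

    Unit deviance: d_i = 2 * (y_i * log(y_i / mu_i) - (y_i - mu_i)),
    with the convention y * log(y / mu) = 0 when y = 0.
    """
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    # np.where evaluates both branches, so keep the log argument positive for y == 0.
    ylogy = np.where(y > 0, y * np.log(np.where(y > 0, y, 1.0) / mu), 0.0)
    unit_dev = 2.0 * (ylogy - (y - mu))
    # Clip tiny negative values caused by rounding before the square root.
    return np.sign(y - mu) * np.sqrt(np.clip(unit_dev, 0.0, None))


y = np.array([0.0, 2.0, 5.0])
mu = np.array([1.0, 2.0, 5.0])
eta = np.log(mu)                            # x-axis: linear predictor
r_dev = poisson_deviance_residuals(y, mu)   # y-axis: deviance residuals
```

The eta and r_dev arrays can be passed straight to ax.scatter in place of the fitted values and response residuals.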
But when I try to get Cook's distance,
# cook's distance, from statsmodels internals
model_cooks = results.get_influence().cooks_distance[0]
it throws an error:
AttributeError Traceback (most recent call last)
<ipython-input-66-0f2bedfa1741> in <module>()
4 model_residuals = results.resid
5 # normalized residuals
----> 6 model_norm_residuals = results.get_influence().resid_studentized_internal
7 # absolute squared normalized residuals
8 model_norm_residuals_abs_sqrt = np.sqrt(np.abs(model_norm_residuals))
/opt/conda/lib/python3.6/site-packages/statsmodels/base/wrapper.py in __getattribute__(self, attr)
33 pass
34
---> 35 obj = getattr(results, attr)
36 data = results.model.data
37 how = self._wrap_attrs.get(attr)
AttributeError: 'GEEResults' object has no attribute 'get_influence'
Is there a way to produce all 4 diagnostic plots in Python, as in R?
How can I retrieve Cook's distance for the results of a fitted model in Python with statsmodels?
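Since GEEResults has no get_influence(), one workaround is to compute the measures by hand from the fitted means. The sketch below assumes a Poisson model with log link and dispersion fixed at 1, takes the design matrix X (including the intercept column) and fitted means mu from any fit (e.g. results.fittedvalues), and follows the definitions used by R's boot::glm.diag as far as I understand them. The helper name poisson_influence and the panel variable names are hypothetical. Recent statsmodels versions also provide get_influence() on GLM results (from smf.glm rather than smf.gee), which would be worth checking against this.

```python
import numpy as np


def poisson_influence(X, y, mu):
    """Leverage, standardized residuals and Cook's distance for a Poisson/log GLM.

    Hypothetical helper; assumes dispersion phi = 1 and that mu are the fitted
    means of a model with full-rank design matrix X (n x p).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    p = X.shape[1]
    # Hat matrix H = W^1/2 X (X^T W X)^-1 X^T W^1/2 with IRLS weights W = diag(mu).
    Xw = X * np.sqrt(mu)[:, None]
    q, _ = np.linalg.qr(Xw)          # orthonormal basis of the columns of Xw
    h = np.sum(q**2, axis=1)         # diag(H) = row sums of q^2
    pearson = (y - mu) / np.sqrt(mu)
    ylogy = np.where(y > 0, y * np.log(np.where(y > 0, y, 1.0) / mu), 0.0)
    dev = np.sign(y - mu) * np.sqrt(np.clip(2.0 * (ylogy - (y - mu)), 0.0, None))
    rp = pearson / np.sqrt(1.0 - h)  # standardized Pearson residuals
    rd = dev / np.sqrt(1.0 - h)      # standardized deviance residuals
    cook = h * rp**2 / ((1.0 - h) * p)
    return {"h": h, "dev": dev, "rp": rp, "rd": rd, "cook": cook}


# Toy usage: an intercept-only Poisson fit has mu = mean(y) for every case.
y = np.array([3.0, 0.0, 4.0, 1.0, 2.0])
X = np.ones((5, 1))
mu = np.full(5, y.mean())
diag = poisson_influence(X, y, mu)

# Quantities for the four panels of glm.diag.plots (as I understand them):
eta = np.log(mu)                                                  # 1: x-axis
res_jack = np.sign(y - mu) * np.sqrt(diag["dev"]**2 + diag["h"] * diag["rp"]**2)  # 1: y-axis
rd_sorted = np.sort(diag["rd"])                                   # 2: vs normal quantiles
h_ratio = diag["h"] / (1.0 - diag["h"])                           # 3: x-axis, y = cook
case = np.arange(1, len(y) + 1)                                   # 4: x-axis, y = cook
```

For panel 2, scipy.stats.probplot(diag["rd"], dist="norm") returns the theoretical quantiles to pair with the sorted residuals; each pair of arrays can then go into its own subplot of plt.subplots(2, 2).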
【Discussion】:
-
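Outlier and influence measures are only available for OLS and WLS. (Doing it with some of the GLM residuals might not be hard, but it would need unit tests against R or Stata. GEE might be harder.)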
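For reference, the GLM analogues are short formulas once the IRLS weights are known. These are the standard definitions (and, to my knowledge, the ones R's boot::glm.diag uses), with W = diag(w_i) the IRLS weights (w_i = mu_i for Poisson with log link), V the variance function, phi the dispersion (1 for Poisson) and p the number of coefficients:

```latex
H = W^{1/2} X \left(X^\top W X\right)^{-1} X^\top W^{1/2}, \qquad h_i = H_{ii}

r_{P,i} = \frac{y_i - \mu_i}{\sqrt{\phi\, V(\mu_i)}}, \qquad V(\mu) = \mu \ \text{(Poisson)}

D_i = \frac{h_i \, r_{P,i}^2}{p \, (1 - h_i)^2}
```

For OLS (W = I, phi = sigma^2) these reduce to the familiar leverage and Cook's distance.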
-
For some purposes, R really is king. Python code is often minimal and concise, but in R a great deal can be done with just a few commands. I miss R's commands ;)
Tags: python plot machine-learning statsmodels