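Answers to: R: Plotting panel model predictions using plm & pglm

【Question title】: R: Plotting panel model predictions using plm & pglm
【Posted】: 2015-11-04 09:40:37
【Question】:

I have created two regression models: a linear panel model using plm, and a generalized panel model with a Poisson family using the pglm package.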

library(plm); library(pglm)
data(Unions)  # from pglm-package
punions <- pdata.frame(Unions, c("id", "year"))

fit1 <- plm(wage ~ exper + rural + married, data=punions, model="random")
fit2 <- pglm(wage ~ exper + rural + married, data=punions, model="random", family="poisson")

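I would now like to compare the two fits graphically by plotting the fitted values in a set of scatterplots, preferably with ggplot2 along these lines: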

library(ggplot2)
ggplot(punions, aes(x=exper, y=wage)) +
    geom_point() +
    facet_wrap(rural ~ married)

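I considered simply using ggplot2's stat_smooth(), but (perhaps not surprisingly) it does not seem to recognize the panel format of my data. Manually extracting the predicted values with predict does not seem to work for the pglm model either.

How can I overlay the predicted values of the two panel models in this plot?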

【Comments on the question】:

The example in the middle of this doc page may be useful to you.
What are your random effects? Individuals?

Tags: r panel-data plm

【Answer 1】:

Like @mtoto, I'm not familiar with library(plm) or library(pglm) either. But a predict method for plm does exist; it is just not exported. pglm has no predict method.

R> methods(class= "plm")
[1] ercomp          fixef           has.intercept   model.matrix    pFtest          plmtest         plot            pmodel.response
 [9] pooltest        predict         residuals       summary         vcovBK          vcovHC          vcovSCC        
R> methods(class= "pglm")
no methods found

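It is worth noting that I don't understand why you would use a Poisson model for wage data. It is clearly not Poisson distributed, since it takes non-integer values (see below). You could try a negative binomial if you like, but I'm not sure whether it is available with random effects. You could, for example, use MASS::glm.nb.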

> quantile(Unions$wage, seq(0,1,.1))
         0%         10%         20%         30%         40%         50%         60%         70%         80%         90%        100% 
 0.02790139  2.87570334  3.54965422  4.14864865  4.71605855  5.31824370  6.01422463  6.87414349  7.88514525  9.59904809 57.50431282
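The non-integer point can also be checked programmatically. The following is a minimal base-R sketch; `is_count_like()` is a hypothetical helper name, and the two vectors are illustrative values only (the first one reuses three of the quantiles printed above).

```r
# Hypothetical helper: TRUE only if every non-missing value is a
# non-negative whole number, i.e. plausible count data.
is_count_like <- function(x, tol = 1e-8) {
  x <- x[!is.na(x)]
  all(x >= 0) && all(abs(x - round(x)) < tol)
}

# Illustrative values only.
wage_like  <- c(0.02790139, 2.87570334, 5.31824370)
count_like <- c(0, 3, 12)

is_count_like(wage_like)   # FALSE
is_count_like(count_like)  # TRUE
```

On the real data the call would be is_count_like(Unions$wage).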

Solution 1: using `plm`

punions$p <- plm:::predict.plm(fit1, punions)
# From examining the source code, predict.plm does not incorporate 
# the random effects, so you do not get appropriate predictions. 
# You just get the FE predictions.

ggplot(punions, aes(x=exper, y=p)) +
  geom_point() +
  facet_wrap(rural ~ married)

Solution 2 - `lme4`

Alternatively, you can get a similar fit from the lme4 package, which does define a predict method:

library(lme4)
Unions$id <- factor(Unions$id)
fit3 <- lmer(wage ~ exper + rural + married + (1|id), data= Unions)
# not run:
fit4 <- glmer(wage ~ exper + rural + married + (1|id), data= Unions, family= poisson(link= "log"))

R> fit1$coefficients
(Intercept)       exper    ruralyes  marriedyes 
  3.7467469   0.3088949  -0.2442846   0.4781113 
R>  fixef(fit3)
(Intercept)       exper    ruralyes  marriedyes 
  3.7150302   0.3134898  -0.1950361   0.4592975

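I didn't run the Poisson model because it is clearly misspecified. You could apply some kind of variable transformation to deal with that, or perhaps use a negative binomial. Anyway, let's finish the example: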

# this has RE for individuals, so you do see dispersion based on the RE
Unions$p <- predict(fit3, Unions)
ggplot(Unions, aes(x=exper, y=p)) +
    geom_point() +
    facet_wrap(rural ~ married)
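Since pglm has no predict method, one possible workaround is to compute the fixed part of the prediction by hand: build the design matrix, multiply it by the coefficients, and apply the inverse link (exp for a log-link Poisson). The following is a minimal base-R sketch of that idea, not something taken from pglm itself. `predict_fixed_part()` is a hypothetical helper, and the toy data and coefficient values are made up purely for illustration. Like the plm prediction above, it ignores the individual effects.

```r
# Hypothetical helper: fixed-part prediction, inv_link(X %*% beta).
predict_fixed_part <- function(beta, rhs, data, inv_link = identity) {
  X <- model.matrix(rhs, data = data)   # design matrix, including the intercept
  beta <- beta[colnames(X)]             # keep regression coefficients, in column order
  if (anyNA(beta)) stop("coefficient names do not match the design matrix")
  drop(inv_link(X %*% beta))
}

# Toy data and made-up coefficients (assumptions, for illustration only).
toy <- data.frame(
  exper   = c(1, 5),
  rural   = factor(c("no", "yes")),
  married = factor(c("yes", "no"))
)
beta <- c("(Intercept)" = 1, exper = 0.1, ruralyes = -0.2,
          marriedyes = 0.3, sigma = 2)  # extra non-regression entries are dropped

pred <- predict_fixed_part(beta, ~ exper + rural + married, toy, inv_link = exp)
pred  # exp(1.4) and exp(1.3)
```

With the objects from the question this might be called as predict_fixed_part(coef(fit2), ~ exper + rural + married, Unions, inv_link = exp). That assumes coef(fit2) returns a named vector whose names match the design-matrix columns, which I have not verified against pglm.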

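【Comments】:

【Answer 2】:

I'm not familiar with the pglm package, but there doesn't seem to be a function analogous to predict() that generates a vector of predicted values from a data frame.

In all other cases (which should be all of them, tbh), you can easily plot this in ggplot, even with facet wrap. You simply add the predictions to the data frame as a new column: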

punions$pred1 <- predict(fit1,punions,class="lm")

Then add it as an extra geom_line():

    ggplot() + geom_point(data=punions, aes(x=exper, y=wage)) +
    geom_line(data=punions,aes(x=exper, y= pred1), color = "red") +
    facet_wrap(rural ~ married)
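To overlay predictions from two models in the same facets, one option is to stack the prediction columns into long format and map the model name to colour. The following is a minimal sketch on a toy data frame; the column names `pred_lin` and `pred_pois` and all values are hypothetical stand-ins for whatever fitted values you manage to obtain from each model. The plotting step only runs if ggplot2 is installed.

```r
# Toy stand-in for the panel data (values are made up for illustration).
toy <- data.frame(
  exper     = c(1, 2, 3, 1, 2, 3),
  married   = rep(c("no", "yes"), each = 3),
  wage      = c(3.1, 3.6, 4.4, 3.9, 4.1, 5.0),
  pred_lin  = c(3.2, 3.7, 4.2, 3.7, 4.2, 4.7),
  pred_pois = c(3.1, 3.6, 4.3, 3.6, 4.2, 4.9)
)

# Stack the two prediction columns so that 'model' can be mapped to colour.
long <- rbind(
  data.frame(toy[c("exper", "married")], model = "linear",  pred = toy$pred_lin),
  data.frame(toy[c("exper", "married")], model = "poisson", pred = toy$pred_pois)
)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # Observed points from the original data, one coloured line per model.
  p <- ggplot() +
    geom_point(data = toy, aes(x = exper, y = wage)) +
    geom_line(data = long, aes(x = exper, y = pred, colour = model)) +
    facet_wrap(~ married)
  # print(p) draws the plot
}
```

Note that geom_line() connects points in order of x within each facet and colour group, so this works cleanly only when the predictions in a facet depend on exper alone. If the predictions include individual effects, adding group = id or using geom_point() for the predictions avoids zigzag lines.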

【Comments】: