Is there a predict function for PLM in R? Answers

【Question title】: Is there a predict function for PLM in R?
【Posted】: 2011-08-19 14:28:17
【Question description】:

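I have a small-N, large-T panel that I am estimating via plm (panel linear regression model), with fixed effects.

Is there any way to get predicted values for a new dataset? (I want to estimate the parameters on a subset of my sample, and then use these to calculate model-implied values for the whole sample.)

Thanks!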

【Question comments】:

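It seems to use lm under the hood, so have you tried calling predict.lm?
I suspect the authors knew that releasing a predict.plm function would encourage people who do not understand the statistical issues to apply it blindly when the assumptions are not met. IIRC, the lme4 package does not provide a predict function either, and the plm authors note that they are estimating both random and fixed components.
predict.lm doesn't work. I suppose there is a way to extract the coefficients and intercepts, but I imagine others have already run into this problem.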

Tags: r plm

【Answer 1】:

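There are (at least) two methods in the package for producing estimates from plm objects:

-- fixef.plm: extracts the fixed effects

-- pmodel.response: a function to extract the model.response

It appears to me that the authors are not interested in providing estimates for the "random effects". It may be a matter of "if you don't know how to do it on your own, then we don't want to give you a sharp knife to cut yourself too deeply."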
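As a hedged illustration of how these pieces can be combined: for a within model, a level prediction is the individual's fixed effect plus the regressors times the slope coefficients. The sketch below assumes the Grunfeld data shipped with plm, an arbitrarily chosen training subset (years up to 1950), and that every individual in the prediction data also appears in the estimation sample. It is one possible manual approach, not a method provided by the package authors.

```r
library(plm)

data("Grunfeld", package = "plm")

# Estimate a within (fixed effects) model on a subset of the sample
train <- subset(Grunfeld, year <= 1950)
fe <- plm(inv ~ value + capital, data = train,
          index = c("firm", "year"), model = "within")

beta  <- coef(fe)    # slope coefficients (a within model has no overall intercept)
alpha <- fixef(fe)   # one estimated intercept per firm, named by firm id

# Model-implied values for the whole sample: alpha_i + x_it' beta
X      <- as.matrix(Grunfeld[, names(beta)])
a_i    <- as.numeric(alpha[as.character(Grunfeld$firm)])
pred   <- a_i + as.numeric(X %*% beta)

head(data.frame(Grunfeld[, c("firm", "year", "inv")], pred))
```

For rows in the estimation sample these values coincide with the response minus the residuals; for the remaining years they are out-of-sample predictions that reuse the firm intercepts estimated on the subset.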

【Comments】:

【Answer 2】:

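I wrote a function called predict.out.plm that can create predictions for the original data and for a manipulated data set (with the same column names).

predict.out.plm calculates a) the predicted (fitted) outcome of the transformed data and b) constructs the corresponding level outcome. The function works for first-difference (FD) estimations and fixed effects (FE) estimations using plm. For FD it creates the differenced outcome over time, and for FE it creates the time-demeaned outcome.

The function is largely untested and probably only works with strongly balanced data frames.

Any suggestions and corrections are very welcome. Help in developing a small R package would be much appreciated.

The function predict.out.plm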

predict.out.plm<-function(
  estimate,
  formula,
  data,
  model="fd",
  pname="y",
  pindex=NULL,
  levelconstr=T
){
  # estimate=e.fe
  # formula=f
  # data=d
  # model="within"
  # pname="y"
  # pindex=NULL
  # levelconstr=T
  #get index of panel data
  #only derive the index names when the caller did not supply pindex
  if (is.null(pindex)) {
    if (class(data)[1]=="pdata.frame") {
      pindex<-names(attributes(data)$index)
    } else {
      pindex<-names(data)[1:2]
    }
  }
  if (class(data)[1]!="pdata.frame") { 
    data<-pdata.frame(data,index=pindex)
  }
  #model frame
  mf<-model.frame(formula,data=data)
  #model matrix - transformed data
  mn<-model.matrix(formula,mf,model)

  #define variable names
  y.t.hat<-paste0(pname,".t.hat")
  y.l.hat<-paste0(pname,".l.hat")
  y.l<-names(mf)[1]

  #transformed data of explanatory variables 
  #exclude variables that were dropped in estimation
  n<-names(estimate$aliased[estimate$aliased==F])
  i<-match(n,colnames(mn))
  X<-mn[,i]

  #predict transformed outcome with X * beta
  # p<- X %*% coef(estimate)
  p<-crossprod(t(X),coef(estimate))
  colnames(p)<-y.t.hat

  if (levelconstr==T){
    #old dataset with original outcome
    od<-data.frame(
      attributes(mf)$index,
      data.frame(mf)[,1]
    )
    rownames(od)<-rownames(mf) #preserve row names from model.frame
    names(od)[3]<-y.l

    #merge old dataset with prediction
    nd<-merge(
      od,
      p,
      by="row.names",
      all.x=T,
      sort=F
    )
    nd$Row.names<-as.integer(nd$Row.names)
    nd<-nd[order(nd$Row.names),]

    #construct predicted level outcome for FD estimations
    if (model=="fd"){
      #first observation from real data
      i<-which(is.na(nd[,y.t.hat]))
      nd[i,y.l.hat]<-NA
      nd[i,y.l.hat]<-nd[i,y.l]
      #fill values over all years
      ylist<-unique(nd[,pindex[2]])[-1]
      ylist<-as.integer(as.character(ylist))
      for (y in ylist){
        nd[nd[,pindex[2]]==y,y.l.hat]<-
          nd[nd[,pindex[2]]==(y-1),y.l.hat] + 
          nd[nd[,pindex[2]]==y,y.t.hat]
      }
    } 
    if (model=="within"){
      #group means of outcome
      gm<-aggregate(nd[, pname], list(nd[,pindex[1]]), mean)
      gl<-aggregate(nd[, pname], list(nd[,pindex[1]]), length)
      nd<-cbind(nd,groupmeans=rep(gm$x,gl$x))
      #predicted values + group means
      nd[,y.l.hat]<-nd[,y.t.hat] + nd[,"groupmeans"]
    } 
    if (model!="fd" && model!="within") {
      stop('function works only for FD and FE estimations')
    }
  }
  #results
  results<-p
  if (levelconstr==T){
    results<-list(results,nd)
    names(results)<-c("p","df")
  }
  return(results)
}

Testing the function:

##packages
library(plm)

##test dataframe
#data structure
N<-4
G<-2
M<-5
d<-data.frame(
  id=rep(1:N,each=M),
  year=rep(1:M,N)+2000,
  gid=rep(1:G,each=M*2)
)
#explanatory variable
d[,"x"]=runif(N*M,0,1)
#outcome
d[,"y"] = 2 * d[,"x"] + runif(N*M,0,1)
#panel data frame
d<-pdata.frame(d,index=c("id","year"))

##new data frame for out of sample prediction
dn<-d
dn$x<-rnorm(nrow(dn),0,2)

##estimate
#formula
f<- pFormula(y ~ x + factor(year))
#fixed effects or first difference estimation (the second call overwrites e, so FD is used below)
e<-plm(f,data=d,model="within",index=c("id","year"))
e<-plm(f,data=d,model="fd",index=c("id","year"))
summary(e)

##fitted values of estimation
#transformed outcome prediction
predict(e)
c(pmodel.response(e)-residuals(e))
predict.out.plm(e,f,d,"fd")$p
# "level" outcome prediction
predict.out.plm(e,f,d,"fd")$df$y.l.hat
#both
predict.out.plm(e,f,d,"fd")

##out of sample prediction
predict(e,newdata=d) 
predict(e,newdata=dn) 
# Error in crossprod(beta, t(X)) : non-conformable arguments
# if plm omits variables specified in the formula (e.g. one year in factor(year)),
# predict() tries to multiply two matrices whose dimensions do not match
# (the model matrix has more columns than there are estimated coefficients);
# the new function avoids this and is therefore able to do out-of-sample predictions
predict.out.plm(e,f,dn,"fd")
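
The level reconstruction for FD estimates described in this answer (start from the observed first-period level, then add the predicted differences period by period) can also be sketched without a loop over years. The snippet below is a minimal base-R illustration on made-up numbers; the column names `d_hat` and `y_l_hat` are hypothetical, and it assumes that rows are sorted by individual and time and that only the first row of each individual has a missing predicted difference.

```r
# Illustrative data: observed levels and predicted first differences
# for 2 individuals over 4 years (values are invented for the example)
nd <- data.frame(
  id    = rep(1:2, each = 4),
  year  = rep(2001:2004, times = 2),
  y     = c(10, 11, 13, 12, 5, 5.5, 7, 8),
  d_hat = c(NA, 0.8, 1.5, -0.5, NA, 1, 1, 1)  # NA in each individual's first year
)

# Rows must be ordered by individual, then time
nd <- nd[order(nd$id, nd$year), ]

# Seed each individual's series with the observed first level, then
# accumulate the predicted differences within the individual
start_or_diff <- ifelse(is.na(nd$d_hat), nd$y, nd$d_hat)
nd$y_l_hat <- ave(start_or_diff, nd$id, FUN = cumsum)

nd
```

Because the cumulative sum is taken within each individual, this variant does not require every individual to be observed in the same years, though gaps inside an individual's series would still need separate handling.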

【Comments】:

【Answer 3】:

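plm now has a predict.plm() function, although it is not documented/exported.

Note also that predict works on the transformed model (i.e. after the within/between/fd transformation has been applied), not on the original model. I speculate that the reason for this is that prediction is harder in a panel data framework. Indeed, you need to consider whether you are predicting:

-- new time periods for existing individuals, where you used individual FE? Then you can add the prediction to the existing individual mean
-- new time periods for new individuals? Then you need to figure out which individual mean you are going to use
-- the same, but more complicated still, if you use a random effects model, since the effects are not easily derived

In the code below, I illustrate how to use fitted values on the existing sample: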

library(plm)
#> Loading required package: Formula
library(tidyverse)

data("Produc", package = "plm")
zz <- plm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp,
          data = Produc, index = c("state","year"))


## produce a dataset of prediction, added to the group means
Produc_means <- Produc %>% 
  mutate(y = log(gsp)) %>% 
  group_by(state) %>% 
  transmute(y_mean = mean(y),
            y = y, 
            year = year) %>% 
  ungroup() %>% 
  mutate(y_pred = predict(zz) + y_mean) %>% 
  select(-y_mean)

## plot it
Produc_means %>% 
  gather(type, value, y, y_pred) %>% 
  filter(state %in% toupper(state.name[1:5])) %>% 
  ggplot(aes(x = year, y = value, linetype = type))+
  geom_line() +
  facet_wrap(~state) +
  ggtitle("Visualising in-sample prediction, for 4 states")
#> Warning: attributes are not identical across measure variables;
#> they will be dropped

Created on 2018-11-20 by the reprex package (v0.2.1)
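
The first case in the list above (new time periods for existing individuals) can be sketched in base R as follows. This is an illustration on simulated data, not plm's own procedure: the variable names are hypothetical, the within estimator is computed by hand through demeaning, and the new observations are demeaned with the individual means from the estimation sample before the individual mean of the outcome is added back.

```r
set.seed(1)

# Simulated balanced panel: 5 individuals, 8 periods (illustrative only);
# the id itself plays the role of the individual effect
d <- data.frame(id = rep(1:5, each = 8), t = rep(1:8, times = 5))
d$x <- runif(nrow(d))
d$y <- 2 * d$x + d$id + rnorm(nrow(d), sd = 0.1)

train <- d[d$t <= 6, ]  # estimation sample
new   <- d[d$t >  6, ]  # later periods of the same individuals

# Individual means on the estimation sample, named by id
y_mean <- tapply(train$y, train$id, mean)
x_mean <- tapply(train$x, train$id, mean)

# Within transformation: subtract each individual's mean
key_train <- as.character(train$id)
train$y_w <- train$y - as.numeric(y_mean[key_train])
train$x_w <- train$x - as.numeric(x_mean[key_train])

# FE slope from the demeaned regression (no intercept)
b <- coef(lm(y_w ~ x_w - 1, data = train))[["x_w"]]

# Predict new periods: demean x with the training means, apply the slope,
# then add the individual's training mean of y back
key_new <- as.character(new$id)
new$y_pred <- as.numeric(y_mean[key_new]) +
  (new$x - as.numeric(x_mean[key_new])) * b

head(new)
```

For individuals that do not appear in the estimation sample there is no mean to add back, which is exactly the difficulty raised in the second case.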

【Comments】:

【Answer 4】:

It looks like there is a new package that can do in-sample predictions for a variety of models, including plm:

https://cran.r-project.org/web/packages/prediction/prediction.pdf

【Comments】:

【Answer 5】:

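You can compute the residuals with residuals(reg_name). From there, you can subtract them from the response variable to obtain the predicted (in-sample) values.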
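
A minimal sketch of this idea, assuming a within model fitted on the Grunfeld data from plm and rows already sorted by firm and year (so that the residuals line up with the data frame). Which response you subtract from determines the scale of the result: the transformed response gives fitted values on the demeaned scale, while the original response gives fitted values in levels.

```r
library(plm)

data("Grunfeld", package = "plm")

fe <- plm(inv ~ value + capital, data = Grunfeld,
          index = c("firm", "year"), model = "within")

res <- as.numeric(residuals(fe))

# Fitted values on the transformed (time-demeaned) scale
fit_within <- as.numeric(pmodel.response(fe)) - res

# Fitted values in levels, which implicitly include the firm effects
fit_level <- Grunfeld$inv - res

head(cbind(inv = Grunfeld$inv, fit_level, fit_within))
```

This only covers observations used in the estimation; it does not give predictions for new data.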

【Comments】: