Is it OK to run a plm fixed effects model and add a factor dummy variable (three-way fixed effects)? Answer

【Question title】: Is it OK to run a plm fixed effects model and add a factor dummy variable (three-way fixed effects)?
【Posted】: 2022-01-25 17:42:38
【Question】:

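Is it possible to run a "plm" fixed effects model in R and add a factor dummy variable, as shown below?

"Time", "Firm" and "Country" are three separate indexes, and I want fixed effects for all of them at the same time.

For my case, I found the specification below more suitable than creating two indexes by combining "Firm" and "Country".

Is this an acceptable specification?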

plm(y ~ lag(x1, 1) + x2 + x3 + x4 + x5 + factor(Country), data=DATA,
    index=c("Firm","Time"), model="within")

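【Discussion】:

It seems you mixed up the index: in plm it should be id-time, i.e. index=c('id', 'time').
@jay.sf Does the order matter even if the columns in my DATA are ordered time, then id?
Yes, of course; plm is picky about that.
@jay.sf: Thanks. I have also posted this as a statistics question in the following thread, please take a look: stats.stackexchange.com/questions/561731/…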

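Tags: r regression dummy-variable panel-data plm

【Solution 1】:

It is fine to add further factors. We can show this by computing the LSDV model. As a preliminary note, you will of course need robust standard errors, usually clustered at the highest aggregate level, i.e. country in this case.

Note: R >= 4.1 is required for the code below (native pipe `|>` and the `\(.)` lambda syntax).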

LSDV

fit1 <- 
  lm(y ~ d + x1 + x2 + x3 + x4 + factor(id) + factor(time) + factor(country), 
     dat)
lmtest::coeftest(
  fit1, vcov.=sandwich::vcovCL(fit1, cluster=dat$country, type='HC0')) |>
  {\(.) .[!grepl('\\(|factor', rownames(.)), ]}()
#      Estimate Std. Error    t value      Pr(>|t|)
# d  10.1398727  0.3181993 31.8664223 4.518874e-191
# x1  1.1217514  1.6509390  0.6794627  4.968995e-01
# x2  3.4913273  2.7782157  1.2566797  2.089718e-01
# x3  0.6257981  3.3162148  0.1887085  8.503346e-01
# x4  0.1942742  0.8998307  0.2159008  8.290804e-01
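Why a within (demeaning) estimator can reproduce the LSDV slopes: by the Frisch-Waugh-Lovell theorem, regressing the fixed-effect-demeaned y on the demeaned regressors gives the same slope coefficients as including all the dummies. Below is a minimal base-R sketch for the balanced two-way (id + time) case; the data frame `panel`, the helper `demean2` and all effect sizes are hypothetical and unrelated to the simulation further down.

```r
## Hypothetical balanced panel: id varies fastest in expand.grid()
set.seed(1)
n_id <- 30; n_t <- 5
panel <- expand.grid(id = 1:n_id, time = 1:n_t)
panel$x <- rnorm(nrow(panel))
panel$y <- 2 * panel$x +
  rep(rnorm(n_id), times = n_t) +   ## id effect
  rep(rnorm(n_t), each = n_id) +    ## time effect
  rnorm(nrow(panel))

## LSDV: explicit dummies for id and time
b_lsdv <- coef(lm(y ~ x + factor(id) + factor(time), data = panel))[["x"]]

## Two-way within transformation (exact for a balanced panel):
## v - mean by id - mean by time + grand mean
demean2 <- function(v, id, time) v - ave(v, id) - ave(v, time) + mean(v)
y_w <- with(panel, demean2(y, id, time))
x_w <- with(panel, demean2(x, id, time))
b_within <- coef(lm(y_w ~ x_w - 1))[["x_w"]]

all.equal(b_lsdv, b_within)
# [1] TRUE
```

Only the point estimates coincide: the naive standard errors of the demeaned regression ignore the degrees of freedom used up by the fixed effects, which is one reason to rely on a package (or on a cluster-robust estimator) for inference.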

With factor(country) added, plm::plm gives the same point estimates as LSDV:

`plm::plm`

fit2 <- plm::plm(y ~ d + x1 + x2 + x3 + x4 + factor(country), 
                 index=c('id', 'time'), model='within', effect='twoways', dat)
summary(fit2, vcov=plm::vcovHC(fit2, cluster='group', type='HC1'))$coe
#      Estimate Std. Error    t-value      Pr(>|t|)
# d  10.1398727  0.3232850 31.3651179 5.836597e-186
# x1  1.1217514  1.9440165  0.5770277  5.639660e-01
# x2  3.4913273  3.2646905  1.0694206  2.849701e-01
# x3  0.6257981  3.1189939  0.2006410  8.409935e-01
# x4  0.1942742  0.9250759  0.2100089  8.336756e-01

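However, cluster='group' refers to "id", not "country", so these standard errors are wrong. It currently seems impossible to cluster on an additional factor in plm; at least I am not aware of a way.

You can also use lfe::felm, which greatly reduces computation time compared with LSDV: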
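To see why the clustering variable matters, here is a minimal base-R sketch of a CR0-type cluster-robust sandwich with a G/(G-1) small-sample factor, applied to an LSDV fit and clustered once by firm and once by country. The data, the helper `cluster_se` and the country-by-time shocks are hypothetical; package implementations (e.g. sandwich::vcovCL) may apply additional corrections, so the numbers are not meant to match theirs exactly.

```r
## Hypothetical panel: firms nested in countries, with country-by-time
## shocks that make x and the errors correlated within a country
set.seed(2)
n_country <- 12; firms_per_country <- 5; n_t <- 4
n_firm  <- n_country * firms_per_country
firm    <- rep(seq_len(n_firm), times = n_t)
time    <- rep(seq_len(n_t), each = n_firm)
country <- (firm - 1) %/% firms_per_country + 1
shock_x <- matrix(rnorm(n_country * n_t), n_country)
shock_y <- matrix(rnorm(n_country * n_t), n_country)
x <- rnorm(length(firm)) + shock_x[cbind(country, time)]
y <- 1.5 * x + shock_y[cbind(country, time)] + rnorm(length(firm))

## LSDV fit with firm and time dummies
fit <- lm(y ~ x + factor(firm) + factor(time))

## Cluster-robust standard errors: (X'X)^-1 [sum_g s_g s_g'] (X'X)^-1
cluster_se <- function(fit, cluster) {
  X <- model.matrix(fit)
  u <- residuals(fit)
  bread  <- solve(crossprod(X))       ## (X'X)^-1
  scores <- rowsum(X * u, cluster)    ## per-cluster sums of x_i * u_i
  meat   <- crossprod(scores)         ## sum over clusters of s_g s_g'
  G <- length(unique(cluster))
  V <- G / (G - 1) * bread %*% meat %*% bread
  sqrt(diag(V))
}

se_classic <- coef(summary(fit))["x", "Std. Error"]
se_firm    <- cluster_se(fit, firm)[["x"]]
se_country <- cluster_se(fit, country)[["x"]]
round(c(classic = se_classic, firm = se_firm, country = se_country), 4)
```

With shocks shared within a country, clustering by firm (what plm's cluster='group' does when the index is firm-time) generally understates the uncertainty relative to clustering by country.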

`lfe::felm`

summary(lfe::felm(y ~ d + x1 + x2 + x3 + x4 | id + time + country | 0 | country,
                  dat))$coe
#      Estimate Cluster s.e.    t value     Pr(>|t|)
# d  10.1398727    0.3184067 31.8456637 1.826374e-33
# x1  1.1217514    1.6520151  0.6790201 5.004554e-01
# x2  3.4913273    2.7800267  1.2558611 2.153737e-01
# x3  0.6257981    3.3183765  0.1885856 8.512296e-01
# x4  0.1942742    0.9004173  0.2157602 8.301083e-01
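The speed-up comes from not estimating the dummy coefficients at all: the fixed effects are swept out of each variable, and lfe describes its approach as the method of alternating projections. Below is a minimal base-R sketch of that idea, not lfe's actual implementation; `map_demean`, the tolerance, and the slightly unbalanced data with firms nested in countries are all hypothetical.

```r
## Alternating projections: subtract group means factor by factor,
## repeating until the variable stops changing
map_demean <- function(v, factors, tol = 1e-10, max_iter = 1000) {
  for (i in seq_len(max_iter)) {
    v_old <- v
    for (f in factors) v <- v - ave(v, f)   ## sweep out one factor
    if (max(abs(v - v_old)) < tol) break
  }
  v
}

## Hypothetical data: firms nested in countries, then ~10% of rows dropped
set.seed(3)
n_country <- 10; firms_per_country <- 8; n_t <- 6
n_firm <- n_country * firms_per_country
panel <- data.frame(firm = rep(seq_len(n_firm), times = n_t),
                    time = rep(seq_len(n_t), each = n_firm))
panel$country <- (panel$firm - 1) %/% firms_per_country + 1
panel$x <- rnorm(nrow(panel))
panel$y <- 3 * panel$x + rnorm(n_firm)[panel$firm] +
  rnorm(n_t)[panel$time] + rnorm(nrow(panel))
panel <- panel[runif(nrow(panel)) > 0.1, ]   ## make it unbalanced

## OLS on the demeaned data vs. LSDV with all three sets of dummies
fe  <- list(panel$firm, panel$time, panel$country)
y_d <- map_demean(panel$y, fe)
x_d <- map_demean(panel$x, fe)
b_map  <- coef(lm(y_d ~ x_d - 1))[["x_d"]]
b_lsdv <- coef(lm(y ~ x + factor(firm) + factor(time) + factor(country),
                  data = panel))[["x"]]
all.equal(b_map, b_lsdv, tolerance = 1e-6)
```

Each sweep costs one pass of group means per factor, instead of building and factoring a matrix with one column per firm, which is why this scales to many firms.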

For comparison, here is what Stata computes; the standard errors are very similar to those from LSDV and lfe::felm:

Stata

. reghdfe y d x1 x2 x3 x4, absorb (country time id) vce(cluster country) 

           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           d |   10.13987   .3185313    31.83   0.000      9.49907    10.78068
          x1 |   1.121751   1.652662     0.68   0.501    -2.202975    4.446478
          x2 |   3.491327   2.781115     1.26   0.216    -2.103554    9.086209
          x3 |   .6257981   3.319675     0.19   0.851    -6.052528    7.304124
          x4 |   .1942742   .9007698     0.22   0.830    -1.617841    2.006389
       _cons |   14.26801   23.65769     0.60   0.549    -33.32511    61.86114

Simulated panel data:

n1 <- 20; t1 <- 4; n2 <- 48
dat <- expand.grid(id=1:n1, time=1:t1, country=1:n2)
set.seed(42)
dat <- within(dat, {
  id <- as.vector(apply(matrix(1:(n1*n2), n1), 2, rep, t1))
  d <- runif(nrow(dat), 70, 80)
  x1 <- sample(0:1, nrow(dat), replace=TRUE)
  x2 <- runif(nrow(dat))
  x3 <- runif(nrow(dat))
  x4 <- rnorm(nrow(dat))
  y <-
    10*d +  ## treatment effect
    as.vector(replicate(n2, rep(runif(n1, 2, 5), t1))) +  ## id FE
    rep(runif(n1, 10, 12), each=t1) +  ## "time FE"; as written it varies with (time, id block), not with time alone
    rep(runif(n2, 10, 12), each=n1*t1) +  ## country FE
    - .7*x1 + 1.3*x2 + 2.4*x3 +
    .5 * x4 + rnorm(nrow(dat), 0, 50)
})
readstata13::save.dta13(dat, 'panel.dta')  ## for Stata
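In this simulation every firm id belongs to exactly one country, so the country dummies are a linear combination of the firm dummies: in the within model they are absorbed by the firm fixed effects rather than adding information. A minimal base-R check of that collinearity on a tiny hypothetical design (names and sizes are made up):

```r
## Tiny hypothetical design: 4 countries x 3 firms each, 2 periods
n_country <- 4; firms_per_country <- 3; n_t <- 2
n_firm  <- n_country * firms_per_country
firm    <- rep(seq_len(n_firm), times = n_t)
country <- (firm - 1) %/% firms_per_country + 1   ## each firm in one country
time    <- rep(seq_len(n_t), each = n_firm)

## Design matrices with and without country dummies
X_firm <- model.matrix(~ factor(firm) + factor(time))
X_both <- model.matrix(~ factor(firm) + factor(time) + factor(country))

## Extra columns, but no extra rank: country is nested in firm
c(ncol_firm = ncol(X_firm), ncol_both = ncol(X_both),
  rank_firm = qr(X_firm)$rank, rank_both = qr(X_both)$rank)
```

Both ranks come out at 13 even though `X_both` has three more columns, which is also why lm() reports NA for the aliased country dummies in an LSDV fit with firm dummies.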

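【Discussion】:

Thanks. Is the two-way effect (effect='twoways') required? I did not specify anything because it takes very long to run. Also, I have updated my question to change ID to Firm; I hope your answer still applies. Besides, index=("Time", "Firm", "Country") looked awkward and did not work properly, so I had to do it as above.
@Eric You want to add unit fixed effects for each aggregate level at which you suspect heterogeneity, e.g. for ID (or Firm) and Country, plus time fixed effects to account for time trends.
I don't know a simpler way to explain it, but I just want fixed effects for time, firm and country at the same time.
@Eric Then you should now have the solution you need :)
Thanks, but in your answer you said that the plm clustering written in the summary command may not work the way I want. Does that mean I cannot use plm at all? What if I don't use the clustering option in summary and just run summary to get my results?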