Efficient way to compute heteroscedasticity-robust standard errors in R: answers

【Question title】: Efficient way to compute Heteroscedasticity Robust standard errors in R
【Posted】: 2017-02-02 10:16:07
【Question】:

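I am trying to compute robust standard errors in R. I know of two solutions that do what I need, but both are very slow. So my question is whether there is a more efficient way, e.g. something already coded in Rcpp.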

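My context is that I am fitting a model with a large number of variables (fixed effects). However, I am not interested in those coefficients; I only care about inference on a single coefficient (X in the example below).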

Fast solution

???

Slow solution 1

library(sandwich)
library(lmtest)  # provides coeftest()
lmfe <- lm(Y ~ X + factor(strata_ids))
coeftest(lmfe, vcov = vcovHC(lmfe, type = "HC1"))

Slow solution 2

The manual solution, which I got from here, is:

summaryw <- function(model) {
  s <- summary(model)
  X <- model.matrix(model)
  u2 <- residuals(model)^2
  XDX <- 0

  ## Here one needs to calculate X'DX. But due to the fact that
  ## D is huge (NxN), it is better to do it with a cycle.
  for(i in 1:nrow(X)) {
    XDX <- XDX + u2[i]*X[i,]%*%t(X[i,])
  }

  # inverse(X'X)
  XX1 <- solve(t(X)%*%X)

  # Variance calculation (Bread x meat x Bread)
  varcovar <- XX1 %*% XDX %*% XX1

  # degrees of freedom adjustment
  dfc <- sqrt(nrow(X))/sqrt(nrow(X)-ncol(X))

  # Standard errors of the coefficient estimates are the
  # square roots of the diagonal elements
  stdh <- dfc*sqrt(diag(varcovar))

  t <- model$coefficients/stdh
  p <- 2*pnorm(-abs(t))
  results <- cbind(model$coefficients, stdh, t, p)
  dimnames(results) <- dimnames(s$coefficients)
  results
}
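
As a side note on the manual solution above, the row-by-row loop that builds X'DX can be replaced by a single matrix product, which is usually much faster in R. The following is a minimal sketch, not a tested drop-in replacement: the function name robust_se_fast and the simulated data are made up for illustration, and it assumes the model has no missing values and a full-rank design matrix.

```r
# Vectorized variant of summaryw(): same HC1-style estimator, no row loop.
# robust_se_fast is a hypothetical name, not part of any package.
robust_se_fast <- function(model) {
  X  <- model.matrix(model)
  u2 <- residuals(model)^2
  n  <- nrow(X)
  k  <- ncol(X)

  # Meat X'DX: scale row i of X by its squared residual, then take X'(.)
  XDX <- crossprod(X * u2, X)

  # Bread (X'X)^{-1}
  XX1 <- solve(crossprod(X))

  # Sandwich with the n / (n - k) degrees-of-freedom correction
  varcovar <- (n / (n - k)) * XX1 %*% XDX %*% XX1
  sqrt(diag(varcovar))
}

# Small check on simulated (made-up) heteroscedastic data
set.seed(1)
n <- 200
strata_ids <- sample(1:10, n, replace = TRUE)
X <- rnorm(n)
Y <- 0.5 * X + strata_ids / 10 + rnorm(n) * (1 + abs(X))
lmfe <- lm(Y ~ X + factor(strata_ids))
se_fast <- robust_se_fast(lmfe)
se_fast["X"]
```

This removes the loop overhead but still forms and inverts the full X'X, so it does not address the cost of a design matrix with thousands of dummy columns.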

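【Comments】:

I would model the strata IDs as random effects. You seem to have the kind of textbook example for which mixed-effects models were developed.
Thanks, but I have good reasons to use fixed effects.
Well, then you are overfitting with a huge design matrix. That is slow. You might be able to make it faster, but be careful about skipping checks for singularities etc.
You could also look at the lfe package, which is built for handling a large number of fixed effects.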

Tags: r statistics standards

【Solution 1】:

This question already has a good answer (namely, using lfe::felm()).

For an even faster approach, try the new fixest package. Using the OP's example,

library(fixest)
mod = feols(Y ~ X | strata_ids, data = dat)

## SEs are automatically clustered by the strata_ids FE
mod

## We can compute other SEs on the fly with summary.fixest(), e.g.
summary(mod, se = 'standard') ## vanilla
summary(mod, se = 'white') ## HC
# etc

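The more general lesson is to avoid modeling fixed effects as factors in R... or any other language TBH. This is equivalent to the DV (dummy variable) approach and is always slow. Instead, you want to use a purpose-built package that exploits FWL (the Frisch-Waugh-Lovell theorem) or some other optimized estimation method.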
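
To illustrate the FWL idea in base R: with a single set of fixed effects, demeaning Y and X within each stratum and regressing the demeaned Y on the demeaned X gives the same coefficient on X as the dummy-variable regression, without the large design matrix. The sketch below uses simulated data and made-up names (dat, X_dm, Y_dm), and is only meant to show the mechanics. It is not how fixest or lfe are implemented internally.

```r
# FWL sketch: sweep out the strata fixed effects by demeaning within strata,
# then run a one-regressor regression. All data here are simulated.
set.seed(42)
n <- 500
dat <- data.frame(strata_ids = sample(1:50, n, replace = TRUE))
dat$X <- rnorm(n) + dat$strata_ids / 25
dat$Y <- 2 * dat$X + dat$strata_ids / 10 + rnorm(n)

# Within transformation: subtract each stratum's mean
dat$X_dm <- dat$X - ave(dat$X, dat$strata_ids)
dat$Y_dm <- dat$Y - ave(dat$Y, dat$strata_ids)

# No intercept: the demeaned variables already have mean zero
fwl <- lm(Y_dm ~ 0 + X_dm, data = dat)

# Reference fit with explicit dummies (the slow DV approach),
# included here only to compare against and to count parameters
dv <- lm(Y ~ X + factor(strata_ids), data = dat)

b_fwl <- unname(coef(fwl)["X_dm"])
b_dv  <- unname(coef(dv)["X"])

# Heteroscedasticity-robust SE for X from the demeaned fit, using the
# n / (n - k) correction with k = parameters in the full DV model
# (intercept, X, and the strata dummies)
k <- length(coef(dv))
u <- residuals(fwl)
se_hc1 <- sqrt(n / (n - k) * sum(dat$X_dm^2 * u^2) / sum(dat$X_dm^2)^2)

c(b_fwl = b_fwl, b_dv = b_dv, se_hc1 = se_hc1)
```

Note that the degrees-of-freedom correction must count the swept-out fixed effects; using only the one-regressor model's own parameter count would understate the standard error.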
