在 R 中聚合回归输出答案

【问题标题】：aggregating regression outputs in R在 R 中聚合回归输出
【发布时间】：2014-10-02 17:13:11
【问题描述】：

我正在使用循环函数执行多个汇集横截面回归，并将回归输出存储在列表中（回归）。我现在想做的是有效地获得平均系数、平均 t-stats 以及平均 adj.r 平方。

我已经提交了以下代码：

library(plm)
data("Grunfeld", package="plm")

# create list with regression outputs
regression <- list()

# Regression on past six-year subsets of Grunfeld in every year from 1940 to 1950
for(t in 1940:1950){

  regression[[as.character(t)]] <- lm(inv ~ value + capital, 
                              subset(Grunfeld, year<=t & year>=t-5))
}

通过这种方式，我获得了存储在列表中的所需回归输出（回归）。我现在想做的是有效地获得平均系数、平均 t-stats 以及平均 adj.r 平方。

我已经尝试计算所有 adj 的平均值。 r 平方：

mean(lapply(regression, function(x) summary(x)$adj.r.squared))

但是，当我收到以下错误时，似乎我使用了错误的平均函数。

Warning message:
In mean.default(lapply(regression, function(x) summary(x)$adj.r.squared)) :
  argument is not numeric or logical: returning NA

我还想出了以下方法来“提取”系数。

lapply(regression, function(x) summary(x)$coefficients)

如何从这个 lapply 输出中快速获得平均单个系数？（即单独提取每一行并计算多年来的各自平均值。）

$`1940`
               Estimate   Std. Error    t value     Pr(>|t|)
(Intercept) -3.65239712 14.647050149 -0.2493606 8.039783e-01
value        0.08283141  0.006873563 12.0507230 2.615793e-17
capital      0.11033307  0.091543522  1.2052526 2.330857e-01

$`1941`
                Estimate   Std. Error   t value     Pr(>|t|)
(Intercept) -13.77258038 16.677399231 -0.825823 4.123477e-01
value         0.08614094  0.007258571 11.867480 4.904857e-17
capital       0.18680229  0.094849038  1.969470 5.376624e-02

....

【问题讨论】：

您收到一条错误消息，因为lapply 返回一个列表，然后您将mean 函数应用于此列表。请改用sapply，如@landroni 所示。
相关：stackoverflow.com/questions/24820800/…

标签： r regression lapply sapply

【解决方案1】：

你几乎是对的！试试这个：

> sapply(regression, function(x) mean(summary(x)$adj.r.squared))
     1940      1941      1942      1943      1944      1945      1946      1947      1948      1949 
0.7230061 0.7293396 0.7399216 0.7770505 0.7998859 0.8413422 0.8571037 0.8561229 0.8348950 0.8357761 
     1950 
0.8324654

你也可以在上面使用lapply()。一旦您确定如何从系数表中提取它们，就可以对任何系数进行相同的 t 检验。

要提取value 的系数，您可以执行以下操作：

lapply(regression, function(x) summary(x)$coefficients[ rownames(summary(x)$coefficients)=="value", ])

更紧凑的版本是：

sapply(regression, function(x) summary(x)$coefficients[ rownames(summary(x)$coefficients)=="value", ])

从上面你可以得到如下手段：

> (x <- t(sapply(regression, function(x) summary(x)$coefficients[ rownames(summary(x)$coefficients)=="value", ])))
       Estimate  Std. Error  t value     Pr(>|t|)
1940 0.08283141 0.006873563 12.05072 2.615793e-17
1941 0.08614094 0.007258571 11.86748 4.904857e-17
1942 0.09018745 0.007711639 11.69498 8.898811e-17
1943 0.09945565 0.007751087 12.83119 1.886416e-18
1944 0.10568804 0.007376523 14.32762 1.516617e-20
1945 0.11358598 0.006722166 16.89723 7.314450e-24
1946 0.12227763 0.006781509 18.03104 3.203995e-25
1947 0.12599497 0.007199027 17.50167 1.356383e-24
1948 0.12605599 0.008005481 15.74621 2.030259e-22
1949 0.12951740 0.008452725 15.32256 7.175275e-22
1950 0.13647072 0.009530406 14.31951 1.555615e-20
> colMeans(x)
    Estimate   Std. Error      t value     Pr(>|t|) 
1.107460e-01 7.605700e-03 1.459911e+01 1.510115e-17

也就是说，这看起来很像 Fama-MacBeth 的估计：Fama MacBeth standard errors in R。这些可以通过plm 中的pmg() 轻松获得。

【讨论】：

感谢您的帮助。 “提取”正是我正在努力解决的问题。我想要做的是非常灵活地编码它，因为您可以使用不同的模型（具有不同的模型参数）并单独提取每一行（对于相应的变量）。

【解决方案2】：

你可以试试：

 library(reshape2)
  dcast(melt(lapply(regression, 
       function(x) summary(x)$coefficients)), Var1~Var2, value.var="value", mean)
 #         Var1    Estimate Std. Error   t value     Pr(>|t|)
 #1 (Intercept) -16.7072859 16.0876958 -1.029145 3.320868e-01
 #2       value   0.1107460  0.0076057 14.599109 1.510115e-17
 #3     capital   0.1279743  0.0685896  1.833861 9.389504e-02

或者

 Reduce(`+`,lapply(regression, function(x) summary(x)$coefficients))/length(regression)
 #                    Estimate Std. Error   t value     Pr(>|t|)
 #(Intercept) -16.7072859 16.0876958 -1.029145 3.320868e-01
 #value         0.1107460  0.0076057 14.599109 1.510115e-17
 #capital       0.1279743  0.0685896  1.833861 9.389504e-02

【讨论】：