【Question title】: Extract regression coefficients out of a large list in R
【Posted】: 2019-08-08 21:06:48
【Question description】:

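I have a large data frame with about 100 columns, split by year. I want to use x[i] of the previous year as the independent variable and x[i] of the following year as the dependent variable: xS = a0 + a1*xP + e

My code looks like this: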

     d1 <- structure(list(Date=c("2012-01-01", "2012-06-01",
                            "2013-01-01", "2013-06-01", "2014-01-01", "2014-06-01"),
                     x1=c(NA, NA, 17L, 29L, 27L, 10L), 
                     x2=c(30L, 19L, 22L, 20L, 11L,24L), 
                     x3=c(NA, 23L, 22L, 27L, 21L, 26L),
                     x4=c(30L, 28L, 23L,24L, 10L, 17L), 
                     x5=c(NA, NA, NA, 16L, 30L, 26L)),
                row.names=c(NA, 6L), class="data.frame")
     rownames(d1) <- d1[, "Date"]
     d1 <- d1[,-1]


df2012 <- d1[1:2,]
df2013 <- d1[3:4,]
df2014 <- d1[5:6,]  # rows 5 and 6 hold the 2014 observations

condlm <- function(i) {
  # Skip columns of df2012 that contain only NA's
  if (sum(is.na(df2012[, i])) == dim(df2013)[1]) {
    return()
  }
  lm.model <- lm(df2013[, i] ~ df2012[, i])
  summary(lm.model)
}

lms <- lapply(1:dim(df2013)[2], condlm)
lms


zzq <- sapply(lms, coef)
zzq <- do.call(rbind.data.frame, zzq)
zzq <- zzq[grepl("(Intercept)", rownames(zzq)) ,] 

Edit 2:

lms gives me the following output:

[[1]]
NULL

[[2]]

Call:
lm(formula = df2013[, i] ~ df2012[, i])

Residuals:
ALL 2 residuals are 0: no residual degrees of freedom!

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  16.5455         NA      NA       NA
df2012[, i]   0.1818         NA      NA       NA

Residual standard error: NaN on 0 degrees of freedom
Multiple R-squared:      1, Adjusted R-squared:    NaN 
F-statistic:   NaN on 1 and 0 DF,  p-value: NA


[[3]]

Call:
lm(formula = df2013[, i] ~ df2012[, i])

Residuals:
ALL 1 residuals are 0: no residual degrees of freedom!

Coefficients: (1 not defined because of singularities)
            Estimate Std. Error t value Pr(>|t|)
(Intercept)       27         NA      NA       NA
df2012[, i]       NA         NA      NA       NA

Residual standard error: NaN on 0 degrees of freedom
  (1 observation deleted due to missingness)


[[4]]

Call:
lm(formula = df2013[, i] ~ df2012[, i])

Residuals:
ALL 2 residuals are 0: no residual degrees of freedom!

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)     38.0         NA      NA       NA
df2012[, i]     -0.5         NA      NA       NA

Residual standard error: NaN on 0 degrees of freedom
Multiple R-squared:      1, Adjusted R-squared:    NaN 
F-statistic:   NaN on 1 and 0 DF,  p-value: NA


[[5]]
NULL

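[[1]] and [[5]] give me NULL.

Is there a way to modify the function condlm so that it gives me NA instead of NULL? In the end, after extracting the intercepts with zzq <- zzq[grepl("(Intercept)", rownames(zzq)) ,], my data frame zzq should look like this: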

             Estimate Std. Error t value Pr(>|t|) 
(Intercept)  NA              NaN     NaN      NaN
(Intercept)2 16.54545        NaN     NaN      NaN
(Intercept)3 27.00000        NaN     NaN      NaN
(Intercept)4 38.00000        NaN     NaN      NaN
(Intercept)5 NA              NaN     NaN      NaN

Thanks

【Question discussion】:

    Tags: r list regression coefficients


    【Solution 1】:

    You can get the standard errors, p-values, etc. with the following modification:

    condlm <- function(i) {
      # Skip columns of df2012 that contain only NA's
      if (sum(is.na(df2012[, i])) == dim(df2013)[1]) {
        return()
      }
      lm.model <- lm(df2013[, i] ~ df2012[, i])
      # summary() adds standard errors, t values and p-values to the estimates
      summary(lm.model)
    }
    
    
    lms <- lapply(1:dim(df2013)[2], condlm)
    lms
    

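    Note, however, that because of the way the data in your example are structured, you do not have enough data to obtain numerical values for the standard errors etc.: each model is fitted to at most two observations, which leaves no residual degrees of freedom.

    For example, with your sample data we get the following (partial output):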

    > lms
    [[1]]
    NULL
    
    [[2]]
    
    Call:
    lm(formula = df2013[, i] ~ df2012[, i])
    
    Residuals:
    ALL 2 residuals are 0: no residual degrees of freedom!
    
    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  16.5455         NA      NA       NA
    df2012[, i]   0.1818         NA      NA       NA
    
    Residual standard error: NaN on 0 degrees of freedom
    Multiple R-squared:      1, Adjusted R-squared:    NaN 
    F-statistic:   NaN on 1 and 0 DF,  p-value: NA
    

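    【Comments】:

    • Works perfectly, thanks! The sample data here is just a simple reproducible example; my original data set is much larger. Any idea how to get the mean of all p-values and t-values?
    • @Pogi93: Could you elaborate on what you mean by "mean p-value" and "mean t-value"? P-values are probabilities, so you cannot simply add them up and divide by the count. The t values in the R output are test statistics, and you cannot just add those up and divide by the count either.
    • I edited my original post. Sorry for the misunderstanding.
    • @Pogi93 To isolate the intercepts, you can use zzq[grepl("(Intercept)", rownames(zzq)) ,]
    • Thanks! I have updated my post again with a follow-up question. Maybe you can answer that one too?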
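
For the follow-up about getting NA instead of NULL, one possible approach is sketched below. It is not taken from the answer above: the helper name `intercept_row` is hypothetical, and the two yearly data frames are rebuilt by hand from the question's sample data so that the block runs on its own. The idea is to return a one-row placeholder matrix with the same column names as the coefficient table whenever a column has no usable observations.

```r
# Sample data from the question, already split by year.
df2012 <- data.frame(x1 = c(NA, NA), x2 = c(30L, 19L), x3 = c(NA, 23L),
                     x4 = c(30L, 28L), x5 = c(NA, NA))
df2013 <- data.frame(x1 = c(17L, 29L), x2 = c(22L, 20L), x3 = c(22L, 27L),
                     x4 = c(23L, 24L), x5 = c(NA, 16L))

# Hypothetical helper: always returns a 1 x 4 matrix holding the intercept row.
intercept_row <- function(i) {
  na_row <- matrix(NA_real_, nrow = 1, ncol = 4,
                   dimnames = list("(Intercept)",
                                   c("Estimate", "Std. Error", "t value", "Pr(>|t|)")))
  # Without at least one complete (previous, next) pair, lm() cannot be fitted,
  # so hand back the NA placeholder instead of NULL.
  ok <- complete.cases(df2012[, i], df2013[, i])
  if (!any(ok)) return(na_row)
  cf <- coef(summary(lm(df2013[, i] ~ df2012[, i])))
  cf["(Intercept)", , drop = FALSE]
}

# One row per column, so NA rows keep their position in the result.
zzq <- do.call(rbind, lapply(seq_along(df2013), intercept_row))
rownames(zzq) <- names(df2013)
zzq
```

Because every call returns a row, no grepl() filtering is needed afterwards, and the rows for x1 and x5 contain NA while x2, x3 and x4 carry the intercept estimates 16.54545, 27 and 38 from the output in the question.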