【Question title】:Correlation matrix for linear model regression coefficient
【Posted】:2018-10-28 07:24:06
【Question】:

cor(mtcars, method='pearson') produces a matrix showing the Pearson correlation of every variable in mtcars with every other variable in mtcars. For example:

head(cor(mtcars, method='pearson'))
            mpg        cyl       disp         hp       drat         wt        qsec         vs         am       gear
mpg   1.0000000 -0.8521620 -0.8475514 -0.7761684  0.6811719 -0.8676594  0.41868403  0.6640389  0.5998324  0.4802848
cyl  -0.8521620  1.0000000  0.9020329  0.8324475 -0.6999381  0.7824958 -0.59124207 -0.8108118 -0.5226070 -0.4926866
disp -0.8475514  0.9020329  1.0000000  0.7909486 -0.7102139  0.8879799 -0.43369788 -0.7104159 -0.5912270 -0.5555692
hp   -0.7761684  0.8324475  0.7909486  1.0000000 -0.4487591  0.6587479 -0.70822339 -0.7230967 -0.2432043 -0.1257043
drat  0.6811719 -0.6999381 -0.7102139 -0.4487591  1.0000000 -0.7124406  0.09120476  0.4402785  0.7127111  0.6996101
wt   -0.8676594  0.7824958  0.8879799  0.6587479 -0.7124406  1.0000000 -0.17471588 -0.5549157 -0.6924953 -0.5832870
           carb
mpg  -0.5509251
cyl   0.5269883
disp  0.3949769
hp    0.7498125
drat -0.0907898
wt    0.4276059

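How can I get the same matrix as above, except that each value is the r.squared from a linear model rather than the Pearson correlation between each pair of variables? For example, the first column, second row would be the same as summary(lm(mtcars$mpg ~ mtcars$cyl))$r.squared. Thanks.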

【Question comments】:

    Tags: r matrix linear-regression correlation


    【Solution 1】:
    library(tidyverse)
    
    # keep the column names of the dataset
    names = names(mtcars)
    
    expand.grid(names, names, stringsAsFactors = FALSE) %>%  # create all pairs of names
      filter(Var1 != Var2) %>%                           # drop self-pairs (fitting a variable on itself creates warnings)
      rowwise() %>%                                      # for each row
      mutate(r = summary(lm(paste(Var1, "~", Var2), data = mtcars))$r.squared) %>%  # get the R squared
      spread(Var2, r)                                    # reshape to wide format
    
    # # A tibble: 11 x 12
    # Var1        am     carb    cyl   disp     drat    gear      hp    mpg
    # <chr>    <dbl>    <dbl>  <dbl>  <dbl>    <dbl>   <dbl>   <dbl>  <dbl>
    # 1 am    NA        0.00331  0.273  0.350  0.508    0.631   0.0591  0.360
    # 2 carb   0.00331 NA        0.278  0.156  0.00824  0.0751  0.562   0.304
    # 3 cyl    0.273    0.278   NA      0.814  0.490    0.243   0.693   0.726
    # 4 disp   0.350    0.156    0.814 NA      0.504    0.309   0.626   0.718
    # 5 drat   0.508    0.00824  0.490  0.504 NA        0.489   0.201   0.464
    # 6 gear   0.631    0.0751   0.243  0.309  0.489   NA       0.0158  0.231
    # 7 hp     0.0591   0.562    0.693  0.626  0.201    0.0158 NA       0.602
    # 8 mpg    0.360    0.304    0.726  0.718  0.464    0.231   0.602  NA    
    # 9 qsec   0.0528   0.431    0.350  0.188  0.00832  0.0452  0.502   0.175
    # 10 vs     0.0283   0.324    0.657  0.505  0.194    0.0424  0.523   0.441
    # 11 wt     0.480    0.183    0.612  0.789  0.508    0.340   0.434   0.753
    # # ... with 3 more variables: qsec <dbl>, vs <dbl>, wt <dbl>
    

    If you would rather have row names instead of the first column (Var1), you can add the following to the end of the pipe above:

    ... %>%
      data.frame() %>%
      column_to_rownames("Var1")
    

    This gets closer to the output of cor(mtcars, method='pearson').
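
In current tidyr, spread() is superseded by pivot_wider(). The following is a minimal sketch of the same pipeline in that style, not the answer author's code. The names `vars` and `r2_wide` are illustrative, and reformulate() is swapped in for paste() to build the formula. Note that pivot_wider() orders rows and columns by first appearance, so the layout differs from the alphabetical output shown above.

```r
library(dplyr)
library(tidyr)

vars <- names(mtcars)

r2_wide <- expand.grid(Var1 = vars, Var2 = vars, stringsAsFactors = FALSE) %>%
  filter(Var1 != Var2) %>%   # skip self-pairs, which would only produce warnings
  rowwise() %>%              # evaluate the model once per (Var1, Var2) pair
  mutate(r = summary(lm(reformulate(Var2, response = Var1), data = mtcars))$r.squared) %>%
  ungroup() %>%
  pivot_wider(names_from = Var2, values_from = r)  # one column per predictor

r2_wide
```

The self-pairs that were filtered out come back as NA in the wide table, which matches the NA diagonal in the output above.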

    【Comments】:

      【Solution 2】:

      I wrote a corlm function that fills in the entries with a for loop:

      corlm <- function(df) {
        n <- ncol(df)
        mat <- matrix(NA_real_, n, n, dimnames = list(colnames(df), colnames(df)))
        # Self-regressions trigger a "perfect fit" warning, hence suppressWarnings()
        suppressWarnings(
          for (i in seq_len(n)) {
            for (j in seq_len(n)) {
              mat[i, j] <- summary(lm(df[, j] ~ df[, i]))$r.squared
            }
          }
        )
        diag(mat) <- NA  # blank out the self-regressions
        mat
      }
      
      round(corlm(mtcars),3)
             mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
      mpg     NA 0.726 0.718 0.602 0.464 0.753 0.175 0.441 0.360 0.231 0.304
      cyl  0.726    NA 0.814 0.693 0.490 0.612 0.350 0.657 0.273 0.243 0.278
      disp 0.718 0.814    NA 0.626 0.504 0.789 0.188 0.505 0.350 0.309 0.156
      hp   0.602 0.693 0.626    NA 0.201 0.434 0.502 0.523 0.059 0.016 0.562
      drat 0.464 0.490 0.504 0.201    NA 0.508 0.008 0.194 0.508 0.489 0.008
      wt   0.753 0.612 0.789 0.434 0.508    NA 0.031 0.308 0.480 0.340 0.183
      qsec 0.175 0.350 0.188 0.502 0.008 0.031    NA 0.554 0.053 0.045 0.431
      vs   0.441 0.657 0.505 0.523 0.194 0.308 0.554    NA 0.028 0.042 0.324
      am   0.360 0.273 0.350 0.059 0.508 0.480 0.053 0.028    NA 0.631 0.003
      gear 0.231 0.243 0.309 0.016 0.489 0.340 0.045 0.042 0.631    NA 0.075
      carb 0.304 0.278 0.156 0.562 0.008 0.183 0.431 0.324 0.003 0.075    NA
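
For a simple regression with a single predictor and an intercept, R squared equals the squared Pearson correlation, which is why the matrix above is symmetric. Under that assumption, and assuming all columns are numeric with no missing values (as in mtcars), the same matrix can be sketched without fitting any models. The names `r2` and `fit_r2` are illustrative.

```r
# Square the Pearson correlations to get the pairwise R squared values.
r2 <- cor(mtcars, method = "pearson")^2
diag(r2) <- NA  # match the NA diagonal used above

# Spot-check one cell against an explicit model fit.
fit_r2 <- summary(lm(mpg ~ cyl, data = mtcars))$r.squared
all.equal(r2["mpg", "cyl"], fit_r2)

round(r2, 3)
```

This shortcut only holds for one-predictor models. With several predictors per model, the loop over lm() fits is still needed.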
      

      【Comments】:
