【Question title】:Data In Long and Wide Format, Need to Convert to Just Long in R
【Posted】:2018-11-28 15:37:45
【Question】:

I am working with a dataset in wide format. It looks like this:

ID week1 week2 week3 ... week12  
1   2     NA     NA  ...  NA  
1   NA    3      NA  ...  NA
1   NA    NA     3   ...  NA
...
1   NA    NA     NA  ...  4
2   4     NA     NA  ...  NA
2   NA    5      NA  ...  NA
2   NA    NA     3   ...  NA

I am now trying to convert it to long format only, for analysis. I would like it laid out as:

ID week value
1   1    2
1   2    3
1   3    3
...
1   12   4
2   1    4
2   2    5
2   3    3

Can anyone suggest how to do this in R? I have tried reshape2 and dplyr/tidyr, but when I select ID as the id variable I always end up with too many observations.

【Question comments】:

    Tags: r dplyr tidyverse tidyr reshape2


    【Solution 1】:

    How about this:

    library(dplyr)
    
    # small data sample
    df <- read.table(text = 'ID week1 week2 week3 week4  
    1   2     NA     NA    NA  
    1   NA    3      NA    NA
    1   NA    NA     3     NA
    1   NA    NA     NA    4
    2   4     NA     NA    NA
    2   NA    5      NA    NA
    2   NA    NA     3     NA', header = TRUE)
    
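    # convert to a data.table first so that data.table's own melt method is used,
    # then drop the rows that only hold NA padding
    df %>% 
       data.table::as.data.table() %>% 
       data.table::melt(id.vars = 'ID') %>% 
       na.omit()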
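
The question mentions ending up with "too many observations". A minimal sketch of why that happens, assuming the same small sample as above: melting produces one output row per input row per week column, and only the NA-dropping step brings this back to one row per recorded value. The names `dt`, `long_all`, and `long_obs` are illustrative, and the last step (turning labels such as `week3` into the integer 3) is an assumption about the desired output based on the layout shown in the question.

```r
library(data.table)

df <- read.table(text = 'ID week1 week2 week3 week4
1   2     NA     NA    NA
1   NA    3      NA    NA
1   NA    NA     3     NA
1   NA    NA     NA    4
2   4     NA     NA    NA
2   NA    5      NA    NA
2   NA    NA     3     NA', header = TRUE)

dt <- as.data.table(df)

# Every input row contributes one output row per week column
long_all <- melt(dt, id.vars = "ID")
nrow(long_all)   # 7 input rows x 4 week columns

# Dropping the NA padding leaves one row per recorded value
long_obs <- na.omit(long_all)
nrow(long_obs)

# Turn labels such as "week3" into the integer 3 and sort
long_obs[, week := as.integer(sub("^week", "", variable))]
long_obs[, variable := NULL]
setorder(long_obs, ID, week)
long_obs
```

If the row count after melting is much larger than expected, checking it before and after removing the NA rows is usually enough to confirm that the padding is the cause.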
    

    【Comments】:

      【Solution 2】:

      1) gather  Using wide, shown reproducibly in Note 1 at the end, use gather to convert wide to long format, drop the NA rows and sort.

      library(dplyr)
      library(tidyr)
      
      wide %>%
        gather("week", "value", -ID) %>%
        drop_na %>%
        arrange(ID, week)
      

      giving:

        ID  week value
      1  1 week1     2
      2  1 week2     3
      3  1 week3     3
      4  1 week4     4
      5  2 week1     4
      6  2 week2     5
      7  2 week3     3
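
Two things are worth noting about the output above: `week` is a character label such as `week1`, so with all 12 weeks a plain sort would place `week10` before `week2`, and `gather` has been superseded by `pivot_longer` since tidyr 1.0.0. A hedged sketch of the same reshape with `pivot_longer`, assuming tidyr 1.1.0 or later (for `names_transform`) and the `wide` input from Note 1; the name `long` is illustrative:

```r
library(tidyr)

wide <- read.table(text = "ID week1 week2 week3 week4
1   2     NA     NA   NA
1   NA    3      NA   NA
1   NA    NA     3    NA
1   NA    NA     NA   4
2   4     NA     NA   NA
2   NA    5      NA   NA
2   NA    NA     3    NA", header = TRUE)

long <- pivot_longer(
  wide,
  cols = -ID,
  names_to = "week",
  names_prefix = "week",                      # strip the "week" prefix from the column names
  names_transform = list(week = as.integer),  # store what remains as an integer
  values_to = "value",
  values_drop_na = TRUE                       # drop the NA padding
)

# Sort numerically by ID, then week
long <- long[order(long$ID, long$week), ]
long
```

Because `week` is stored as an integer here, week 10 sorts after week 9 rather than after week 1.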
      

      2) reshape  Using only base R:

      varying <- list(value = 2:5)
      long <- na.omit(reshape(wide, dir = "long", timevar = "week", 
        varying = varying, v.names = names(varying)))[1:3]
      long[order(long$ID, long$week), ]
      

      giving:

          ID week value
      1.1  1    1     2
      2.2  1    2     3
      3.3  1    3     3
      4.4  1    4     4
      5.1  2    1     4
      6.2  2    2     5
      7.3  2    3     3
      

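      3) data.table  Using varying from (2), we can use melt from data.table. Note that we could specify either id.vars or measure.vars, but the comments state that we may wish to generalize this to multiple variables, and the measure.vars approach generalizes.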

      library(data.table)
      longDT <- na.omit(melt(as.data.table(wide), measure.vars = varying, 
        variable.name = "week"))
      setkey(longDT, ID, week)
      longDT
      

      giving:

         ID  week value
      1:  1 week1     2
      2:  1 week2     3
      3:  1 week3     3
      4:  1 week4     4
      5:  2 week1     4
      6:  2 week2     5
      7:  2 week3     3
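
For comparison, the `id.vars` form mentioned in (3) can be sketched as follows. This is only an illustration, assuming the `wide` input from Note 1; the name `longDT_id` is made up here. It passes `na.rm = TRUE` to `melt` instead of wrapping the call in `na.omit`.

```r
library(data.table)

wide <- read.table(text = "ID week1 week2 week3 week4
1   2     NA     NA   NA
1   NA    3      NA   NA
1   NA    NA     3    NA
1   NA    NA     NA   4
2   4     NA     NA   NA
2   NA    5      NA   NA
2   NA    NA     3    NA", header = TRUE)

# Name the identifier column; every other column is treated as a measure
longDT_id <- melt(as.data.table(wide), id.vars = "ID",
  variable.name = "week", na.rm = TRUE)

# week is a factor with levels week1, ..., week4, so the key sorts in week order
setkey(longDT_id, ID, week)
longDT_id
```

This form is shorter for a single measured variable, but it does not extend to several value columns the way a list of `measure.vars` does.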
      

      Note 1

      The input, in reproducible form, is:

      Lines <- "
      ID week1 week2 week3 week4
      1   2     NA     NA   NA  
      1   NA    3      NA   NA
      1   NA    NA     3    NA
      1   NA    NA     NA   4
      2   4     NA     NA   NA
      2   NA    5      NA   NA
      2   NA    NA     3    NA"
      wide <- read.table(text = Lines, header = TRUE)
      

      Note 2

      Regarding multiple variables, data.table's melt supports this. Suppose we have the following:

      Lines2 <- "
      ID week1var1 week1var2 week2var1 week2var2 week3var1 week3var2 week4var1 week4var2
      1 1 2 20 NA NA NA NA NA NA
      2 1 NA NA 3 30 NA NA NA NA
      3 1 NA NA NA NA 3 30 NA NA
      4 1 NA NA NA NA NA NA 4 40
      5 2 4 40 NA NA NA NA NA NA
      6 2 NA NA 5 50 NA NA NA NA
      7 2 NA NA NA NA 3 30 NA NA"
      wide2 <- read.table(text = Lines2, header = TRUE)
      
      library(data.table)
      
      varying2 <- split(names(wide2)[-1], 
        sub("(.*\\d)(\\D.*)", "\\2", names(wide2)[-1]))
      
      longDT2 <- na.omit(melt(as.data.table(wide2), measure.vars = varying2, 
        variable.name = "week"))
      setkey(longDT2, ID, week)
      longDT2
      

      giving:

         ID week var1 var2
      1:  1    1    2   20
      2:  1    2    3   30
      3:  1    3    3   30
      4:  1    4    4   40
      5:  2    1    4   40
      6:  2    2    5   50
      7:  2    3    3   30
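
The same idea should carry over to three variables per week. A minimal sketch, assuming column names of the form `week<N>var<K>`; the input `wide3` and its values are invented purely for illustration, and `patterns()` is used here in place of the `split()` call above to pick each group of columns by its suffix:

```r
library(data.table)

# Hypothetical input with three measurements per week (values invented for illustration)
Lines3 <- "ID week1var1 week1var2 week1var3 week2var1 week2var2 week2var3
1 2 20 200 NA NA NA
1 NA NA NA 3 30 300
2 4 40 400 NA NA NA
2 NA NA NA 5 50 500"
wide3 <- read.table(text = Lines3, header = TRUE)
DT3 <- as.data.table(wide3)

# One regular expression per output column; each matches that variable in every week
longDT3 <- na.omit(melt(DT3,
  measure.vars = patterns("var1$", "var2$", "var3$"),
  value.name = c("var1", "var2", "var3"),
  variable.name = "week"))

# week comes back as a factor of positions (1, 2, ...) within each group,
# which equals the week number only if the columns are ordered by week starting at 1
longDT3[, week := as.integer(as.character(week))]
setkey(longDT3, ID, week)
longDT3
```

If the week columns are not contiguous or not in order, the position would have to be mapped back to the week number parsed from the column names instead.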
      

      【Comments】:

      • Works great, thanks! Is there any way to take this a step further and include multiple variables, e.g. going from ID week1var1 week1var2 week1var3 week2var1 week2var2 week2var3 to ID week var1 var2 var3?