【Question title】: Reshaping of data
【Posted】: 2021-07-06 14:41:45
【Question】:

I am stuck reshaping data in R and hope someone can help. The data look like this:

ID measurement biomarker_x biomarker_y
1 1 10 100
1 2 11 110
1 3 12 120
2 1 20 200
2 2 19 190
2 3 21 210

and need to be reshaped into this:

ID biomarker measurement1 measurement2 measurement3
1 x 10 11 12
1 y 100 110 120
2 x 20 19 21
2 y 200 190 210

I tried tidyr::gather and tidyr::spread, as well as pivot_wider and pivot_longer, but failed. I would be grateful for a solution that can be applied to multiple biomarkers.

【Question comments】:

    Tags: r reshape tidyr


    【Solution 1】:

    This can be done with tidyr alone

    library(tidyr)
    
    df <- read.table(header = TRUE, text = 'ID measurement biomarker_x biomarker_y
    1   1   10  100
    1   2   11  110
    1   3   12  120
    2   1   20  200
    2   2   19  190
    2   3   21  210')
    
    df %>%
      pivot_longer(starts_with('biomarker'), names_to = 'biomarker', names_prefix = 'biomarker_') %>%
      pivot_wider(names_from = measurement, values_from = value, names_prefix = 'measurement_')
    
    #> # A tibble: 4 x 5
    #>      ID biomarker measurement_1 measurement_2 measurement_3
    #>   <int> <chr>             <int>         <int>         <int>
    #> 1     1 x                    10            11            12
    #> 2     1 y                   100           110           120
    #> 3     2 x                    20            19            21
    #> 4     2 y                   200           190           210
    

    Created on 2021-07-06 by the reprex package (v2.0.0)
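
The question asks for something that extends to more biomarkers and for column names of the form `measurement1`. The following is a minimal sketch of that, not part of the original answer: `biomarker_z` and its values are hypothetical, added only to show that `starts_with()` picks up further columns without changes to the code.

```r
library(tidyr)

# Data from the question, plus a hypothetical third biomarker
# (biomarker_z is an assumption made purely for illustration).
df <- data.frame(
  ID          = c(1L, 1L, 1L, 2L, 2L, 2L),
  measurement = c(1L, 2L, 3L, 1L, 2L, 3L),
  biomarker_x = c(10L, 11L, 12L, 20L, 19L, 21L),
  biomarker_y = c(100L, 110L, 120L, 200L, 190L, 210L),
  biomarker_z = c(1L, 2L, 3L, 4L, 5L, 6L)
)

# Step 1: stack every biomarker_* column into name/value pairs,
# dropping the "biomarker_" prefix from the names.
long <- pivot_longer(
  df,
  cols = starts_with("biomarker_"),
  names_to = "biomarker",
  names_prefix = "biomarker_",
  values_to = "value"
)

# Step 2: spread the measurement numbers into columns.
# A prefix without the underscore gives measurement1, measurement2, ...
wide <- pivot_wider(
  long,
  names_from = measurement,
  values_from = value,
  names_prefix = "measurement"
)

print(wide)
```

The result has one row per ID and biomarker combination, so six rows for this extended data.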

    【Comments】:

    • Thanks a lot, @AnilGoyal. Amazing that it only took you 2 lines.
    【Solution 2】:

    Using recast from reshape2

    library(reshape2)
    names(df1)[-(1:2)] <- sub("biomarker_", "", names(df1)[-(1:2)])
    reshape2::recast(df1, id.var = c("ID", "measurement"), 
         ID + variable ~ paste0('measurement', measurement), value.var = 'value')
    

    -output

      ID variable measurement1 measurement2 measurement3
    1  1        x           10           11           12
    2  1        y          100          110          120
    3  2        x           20           19           21
    4  2        y          200          190          210
    

    data

    df1 <- structure(list(ID = c(1L, 1L, 1L, 2L, 2L, 2L), measurement = c(1L, 
    2L, 3L, 1L, 2L, 3L), biomarker_x = c(10L, 11L, 12L, 20L, 19L, 
    21L), biomarker_y = c(100L, 110L, 120L, 200L, 190L, 210L)), 
    class = "data.frame", row.names = c(NA, 
    -6L))
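
`recast` combines a melt step and a cast step in one call. As a sketch of what that involves, the same reshaping can be written with `melt` and `dcast` separately; the intermediate name `long` is chosen here for illustration and is not from the original answer.

```r
library(reshape2)

df1 <- data.frame(
  ID          = c(1L, 1L, 1L, 2L, 2L, 2L),
  measurement = c(1L, 2L, 3L, 1L, 2L, 3L),
  biomarker_x = c(10L, 11L, 12L, 20L, 19L, 21L),
  biomarker_y = c(100L, 110L, 120L, 200L, 190L, 210L)
)

# Strip the "biomarker_" prefix so the melted variable column holds x, y, ...
names(df1)[-(1:2)] <- sub("biomarker_", "", names(df1)[-(1:2)])

# Melt: one row per ID, measurement and biomarker, with the value in `value`.
long <- melt(df1, id.vars = c("ID", "measurement"))

# Build the future column names before casting.
long$measurement <- paste0("measurement", long$measurement)

# Cast: one row per ID and biomarker, one column per measurement.
wide <- dcast(long, ID + variable ~ measurement, value.var = "value")

print(wide)
```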
    

    【Comments】:

      【Solution 3】:

      Does this work:

      library(dplyr)
      library(tidyr)
      library(stringr)
      
      
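      df %>%
        pivot_longer(-c(ID, measurement), names_to = 'biomarker') %>%
        # keep only the trailing x or y of each biomarker_* column name
        mutate(biomarker = str_extract(biomarker, '[xy]$')) %>%
        # id_cols is passed by name; newer tidyr versions warn when it is given positionally
        pivot_wider(id_cols = c(ID, biomarker), names_from = measurement, names_prefix = 'measurement', values_from = value)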
      # A tibble: 4 x 5
           ID biomarker measurement1 measurement2 measurement3
        <int> <chr>            <int>        <int>        <int>
      1     1 x                   10           11           12
      2     1 y                  100          110          120
      3     2 x                   20           19           21
      4     2 y                  200          190          210
      

      【Comments】:

      • Thank you for this solution. It works perfectly.
      【Solution 4】:

      Here is one way.

      library(tidyverse)
      
      dat |> 
        pivot_longer(
          cols = starts_with("bio"),
          names_to = "biomarker"
        ) |> 
        mutate(biomarker = str_remove(biomarker, "biomarker_")) |> 
        pivot_wider(
          names_from = measurement,
          values_from = value,
          names_prefix = "measurement"
        )
      
      # # A tibble: 4 x 5
      #      ID biomarker measurement1 measurement2 measurement3
      #   <int> <chr>            <int>        <int>        <int>
      # 1     1 x                   10           11           12
      # 2     1 y                  100          110          120
      # 3     2 x                   20           19           21
      # 4     2 y                  200          190          210
      

      【Comments】:

        【Solution 5】:

        A pure base R option using nested `reshape`

        reshape(
            reshape(
                df,
                direction = "long",
                idvar = c("ID", "measurement"),
                varying = -(1:2),
                sep = "_"
            ),
            direction = "wide",
            idvar = c("ID", "time"),
            timevar = "measurement"
        )
        

        gives

              ID time biomarker.1 biomarker.2 biomarker.3
        1.1.x  1    x          10          11          12
        2.1.x  2    x          20          19          21
        1.1.y  1    y         100         110         120
        2.1.y  2    y         200         190         210
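
The output above differs from the layout requested in the question in its column names, row order and row names. A minimal sketch of one way to tidy that up in base R follows; the renaming and sorting steps are an addition for illustration, not part of the original answer, and `varying` is spelled out by name here instead of by index.

```r
df <- data.frame(
  ID          = c(1L, 1L, 1L, 2L, 2L, 2L),
  measurement = c(1L, 2L, 3L, 1L, 2L, 3L),
  biomarker_x = c(10L, 11L, 12L, 20L, 19L, 21L),
  biomarker_y = c(100L, 110L, 120L, 200L, 190L, 210L)
)

# Wide -> long: biomarker_x / biomarker_y are split on "_" into a
# `biomarker` value column and a `time` column holding "x" / "y".
long <- reshape(
  df,
  direction = "long",
  idvar = c("ID", "measurement"),
  varying = c("biomarker_x", "biomarker_y"),
  sep = "_"
)

# Long -> wide: one column per measurement number.
wide <- reshape(
  long,
  direction = "wide",
  idvar = c("ID", "time"),
  timevar = "measurement"
)

# Rename biomarker.1 -> measurement1 etc., and time -> biomarker.
names(wide) <- sub("^biomarker\\.", "measurement", names(wide))
names(wide)[names(wide) == "time"] <- "biomarker"

# Sort by ID, then biomarker, and drop the composite row names.
wide <- wide[order(wide$ID, wide$biomarker), ]
rownames(wide) <- NULL

print(wide)
```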
        

        【Comments】:
