【Question title】: pivot longer with multiple columns and values
【Posted】: 2020-04-23 08:45:01
【Question】:

I have a data frame in the following format:

# A tibble: 6 x 8
  type  id    age2000 age2001 age2002 bool2000 bool2001 bool2002
  <chr> <chr> <chr>   <chr>   <chr>   <chr>    <chr>    <chr>   
1 1     1     20      21      22      1        1        1       
2 1     2     35      36      37      2        2        2       
3 1     3     24      25      26      1        1        1       
4 2     1     32      33      34      2        2        2       
5 2     2     66      67      68      2        2        2       
6 2     3     14      15      16      1        1        1       

and I would like to use pivot_longer from tidyr to produce longitudinal data of the following form:

# A tibble: 18 x 5
   type  id    age   bool  year 
   <chr> <chr> <chr> <chr> <chr>
 1 1     1     20    1     2000 
 2 1     1     21    1     2001 
 3 1     1     22    1     2002 
 4 1     2     35    2     2000 
 5 1     2     36    2     2001 
 6 1     2     37    2     2002 
 7 1     3     24    1     2000 
 8 1     3     25    1     2001 
 9 1     3     26    1     2002 
10 2     1     32    2     2000 
11 2     1     33    2     2001 
12 2     1     34    2     2002 
13 2     2     66    2     2000 
14 2     2     67    2     2001 
15 2     2     68    2     2002 
16 2     3     14    1     2000 
17 2     3     15    1     2001 
18 2     3     16    1     2002

Does anyone here know a solution to this problem?

Many thanks for any suggestions!

【Question comments】:

    Tags: r pivot-table tidyr


    【Answer 1】:

    You can use names_pattern here:

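    # '.value' turns the first capture group (age, bool) into output column names;
    # the second capture group (the digits) becomes the values of 'year'
    tidyr::pivot_longer(df,
                        cols = -c(id, type),
                        names_to = c('.value', 'year'),
                        names_pattern = '([a-z]+)(\\d+)')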
    
    
    # A tibble: 18 x 5
    #    type    id year    age  bool
    # * <int> <int> <chr> <int> <int>
    # 1     1     1 2000     20     1
    # 2     1     1 2001     21     1
    # 3     1     1 2002     22     1
    # 4     1     2 2000     35     2
    # 5     1     2 2001     36     2
    # 6     1     2 2002     37     2
    # 7     1     3 2000     24     1
    # 8     1     3 2001     25     1
    # 9     1     3 2002     26     1
    #10     2     1 2000     32     2
    #11     2     1 2001     33     2
    #12     2     1 2002     34     2
    #13     2     2 2000     66     2
    #14     2     2 2001     67     2
    #15     2     2 2002     68     2
    #16     2     3 2000     14     1
    #17     2     3 2001     15     1
    #18     2     3 2002     16     1
    

    Data

    df <- structure(list(type = c(1L, 1L, 1L, 2L, 2L, 2L), id = c(1L, 2L, 
    3L, 1L, 2L, 3L), age2000 = c(20L, 35L, 24L, 32L, 66L, 14L), age2001 = c(21L, 
    36L, 25L, 33L, 67L, 15L), age2002 = c(22L, 37L, 26L, 34L, 68L, 
    16L), bool2000 = c(1L, 2L, 1L, 2L, 2L, 1L), bool2001 = c(1L, 
    2L, 1L, 2L, 2L, 1L), bool2002 = c(1L, 2L, 1L, 2L, 2L, 1L)), 
    class = "data.frame", row.names = c(NA, -6L))
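
To see why this pattern works, it can help to look at what the two capture groups extract from each column name. The following is a minimal base R sketch that does not call tidyr at all; the variable names cols, pattern, m and parts are made up for illustration. The first group supplies the names that '.value' turns into output columns, and the second supplies the entries of the year column.

```r
# Column names that get pivoted (everything except id and type)
cols <- c("age2000", "age2001", "age2002", "bool2000", "bool2001", "bool2002")

# Same regular expression as in names_pattern above
pattern <- "([a-z]+)(\\d+)"

# regexec() reports the full match followed by one entry per capture group
m <- regmatches(cols, regexec(pattern, cols))

# Drop the full match (first element) and keep only the two groups
parts <- do.call(rbind, lapply(m, function(x) x[-1]))
colnames(parts) <- c(".value", "year")
parts
```

Each row of parts corresponds to one input column: the '.value' entry says which output column (age or bool) the cell values go to, and the year entry says which row group they belong to.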
    

    【Comments】:

    • Thanks for the quick reply. You saved me a lot of time :) It works.
    【Answer 2】:

    We can pass a regex lookaround to names_sep:

    library(dplyr)
    library(tidyr)
    # Split each name at the zero-width position between a letter and a digit
    df %>%
        pivot_longer(cols = -c(id, type), names_to = c('.value', 'year'),
                     names_sep = "(?<=[a-z])(?=[0-9])")
    # A tibble: 18 x 5
    #    type    id year    age  bool
    #   <int> <int> <chr> <int> <int>
    # 1     1     1 2000     20     1
    # 2     1     1 2001     21     1
    # 3     1     1 2002     22     1
    # 4     1     2 2000     35     2
    # 5     1     2 2001     36     2
    # 6     1     2 2002     37     2
    # 7     1     3 2000     24     1
    # 8     1     3 2001     25     1
    # 9     1     3 2002     26     1
    #10     2     1 2000     32     2
    #11     2     1 2001     33     2
    #12     2     1 2002     34     2
    #13     2     2 2000     66     2
    #14     2     2 2001     67     2
    #15     2     2 2002     68     2
    #16     2     3 2000     14     1
    #17     2     3 2001     15     1
    #18     2     3 2002     16     1
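
The lookaround matches a position rather than any characters, so nothing is removed from the column name when it is split. As a rough illustration, the same regular expression can be tried with base R's strsplit(); this is only a sketch of where the split falls, since tidyr does its own splitting internally, and the names cols, sep, pieces and split_names are invented for the example.

```r
cols <- c("age2000", "age2001", "age2002", "bool2000", "bool2001", "bool2002")

# Zero-width split point: a lowercase letter behind, a digit ahead
sep <- "(?<=[a-z])(?=[0-9])"

# perl = TRUE is required for lookarounds in base R
pieces <- strsplit(cols, sep, perl = TRUE)

# Collect the two pieces of every name side by side
split_names <- data.frame(
  .value = vapply(pieces, `[`, character(1), 1),
  year   = vapply(pieces, `[`, character(1), 2),
  stringsAsFactors = FALSE
)
split_names
```

Because the split happens only where a letter is directly followed by a digit, each name yields exactly two pieces, which is what names_to = c('.value', 'year') expects.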
    


    【Comments】:
