【Question title】: How to melt data.frame with multiple groups and empty suffix
【Posted】: 2021-05-12 09:12:20
【Question】:

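I have a data.frame in which I need to melt several columns together based on the suffix of the column name. All columns ending in "from" should be melted into one column, and the same goes for columns ending in "to" and for all columns without a suffix. The last case is my problem: I cannot get it to melt unless I first append "xxx" to the names of the columns that have no suffix. How can I match an empty suffix with the regex in names_pattern, or is there a different solution that avoids the renaming? I am also interested in a data.table solution to this problem.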

library(tibble)
library(magrittr)
library(tidyr)

data <-
  tibble::tribble(
  ~"abc", ~"abcfrom", ~"abcto", ~"def", ~"deffrom", ~"defto",
  1, "2019-05-16", NA, 0, NA, NA,
  1, "2020-01-01", "2020-10-15", 1, "2014-12-17", "2015-03-05",
  1, NA, NA, 1, "2015-01-01", NA
)

data %>% 
  dplyr::rename("abcxxx" = "abc", "defxxx" = "def") %>% 
  tidyr::pivot_longer(
    everything(),
    names_to = c("variable", ".value"),
    names_pattern = "(.+)(xxx|from|to)"
  )
# A tibble: 6 x 4
  variable   xxx from       to        
  <chr>    <dbl> <chr>      <chr>     
1 abc          1 2019-05-16 NA        
2 def          0 NA         NA        
3 abc          1 2020-01-01 2020-10-15
4 def          1 2014-12-17 2015-03-05
5 abc          1 NA         NA        
6 def          1 2015-01-01 NA

【Comments】:

    Tags: r tidyr melt


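    【Solution 1】:

    Here is one option: append "xxx" to every column name that does not end in "from" or "to", then pivot.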

    library(dplyr)
    library(tidyr)
    library(stringr)
    data %>% 
     rename_at(vars(names(.)[!str_detect(names(.), "(from|to)$")]),
            ~ str_c(., 'xxx')) %>% 
     tidyr::pivot_longer(
     everything(),
     names_to = c("variable", ".value"),
     names_pattern = "(.+)(xxx|from|to)"
    )
    

    -output

    # A tibble: 6 x 4
    #  variable   xxx from       to        
    #  <chr>    <dbl> <chr>      <chr>     
    #1 abc          1 2019-05-16 <NA>      
    #2 def          0 <NA>       <NA>      
    #3 abc          1 2020-01-01 2020-10-15
    #4 def          1 2014-12-17 2015-03-05
    #5 abc          1 <NA>       <NA>      
    #6 def          1 2015-01-01 <NA>      
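
The same renaming step can also be written with `rename_with()`, which supersedes `rename_at()` in dplyr 1.0.0 and later. The following is a minimal sketch rather than part of the original answer: it rebuilds the question's `data`, keeps the placeholder suffix "xxx", and additionally anchors the pattern with `$` (an assumption that the suffix always sits at the end of the name).

```r
library(dplyr)
library(tidyr)

# Same data as in the question
data <- tibble::tribble(
  ~abc, ~abcfrom, ~abcto, ~def, ~deffrom, ~defto,
  1, "2019-05-16", NA, 0, NA, NA,
  1, "2020-01-01", "2020-10-15", 1, "2014-12-17", "2015-03-05",
  1, NA, NA, 1, "2015-01-01", NA
)

long <- data %>%
  # append the placeholder "xxx" to every column NOT ending in "from" or "to"
  rename_with(~ paste0(.x, "xxx"), .cols = !matches("(from|to)$")) %>%
  pivot_longer(
    everything(),
    names_to = c("variable", ".value"),
    # group 1 = variable name, group 2 = suffix that becomes the new column
    names_pattern = "(.+)(xxx|from|to)$"
  )

long
```

This still renames the suffix-less columns, so it only tidies the renaming step; it does not answer whether an empty suffix can be matched directly in `names_pattern`.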
    

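    【Comments】:

      【Solution 2】:

      Here is a data.table approach.

      It does not really melt. Instead, it splits the data into chunks of (three) columns, grouped by the first three characters of the column names, which are identical within a group. It then sets the column names of each chunk and row-binds the chunks back together.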

      library( data.table )
      setDT( data )
      
      #assuming the first three characters define the group
      data.split <- split.default(data, gsub('(^.{3}).*$', '\\1', names(data)))
      # $abc
      #    abc    abcfrom      abcto
      # 1:   1 2019-05-16       <NA>
      # 2:   1 2020-01-01 2020-10-15
      # 3:   1       <NA>       <NA>
      #   
      # $def
      #    def    deffrom      defto
      # 1:   0       <NA>       <NA>
      # 2:   1 2014-12-17 2015-03-05
      # 3:   1 2015-01-01       <NA>
      
      #set column names by position (fixed here, but can also be variable if desired)
      data.split <- lapply( data.split, setnames, c("xxx", "from", "to") )
      
      #bind together
      DT <- rbindlist( data.split, use.names = TRUE, idcol = "variable" )
      #    variable xxx       from         to
      # 1:      abc   1 2019-05-16       <NA>
      # 2:      abc   1 2020-01-01 2020-10-15
      # 3:      abc   1       <NA>       <NA>
      # 4:      def   0       <NA>       <NA>
      # 5:      def   1 2014-12-17 2015-03-05
      # 6:      def   1 2015-01-01       <NA>
      

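      【Comments】:

      • Interesting approach. Sadly, I cannot assume that the group name has a fixed number of characters. One group could have 3, but another could have 5.
      • But 'from' and 'to' are always the same?
      • If so, you could use this line: data.split <- split.default(data, gsub('from|to', '', names(data)))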
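
The suggestion in the last comment can be sketched end to end as follows. This is a hedged illustration, not the answerer's code: the toy table `dt` and its group names `ab` and `defgh` are hypothetical, chosen only to show groups of different lengths. The pattern is anchored with `$` so that only a trailing "from" or "to" is stripped, and the suffix is derived from each column name instead of relying on column order.

```r
library(data.table)

# Hypothetical toy data: group names "ab" and "defgh" have different lengths
dt <- data.table(
  ab        = c(1, 1),
  abfrom    = c("2019-05-16", NA),
  abto      = c(NA, "2020-10-15"),
  defgh     = c(0, 1),
  defghfrom = c(NA, "2014-12-17"),
  defghto   = c(NA, "2015-03-05")
)

# Group name = column name with a trailing "from"/"to" removed
group <- sub("(from|to)$", "", names(dt))

# Suffix = what remains after the group name; the empty suffix becomes "xxx"
suffix <- substring(names(dt), nchar(group) + 1)
suffix[suffix == ""] <- "xxx"

# Build one chunk per group, renaming its columns to their suffixes
chunks <- lapply(split(seq_along(dt), group), function(idx) {
  chunk <- dt[, idx, with = FALSE]  # copy of the columns in this group
  setnames(chunk, suffix[idx])
  chunk
})

# Row-bind the chunks; the list names become the "variable" column
long <- rbindlist(chunks, use.names = TRUE, idcol = "variable")
long
```

One caveat of this sketch: a group whose own name ends in "to" or "from" (for example a column called `photo`) would be split incorrectly, so it assumes group names never end in either suffix.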