【Question title】: R: combine all strings in group in one row
【Posted】: 2021-11-11 12:52:02
【Question description】:

I have the following data frame:

               Date          Day    Element    Variable    Failure
2021-03-01 08:00:00   2021-03-01          A     Current          1
2021-03-01 08:00:00   2021-03-01          A     Voltage          1
2021-03-01 08:10:00   2021-03-01          A     Current          0
2021-03-01 08:10:00   2021-03-01          A     Voltage          0
2021-03-01 08:20:00   2021-03-01          A     Current          1
2021-03-01 08:20:00   2021-03-01          A     Voltage          1
2021-03-02 08:00:00   2021-03-02          B     Current          1
2021-03-02 08:00:00   2021-03-02          B     Voltage          0
2021-03-02 08:10:00   2021-03-02          B     Current          1
2021-03-02 08:10:00   2021-03-02          B     Voltage          0
2021-03-02 08:20:00   2021-03-02          B     Current          1
2021-03-02 08:20:00   2021-03-02          B     Voltage          0

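After grouping by Day and Element, I want to generate a column that combines into a comma-separated string the values of Variable that have Failure == 1 in at least one row of the group. So I am looking for this output: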

       Day    Element        Combination
2021-03-01          A    Current,Voltage
2021-03-02          B            Current

【Comments】:

    Tags: r string group-by


    【Solution 1】:

    Perhaps you can try aggregate, as shown below:

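    # Keep only the failing rows, then collapse the unique Variable
    # values of each Day/Element group into one string.
    aggregate(
      Variable ~ Day + Element,
      subset(
        df,
        Failure == 1
      ),
      function(x) toString(unique(x))
    )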
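For reference, here is a minimal self-contained sketch of the same aggregate approach run on the question's data. It makes a few assumptions that are not in the answer: the Date timestamp column is left out because it is not needed for grouping, `paste(..., collapse = ",")` is used instead of `toString` to get the separator without a space, and the result column is renamed to Combination to match the requested output. Groups without any failing row are dropped by the `subset` step.

```r
# Rebuild the question's data (Date column omitted for brevity).
df <- data.frame(
  Day      = rep(c("2021-03-01", "2021-03-02"), each = 6),
  Element  = rep(c("A", "B"), each = 6),
  Variable = rep(c("Current", "Voltage"), times = 6),
  Failure  = c(1, 1, 0, 0, 1, 1,   # Element A
               1, 0, 1, 0, 1, 0),  # Element B
  stringsAsFactors = FALSE
)

# Keep failing rows only, then collapse the unique Variable values per group.
res <- aggregate(
  Variable ~ Day + Element,
  data = subset(df, Failure == 1),
  FUN  = function(x) paste(unique(x), collapse = ",")
)

# Rename the aggregated column to match the requested output.
names(res)[names(res) == "Variable"] <- "Combination"
print(res)
```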
    

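    【Comments】:

    • Works almost perfectly, but it duplicated the column "Element" under the same name, so if I do "df = df %>% dplyr::select(-Element)", both columns are removed. Any idea how to remove only one?
    • OK, I got it with 'df = df[ -c(2) ]', where 2 is the index of one of the duplicated columns.
    【Solution 2】: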

    We can use the tidyr functions pivot_wider and unite.

    library(dplyr)
    library(tidyr)

    df %>% group_by(date, Element, Variable) %>%
            summarise(Failure=max(Failure)) %>%
            pivot_wider(names_from=Variable, values_from = Failure) %>%
            # replace each 1 with its column name (cur_column()) and each 0 with NA
            mutate(across(Current:Voltage, ~ifelse(.x == 1, cur_column(), NA))) %>%
            tidyr::unite(col='Combination', Current:Voltage, na.rm = TRUE, sep=", ")
    
    # A tibble: 2 x 3
    # Groups:   date, Element [2]
      date       Element Combination     
      <date>     <chr>   <chr>           
    1 2021-03-01 A       Current, Voltage
    2 2021-03-02 B       Current  
    

    Data

    df<-tibble(date=as.Date(c("2021-03-01","2021-03-01", "2021-03-01", "2021-03-01", "2021-03-02", "2021-03-02", "2021-03-02", "2021-03-02")),
               Element=c('A', 'A', 'A',"A","B", 'B', 'B', 'B'),
               Variable=rep(c('Current', 'Voltage'), 4),
               Failure = c(1,1,0,0,1,0,0,0))
    
    # A tibble: 8 x 4
      date       Element Variable Failure
      <date>     <chr>   <chr>      <dbl>
    1 2021-03-01 A       Current        1
    2 2021-03-01 A       Voltage        1
    3 2021-03-01 A       Current        0
    4 2021-03-01 A       Voltage        0
    5 2021-03-02 B       Current        1
    6 2021-03-02 B       Voltage        0
    7 2021-03-02 B       Current        0
    8 2021-03-02 B       Voltage        0
    

    Edit: a simpler approach:

    df %>% mutate(Combined=(ifelse(Failure==1, Variable, NA))) %>%
            group_by(date, Element) %>%
            summarise(Combined=paste(unique(na.omit(Combined)), collapse = ', '))%>%
            ungroup()
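One detail worth checking with either approach is what happens to a group that has no failures at all. The sketch below uses hypothetical data (group "C" is an assumption added for illustration and is not part of the question) to contrast two variants: filtering before summarising drops such groups, while summarising every group keeps them with an empty string.

```r
library(dplyr)

# Hypothetical data: group "C" has no failures at all.
df <- tibble(
  date     = as.Date(c("2021-03-01", "2021-03-01", "2021-03-02",
                       "2021-03-02", "2021-03-03", "2021-03-03")),
  Element  = c("A", "A", "B", "B", "C", "C"),
  Variable = rep(c("Current", "Voltage"), 3),
  Failure  = c(1, 1, 1, 0, 0, 0)
)

# Variant 1: filter first, so groups without failures are dropped.
dropped <- df %>%
  filter(Failure == 1) %>%
  group_by(date, Element) %>%
  summarise(Combined = paste(unique(Variable), collapse = ", "),
            .groups = "drop")

# Variant 2: summarise all groups, so failure-free groups are kept with "".
kept <- df %>%
  group_by(date, Element) %>%
  summarise(Combined = paste(unique(Variable[Failure == 1]), collapse = ", "),
            .groups = "drop")

print(dropped)
print(kept)
```

Which variant is preferable depends on whether failure-free groups should appear in the result; the question's expected output does not include such a group, so it does not settle this.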
    

    【Comments】:
