【Question title】: Paste string when NA in another column
【Posted】: 2023-03-07 02:11:01
【Question】:

Here is a sample of my data:

my_df <- data.frame("R" = c("123", NA, NA, "456", "789", "123", NA),
                "D" = c("abc", "def", "ghi", "jkl", "mno", "aze", "aze"),
                stringsAsFactors = FALSE)

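What I want to do is this: whenever column "R" is NA, paste that row's "D" value onto the "D" of the last row above it where "R" is not NA, so that the NA rows disappear.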

This is the expected result:

my_result <- data.frame("R" = c("123", "456", "789", "123"),
                    "D" = c("abcdefghi", "jkl", "mno", "azeaze"),
                    stringsAsFactors = FALSE)
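The rule above can be made concrete as a single pass over the rows. The sketch below uses a hypothetical base-R helper, collapse_na_rows (not from any package), that keeps every row whose R is not NA and appends the D value of each following NA row to the most recently kept row. It assumes the first row has a non-NA R; a leading NA row is simply kept as its own output row.

```r
my_df <- data.frame("R" = c("123", NA, NA, "456", "789", "123", NA),
                    "D" = c("abc", "def", "ghi", "jkl", "mno", "aze", "aze"),
                    stringsAsFactors = FALSE)

# Hypothetical helper: one pass over the rows, appending D to the last kept row
collapse_na_rows <- function(df) {
  keep  <- integer(0)    # row indices whose R value is retained
  out_D <- character(0)  # accumulated D strings, one per retained row
  for (k in seq_len(nrow(df))) {
    if (!is.na(df$R[k]) || length(keep) == 0) {
      # Non-NA R (or a leading NA row): start a new output row
      keep  <- c(keep, k)
      out_D <- c(out_D, df$D[k])
    } else {
      # NA in R: paste this D onto the most recent output row
      last <- length(out_D)
      out_D[last] <- paste0(out_D[last], df$D[k])
    }
  }
  data.frame(R = df$R[keep], D = out_D, stringsAsFactors = FALSE)
}

res <- collapse_na_rows(my_df)
print(res)
```

A loop like this is easy to follow but grows its vectors one element at a time; the vectorised cumsum(!is.na(R)) grouping used in the answers avoids that and scales better to large data frames.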

【Question discussion】:

    Tags: r dataframe paste


    【Solution 1】:

    After splitting my_df$D by cumsum(!is.na(my_df$R)), you can use paste inside sapply:

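    # TRUE for rows that start a new group (R is not NA)
    i <- !is.na(my_df$R)
    # cumsum(i) gives every row the id of the last non-NA row at or above it;
    # split D by that id and paste each group into a single string
    data.frame(my_df["R"][i, , drop = FALSE],
               D = sapply(split(my_df$D, cumsum(i)), paste, collapse = ""))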
    #    R         D
    #1 123 abcdefghi
    #4 456       jkl
    #5 789       mno
    #6 123    azeaze
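To see what the grouping key in this answer looks like, the intermediate steps can be printed on their own. cumsum(i) increases by one at every non-NA R, so each NA row inherits the id of the last non-NA row above it, which is exactly the group its D should be pasted into. A minimal base-R sketch using the question's data:

```r
my_df <- data.frame("R" = c("123", NA, NA, "456", "789", "123", NA),
                    "D" = c("abc", "def", "ghi", "jkl", "mno", "aze", "aze"),
                    stringsAsFactors = FALSE)

i <- !is.na(my_df$R)   # TRUE where a new group starts
grp <- cumsum(i)       # running group id for every row
print(grp)             # [1] 1 1 1 2 3 4 4

groups <- split(my_df$D, grp)                   # list of D values per group
pasted <- sapply(groups, paste, collapse = "")  # one string per group
print(pasted)
```

Note that this assumes the first R is not NA: otherwise cumsum(i) starts at 0, split() produces one more group than there are kept rows, and the data.frame() call in the answer would fail on mismatched lengths.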
    

    【Discussion】:

      【Solution 2】:

      tidyverse

      my_df <- data.frame("R" = c("123", NA, NA, "456", "789", "123", NA),
                          "D" = c("abc", "def", "ghi", "jkl", "mno", "aze", "aze"),
                          stringsAsFactors = FALSE)
      
      
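      library(tidyverse)
      my_df %>%
        mutate(grp = cumsum(!is.na(R))) %>%  # new group id at every non-NA R
        fill(R) %>%                          # carry the last non-NA R downwards
        group_by(R, grp) %>%                 # grp keeps duplicate R values apart
        summarise(D = paste0(D, collapse = ""), .groups = "drop") %>% 
        arrange(grp) %>%                     # restore the original row order
        select(-grp)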
      
      #> # A tibble: 4 x 2
      #>   R     D        
      #>   <chr> <chr>    
      #> 1 123   abcdefghi
      #> 2 456   jkl      
      #> 3 789   mno      
      #> 4 123   azeaze
      

      Created on 2021-12-07 by the reprex package (v2.0.1)

      data.table

      library(data.table)
      library(magrittr)
      
      setDT(my_df)[, grp := cumsum(!is.na(R))] %>%   # group id per non-NA R
        .[, R := zoo::na.locf(R)] %>%                 # carry last non-NA R forward (needs zoo)
        .[, list(D = paste0(D, collapse = "")), by = list(R, grp)] %>% 
        .[, grp := NULL] %>%                          # drop the helper column
        .[]
      
      #>      R         D
      #> 1: 123 abcdefghi
      #> 2: 456       jkl
      #> 3: 789       mno
      #> 4: 123    azeaze
      

      Created on 2021-12-07 by the reprex package (v2.0.1)

      【Discussion】:

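      • I had thought of that, but the problem is that R can contain duplicates, and I don't want those to be grouped together...
      • Then switch the solution to use mutate(g = cumsum(!is.na(R))) %>% group_by(g)
      • Updated the solution.
      • The data.table solution can be shortened to: setDT(my_df)[, .(D = paste0(D, collapse = ""), R = first(R)), by = cumsum(!is.na(R))][, !"cumsum"]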
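The comments above turn on the duplicate "123" values. A small base-R comparison (the variable names R_filled, by_value and by_block are only for illustration) shows why the answers group by the running counter rather than by the filled-in R value alone: grouping on the value merges both "123" blocks into one row.

```r
my_df <- data.frame("R" = c("123", NA, NA, "456", "789", "123", NA),
                    "D" = c("abc", "def", "ghi", "jkl", "mno", "aze", "aze"),
                    stringsAsFactors = FALSE)

i <- !is.na(my_df$R)
# Fill R forward: index the non-NA values by the running group counter
R_filled <- my_df$R[i][cumsum(i)]
print(R_filled)

# Grouping by the filled value: the two "123" blocks collapse together
by_value <- sapply(split(my_df$D, R_filled), paste, collapse = "")
print(by_value)

# Grouping by the counter keeps each block separate, in original order
by_block <- sapply(split(my_df$D, cumsum(i)), paste, collapse = "")
print(by_block)
```

This is why Solution 2 keeps grp in group_by(R, grp), and why the shortened data.table version groups by cumsum(!is.na(R)) and takes first(R) instead of grouping by R.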