【Question title】: Filtering based on matching string patterns
【Posted】: 2019-09-27 17:44:10
【Question description】:

I have a dataset that looks like this:

df <- data.frame("id" = c("Alpha", "Beta", "Gamma","Alpha","Beta","Gamma","Lambda","Tau"), 
                 "group" = c("Alpha is good", "Alpha is good", "Alpha is good", "Beta is bad", "Beta is bad","Beta is bad","Beta is bad","Beta is bad"), 
                 "Val" = c(2,2,2,5,5,5,5,5))

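I would like to keep only the observations where the group name matches the id name. In short, the final dataset should look like this: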

final <- data.frame("id" = c("Alpha", "Beta"), 
                 "group" = c("Alpha is good", "Beta is bad"), 
                 "Val" = c(2,5))

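The idea is that the function should be able to detect whether the string in "id" also appears in "group".

I hope this is clear.

Thanks in advance for your help.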

【Question comments】:

    Tags: r filter


    【Solution 1】:

    We can use str_detect, which is vectorised. According to ?str_detect, it is:

    Vectorised over string and pattern.

    library(stringr)
    library(dplyr)
    df %>%
      mutate_if(is.factor, as.character) %>%
      filter(str_detect(group, id))
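
To make "vectorised over string and pattern" concrete, here is a minimal sketch on made-up vectors (the names `groups`, `ids` and `pairwise` are illustrative and not part of the original answer). Each element of the pattern vector is matched only against the string in the same position, which is why the `filter()` call above compares `id` with `group` row by row.

```r
library(stringr)

# Hypothetical toy vectors, standing in for df$group and df$id
groups <- c("Alpha is good", "Alpha is good", "Beta is bad")
ids    <- c("Alpha", "Beta", "Beta")

# Element i of `ids` is used as the pattern for element i of `groups`
pairwise <- str_detect(groups, ids)
print(pairwise)
```

Only the first and third pairs match, so only those rows would survive a `filter()` on this result.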
    

    If there are overlapping elements within each group:

    df %>%
      mutate_if(is.factor, as.character) %>%
      group_by(group1 = group) %>%
      filter(str_detect(group, id))
    

    【Comments】:

      【Solution 2】:

      One base R possibility is:

      df[unlist(Map(grepl, df$id, df$group)), ]
      
           id         group Val
      1 Alpha Alpha is good   2
      5  Beta   Beta is bad   5
      

      Or, more elegantly, using mapply() (based on a comment by @r2evans):

      df[mapply(grepl, df$id, df$group), ]
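
One caveat that may be worth keeping in mind: `grepl()` treats each `id` as a regular expression by default, so ids containing characters such as `.` or `+` can match more than intended. The sketch below assumes a made-up data frame `df2` (not from the question) and passes `fixed = TRUE` through `MoreArgs` to request literal matching instead.

```r
# Hypothetical data with regex metacharacters in the ids
df2 <- data.frame(
  id    = c("A.1", "A.1", "B+"),
  group = c("A.1 is good", "AX1 is good", "B+ is bad"),
  stringsAsFactors = FALSE
)

# Default: ids are interpreted as regular expressions ("." matches any character)
regex_hits <- mapply(grepl, df2$id, df2$group)

# fixed = TRUE: ids are matched as literal substrings
literal_hits <- mapply(grepl, df2$id, df2$group, MoreArgs = list(fixed = TRUE))

print(df2[literal_hits, ])
```

With the regex default, the second row ("AX1 is good") is also kept; with literal matching it is dropped. For the sample data in the question both variants give the same rows.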
      

      Sample data:

      df <- data.frame("id" = c("Alpha", "Beta", "Gamma","Alpha","Beta","Gamma","Lambda","Tau"), 
                       "group" = c("Alpha is good", "Alpha is good", "Alpha is good", "Beta is bad", "Beta is bad","Beta is bad","Beta is bad","Beta is bad"), 
                       "Val" = c(2,2,2,5,5,5,5,5),
                       stringsAsFactors = FALSE)
      

      【Comments】:

      • Is there a reason to use unlist(Map(...)) instead of mapply(...)?
      • (If you look at the source code, Map is just a convenience wrapper around mapply anyway :-)
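
The point made in the comments can be checked directly. This is a small sketch on made-up vectors (`ids`, `groups` and the result names are illustrative): `Map()` gives the same list as `mapply()` with `SIMPLIFY = FALSE`, and unlisting it gives the same vector as plain `mapply()`.

```r
# Hypothetical toy vectors
ids    <- c("Alpha", "Beta", "Gamma")
groups <- c("Alpha is good", "Alpha is good", "Beta is bad")

via_Map    <- Map(grepl, ids, groups)
via_mapply <- mapply(grepl, ids, groups, SIMPLIFY = FALSE)

# Map() and mapply(..., SIMPLIFY = FALSE) return the same named list
same_list <- identical(via_Map, via_mapply)

# Unlisting Map()'s result matches the simplified mapply() result
same_vector <- identical(unlist(via_Map), mapply(grepl, ids, groups))

print(c(same_list, same_vector))
```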