【Question title】: How to return rows in one DataFrame that partially match the rows in another DataFrame (string match)
【Posted】: 2020-05-21 14:41:02
【Question】:

I want to return all rows in list2 that contain one of the strings in list1.

list1 <- tibble(name = c("the setosa is pretty", "the versicolor is the best", "the mazda is not a flower"))

list2 <- tibble(name = c("the setosa is pretty and the best flower", "the versicolor is the best and a red flower", "the mazda is a great car"))

For example, the code should return "the setosa is pretty and the best flower" from list2, because it contains the phrase "the setosa is pretty" from list1. I tried:

grepl(list1$name, list2$name)

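But I get the following warning: "Warning message: In grepl(commonPhrasesNPSLessthan6$value, dfNPSLessthan6$nps_comment) : argument 'pattern' has length > 1 and only the first element will be used".

I would appreciate any help. Thanks!

Edit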

list1 <- structure(list(value = c("it would not let me", "to go back and change", 
"i was not able to", "there is no way to", "to pay for a credit"
), n = c(15L, 14L, 12L, 11L, 9L)), row.names = c(NA, -5L), class = c("tbl_df", 
"tbl", "data.frame"))

list2 <- structure(list(comment = c("it would not let me go back and change things", 
"There is no way to back up without starting allover.", "Could not link blah blah account. ", 
"i really just want to speak to someone - and, now that I'm at the very end of the process-", 
"i felt that some of the information that was asked to provide wasn't necessary", 
"i was not able to to go back and make changes")), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))

【Comments】:

  • Is this what you want? stringr::str_extract_all(list2$name, list1$name)? Why shouldn't versicolor be returned?

Tags: r pattern-matching stringr


【Solution 1】:

Edit, based on the new data:

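library(dplyr)  # provides %>% and filter()
# keep comments that match any phrase, joined into one regex with "|"
list2 %>% 
  filter(stringr::str_detect(comment, paste0(list1$value, collapse = "|")))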
# A tibble: 2 x 1
  comment                                      
  <chr>                                        
1 it would not let me go back and change things
2 i was not able to to go back and make changes
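
A note on the result above: the pattern built with paste0(..., collapse = "|") is an ordinary regular expression, so matching is case-sensitive by default. That appears to be why "There is no way to back up without starting allover." is not returned even though list1 contains "there is no way to". Below is a minimal base-R sketch of the same alternation idea, assuming case-insensitive matching is what is wanted. The names phrases, comments, pattern and the two hits_* vectors are illustrative and not from the original post.

```r
# Phrases and comments copied from the question's edited data.
phrases <- c("it would not let me", "to go back and change",
             "i was not able to", "there is no way to",
             "to pay for a credit")
comments <- c("it would not let me go back and change things",
              "There is no way to back up without starting allover.",
              "Could not link blah blah account. ",
              "i really just want to speak to someone - and, now that I'm at the very end of the process-",
              "i felt that some of the information that was asked to provide wasn't necessary",
              "i was not able to to go back and make changes")

# One regex that matches if any phrase occurs.
pattern <- paste(phrases, collapse = "|")

# Default matching is case-sensitive; ignore.case = TRUE relaxes that.
hits_case_sensitive <- grepl(pattern, comments)
hits_ignore_case    <- grepl(pattern, comments, ignore.case = TRUE)

which(hits_case_sensitive)  # rows 1 and 6
which(hits_ignore_case)     # rows 1, 2 and 6
```

If the phrases could contain regex metacharacters such as "." or "(", they would need escaping before being joined with "|"; the phrases here contain none.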

Original

A stringr option:

list2[stringr::str_detect(list2$name,list1$name),]
# A tibble: 2 x 1
  name                                       
  <chr>                                      
1 the setosa is pretty and the best flower   
2 the versicolor is the best and a red flower

A base-R-only solution:

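# OR together one grepl() result per list1 phrase, so every list2 row is tested against every phrase
list2[Reduce(`|`, lapply(list1$name, grepl, list2$name)), ]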
# A tibble: 2 x 1
  name                                       
  <chr>                                      
1 the setosa is pretty and the best flower   
2 the versicolor is the best and a red flower
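
The stringr call under "Original", str_detect(list2$name, list1$name), is vectorised over both arguments, so it compares the first string with the first pattern, the second with the second, and so on. It gives the expected rows here only because the two tibbles have the same length and their rows line up. The sketch below illustrates the difference in base R, using mapply() as a stand-in for the pairwise behaviour. It assumes literal matching (fixed = TRUE), and the reordered patterns vector and the variable names are hypothetical, chosen only to show the effect.

```r
# Same strings as the first example, with the patterns in a different order.
patterns <- c("the mazda is not a flower",
              "the setosa is pretty",
              "the versicolor is the best")
strings <- c("the setosa is pretty and the best flower",
             "the versicolor is the best and a red flower",
             "the mazda is a great car")

# Pairwise: pattern i is tested only against string i.
pairwise <- unname(mapply(grepl, patterns, strings,
                          MoreArgs = list(fixed = TRUE)))

# Any-match: a matrix with one row per string and one column per pattern,
# then a row is kept if at least one pattern matched it.
match_matrix <- sapply(patterns, grepl, x = strings, fixed = TRUE)
any_match <- rowSums(match_matrix) > 0

pairwise            # FALSE FALSE FALSE
any_match           # TRUE TRUE FALSE
strings[any_match]  # the setosa and versicolor strings
```

The any-match form does not depend on row order or on the two inputs having the same length, which is the situation described in the comments below.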

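【Comments】:

  • Thank you, this works for the test example. Unfortunately, when I apply it to my data I get the following warning, and it only returns a subset of the correct rows: "Warning message: In stri_detect_regex(string, pattern, negate = negate, opts_regex = opts(pattern)) : longer object length is not a multiple of shorter object length". Any idea why this happens?
  • Please add a sample of your data using dput(head(df, n)). Do this for both datasets for better reproducibility.
  • Here is the reproducible data
  • Df1: structure(list(value = c("it would not let me", "to go back and change", "i was not able to", "there is no way to", "to pay for a credit"), n = c(15L, 14L, 12L, 11L, 9L)), row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"))
  • Thank you! This works. I love the internet and how generous people like you are. Thanks for your help.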