How to search for a string in one column in other columns of a data frame: answers

【Question title】: How to search for a string in one column in other columns of a data frame
【Posted】: 2015-08-27 05:22:19
【Question description】:

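I have a table, call it df, with 3 columns: the first is the title of a product, the second is the description of the product, and the third is a one-word string. I need to run an operation over the whole table that creates 2 new columns (call them "exists_in_title" and "exists_in_description") holding 1 or 0 to indicate whether the value in column 3 occurs in column 1 or in column 2. It has to be a strictly 1:1 operation. For example, calling the first row "A", I need to check whether cell A3 occurs in A1 and use that to fill exists_in_title, then check whether A3 occurs in A2 and use that to fill exists_in_description, then move on to row B and do the same. I have thousands of rows, so doing this one row at a time, or writing a separate function for each row, is not realistic. There must be a function or method that walks every row of the table in one go.

I have played with grepl, pmatch, and str_count, but none of them seems to do quite what I need. I think grepl is probably the closest. Here are the 2 lines of code I wrote; logically they do what I want, but they don't seem to work: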

df$exists_in_title <- grepl(df$A3, df$A1)

df$exists_in_description <- grepl(df$A3, df$A2)

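However, when I run them I get the following warning, which makes me believe it is not working properly: "argument 'pattern' has length > 1 and only the first element will be used"

Any help on how to do this would be greatly appreciated. Thanks!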

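【Comments】:

A reproducible example might help: stackoverflow.com/a/28481250/1191259 Personally, I find the wall of text here a bit hard to dig through.
An example would help clarify your question, but you may be looking for something like @987654323@.

Tags: r string dataframe string-matching grepl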

【Solution 1】:

grepl works when combined with mapply, which pairs each keyword with the text in the same row:

Example data frame:

title <- c('eggs and bacon','sausage biscuit','pancakes')
description <- c('scrambled eggs and thickcut bacon','homemade biscuit with breakfast pattie', 'stack of sourdough pancakes')
keyword <- c('bacon','sausage','sourdough')
df <- data.frame(title, description, keyword, stringsAsFactors=FALSE)

Search for matches with grepl:

df$exists_in_title <- mapply(grepl, pattern=df$keyword, x=df$title)
df$exists_in_description <- mapply(grepl, pattern=df$keyword, x=df$description)

Result:

            title                            description   keyword exists_in_title exists_in_description
1  eggs and bacon      scrambled eggs and thickcut bacon     bacon            TRUE                  TRUE
2 sausage biscuit homemade biscuit with breakfast pattie   sausage            TRUE                 FALSE
3        pancakes            stack of sourdough pancakes sourdough           FALSE                  TRUE
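The question asked for 1/0 values rather than TRUE/FALSE, and grepl treats each keyword as a regular expression by default. The original code warned because grepl uses only the first element of pattern; mapply supplies one keyword per row instead. Below is a minimal sketch of a variant, assuming literal (non-regex) matching is what is wanted. The helper name keyword_in is hypothetical and not part of the original answer.

```r
# Same example data as above, repeated so the sketch runs on its own
title <- c('eggs and bacon', 'sausage biscuit', 'pancakes')
description <- c('scrambled eggs and thickcut bacon',
                 'homemade biscuit with breakfast pattie',
                 'stack of sourdough pancakes')
keyword <- c('bacon', 'sausage', 'sourdough')
df <- data.frame(title, description, keyword, stringsAsFactors = FALSE)

# Hypothetical helper: pairs pattern[i] with x[i] and returns 1L/0L
keyword_in <- function(pattern, x) {
  hits <- mapply(grepl, pattern = pattern, x = x,
                 MoreArgs = list(fixed = TRUE),  # assumption: match literally, not as regex
                 USE.NAMES = FALSE)
  as.integer(hits)                               # TRUE/FALSE -> 1/0
}

df$exists_in_title <- keyword_in(df$keyword, df$title)
df$exists_in_description <- keyword_in(df$keyword, df$description)
```

Passing fixed = TRUE through MoreArgs keeps keywords containing characters such as "." or "(" from being interpreted as regex syntax; drop it if the keywords are meant to be patterns.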

Update I

You can also do this with dplyr and stringr:

library(dplyr)
df %>% 
  rowwise() %>% 
  mutate(exists_in_title = grepl(keyword, title),
         exists_in_description = grepl(keyword, description))

library(stringr)
df %>% 
  rowwise() %>% 
  mutate(exists_in_title = str_detect(title, keyword),
         exists_in_description = str_detect(description, keyword))
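Since str_detect is vectorised over both the string and the pattern, the rowwise() step is not strictly needed for the stringr version. A minimal sketch under that assumption, which also wraps the keyword in fixed() for literal matching and converts to 1/0 as the question requested (the name result is only illustrative):

```r
library(dplyr)
library(stringr)

# Same example data as above, repeated so the sketch runs on its own
df <- data.frame(
  title = c('eggs and bacon', 'sausage biscuit', 'pancakes'),
  description = c('scrambled eggs and thickcut bacon',
                  'homemade biscuit with breakfast pattie',
                  'stack of sourdough pancakes'),
  keyword = c('bacon', 'sausage', 'sourdough'),
  stringsAsFactors = FALSE
)

# str_detect() pairs title[i] with keyword[i], so no rowwise() is required
result <- df %>%
  mutate(exists_in_title = as.integer(str_detect(title, fixed(keyword))),
         exists_in_description = as.integer(str_detect(description, fixed(keyword))))
```

This also avoids returning a row-wise grouped data frame, which the rowwise() versions do unless followed by ungroup().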

Update II

Map is also an option; another option that leans more on the tidyverse is purrr with stringr:

library(tidyverse)
df %>%
  mutate(exists_in_title = unlist(Map(function(x, y) grepl(x, y), keyword, title))) %>% 
  mutate(exists_in_description = map2_lgl(description, keyword,  str_detect))

【Discussion】: