R data frame string contains: Does column 1 contain column 2? Answers

【Question title】: R data frame string contains: Does column 1 contain column 2?
【Posted】: 2016-04-15 12:54:06
【Question description】:

I have a data frame with two columns:

  Surname                Email
1   house  greghouse@gmail.com
2  wilson johnwatson@gmail.com

I want to create a logical vector that checks whether Surname is contained in Email. So the result should be:

  Surname                Email CheckEmail
1   house  greghouse@gmail.com       TRUE
2  wilson johnwatson@gmail.com      FALSE

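I tried grep, but it seems grep can only look for a single pattern across one or more strings. I specifically need to look for multiple patterns across multiple strings, one pattern per row.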

> grep(df1$Surname,df1$Email)
[1] 1
Warning message:
In grep(df1$Surname, df1$Email) :
  argument 'pattern' has length > 1 and only the first element will be used
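The warning explains the result: grep accepts a single pattern, so only the first surname is tested against every email. A minimal sketch of that behaviour, assuming a data frame named df_demo whose third row is a made-up addition (not part of the original data) that makes the difference visible:

```r
# Illustrative data: rows 1 and 2 come from the question,
# row 3 is hypothetical and only here for demonstration.
df_demo <- data.frame(
  Surname = c("house", "wilson", "watson"),
  Email   = c("greghouse@gmail.com", "johnwatson@gmail.com", "emmawatson@gmail.com"),
  stringsAsFactors = FALSE
)

# What grep/grepl effectively does with a multi-element pattern:
# it keeps the first surname only and tests it against every email.
first_only <- grepl(df_demo$Surname[1], df_demo$Email)

# What the question actually wants: surname i tested against email i.
wanted <- c(TRUE, FALSE, TRUE)

print(first_only)
print(identical(first_only, wanted))
```

Row 3 is where the two disagree: "watson" is in its own email, but "house" (the only pattern grep keeps) is not.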

【Question discussion】:

Tags: r string dataframe contains

【Solution 1】:

Try library("stringi") and:

df1$CheckEmail <- stri_detect_fixed(df1$Email, df1$Surname)

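【Discussion】:

Why do you need dplyr?
There should be a reason for using an extra package. Here, using stringi (or its wrapper stringr and the str_detect function) makes sense, whereas using dplyr makes no sense, since the same simple operation (adding a column to a data.frame) can obviously be done in base R.
What do you mean by "doesn't work"? That line works; it is equivalent to with(dat, stri_detect_fixed(Email,Surname)). The with function is unnecessary, it is just sugar. The most basic version is dat$CheckEmail <- stri_detect_fixed(dat$Email,dat$Surname).
Keep your answer, since the stringi bit is useful. Just remove the dplyr part and edit accordingly.
@nicola Exactly right. I used with for readability (I am in the habit of giving data frames overly long names).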
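To check that the with() form and the plain $ form from the comments give the same result, here is a small sketch. It assumes the stringi package may or may not be installed; the requireNamespace guard is an addition for this example so the script still runs without it, and the data frame is rebuilt from the question.

```r
# Data from the question, kept as character columns.
df1 <- data.frame(
  Surname = c("house", "wilson"),
  Email   = c("greghouse@gmail.com", "johnwatson@gmail.com"),
  stringsAsFactors = FALSE
)

if (requireNamespace("stringi", quietly = TRUE)) {
  # stri_detect_fixed(str, pattern) is vectorized over both arguments,
  # so Email[i] is searched for the literal text Surname[i].
  direct  <- stringi::stri_detect_fixed(df1$Email, df1$Surname)

  # with() only saves typing "df1$"; the call underneath is the same.
  sugared <- with(df1, stringi::stri_detect_fixed(Email, Surname))

  print(identical(direct, sugared))
  df1$CheckEmail <- direct
  print(df1)
}
```

Because the match is fixed (literal), characters such as "." in a surname are not treated as regular expression syntax.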

【Solution 2】:

Here is a base R option using Vectorize and grepl:

df1$CheckEmail <- Vectorize(grepl)(df1$Surname, df1$Email)

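【Discussion】:

I'll keep Vectorize in mind - it looks useful. Is it faster than using vapply?
@Bazz It is very readable, but in terms of performance it won't gain you anything.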
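For comparison, here is a sketch of the three row-wise forms side by side. Vectorize is a wrapper around mapply, which is consistent with the comment that it brings no speed advantage. The vapply version is one possible way to write it, not something posted in the thread, and no timings are claimed here.

```r
# Data from the question, kept as character columns.
df1 <- data.frame(
  Surname = c("house", "wilson"),
  Email   = c("greghouse@gmail.com", "johnwatson@gmail.com"),
  stringsAsFactors = FALSE
)

# Vectorize() builds a wrapper that calls mapply(), so these two
# lines do the same work.
via_vectorize <- Vectorize(grepl)(df1$Surname, df1$Email)
via_mapply    <- mapply(grepl, df1$Surname, df1$Email)

# vapply() needs an explicit row index, but it declares the
# result type (one logical per row) up front.
via_vapply <- vapply(
  seq_len(nrow(df1)),
  function(i) grepl(df1$Surname[i], df1$Email[i]),
  logical(1)
)

# mapply() attaches the surnames as names; drop them before storing.
df1$CheckEmail <- unname(via_vectorize)
print(df1)
```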

【Solution 3】:

Here is a base R approach using mapply and grepl:

transform(df, CheckEmail = mapply(grepl, Surname, Email))
#  Surname                Email CheckEmail
#1   house  greghouse@gmail.com       TRUE
#2  wilson johnwatson@gmail.com      FALSE
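One caveat with the grepl-based answers: grepl treats the surname as a regular expression by default, so a character like "." matches any character. A minimal sketch of the difference, assuming a data frame named df_regex whose third row is hypothetical (not from the question) and passing fixed = TRUE through MoreArgs:

```r
# Rows 1 and 2 come from the question; row 3 is a made-up case
# with a regex metacharacter (".") in the surname.
df_regex <- data.frame(
  Surname = c("house", "wilson", "o.neil"),
  Email   = c("greghouse@gmail.com", "johnwatson@gmail.com", "joxneil@example.com"),
  stringsAsFactors = FALSE
)

# Default: the surname is a regular expression, so "o.neil"
# matches "oxneil" because "." stands for any character.
as_regex <- mapply(grepl, df_regex$Surname, df_regex$Email,
                   USE.NAMES = FALSE)

# fixed = TRUE: the surname is matched as literal text.
as_literal <- mapply(grepl, df_regex$Surname, df_regex$Email,
                     MoreArgs = list(fixed = TRUE), USE.NAMES = FALSE)

print(as_regex)
print(as_literal)
```

For plain alphabetic surnames like those in the question, both calls agree, so the extra argument matters only when surnames can contain punctuation.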

【Discussion】: