data.table filter on column1 %like% column2 [duplicate]

【Question Title】: data.table filter on column1 %like% column2 [duplicate]
【Posted】: 2025-12-21 16:55:16
【Question】:

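I am trying to filter a table, but I have run into a problem with how grep is vectorized.

I have a data.table with two columns that I want to use for filtering.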

library(data.table)
dt1 <- data.table(col1 = c("ab", "cd", "ef", "xy"),
                  col2 = c("ab123", "de987", "ef345", "ab123"))

#    col1  col2
# 1:   ab ab123
# 2:   cd de987
# 3:   ef ef345
# 4:   xy ab123

I want to keep only the records where col1 is %like% col2.

My problem is that when I try:

dt1[col1 %like% col2]
Empty data.table (0 rows) of 2 cols: col1,col2
Warning message:
  In grepl(pattern, vector) :
  argument 'pattern' has length > 1 and only the first element will be used

I do not get the desired result, and I get a warning telling me that my pattern (col1) has length > 1.

The output I am hoping for is:

# DESIRED OUTPUT
#    col1  col2
# 1:   ab ab123
# 2:   ef ef345

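I suspect I need some way to restrict the col1 input to the single value for that row. I have tried a few things with .I, but I am still working out some of the less common use cases of data.table.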

【Comments】:

You can force a row-by-row operation with something like dt1[ dt1[ , .(grepl(col1, col2)), by = col1 ]$V1 ], but I am sure there is a more elegant solution.

Tags: r data.table grepl

【Solution 1】:

For your specific case, you can use startsWith, which is vectorized over both arguments:

dt1[startsWith(col2, col1)]
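
To see why this works without a loop, here is a minimal base R sketch. The vector names x and prefix are only illustrative; they mirror col2 and col1 from the question.

```r
# Base R only; the vectors mirror col2 and col1 from the question.
x      <- c("ab123", "de987", "ef345", "ab123")
prefix <- c("ab", "cd", "ef", "xy")

# startsWith() compares element i of x with element i of prefix,
# treating each prefix as a literal string, not a regular expression.
keep <- startsWith(x, prefix)
print(keep)
# [1]  TRUE FALSE  TRUE FALSE
```

Because the prefix is matched literally, this only covers the "begins with" case and not general patterns.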

However, in the general case where you need regular expressions, you have to iterate over the rows:

dt1[dt1[, grepl(.BY[[1L]], col2), by = "col1"][[2L]]]
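
One thing to watch with the grouped form: data.table returns grouped results in group order, which matches row order only when equal col1 values sit next to each other (in the example every col1 value is unique, so it is fine). As an alternative sketch, not part of the original answer, base R's mapply can pair each pattern with its own row. The table dt2 below is hypothetical and is built with a repeated col1 value to show the difference.

```r
library(data.table)

# Hypothetical data: "ab" appears in rows 1 and 3, so grouping by col1
# would return results in group order (rows 1, 3, 2, 4), not row order.
dt2 <- data.table(col1 = c("ab", "cd", "ab", "ef"),
                  col2 = c("ab123", "cd987", "xy000", "ef345"))

# mapply() calls grepl(pattern = col1[i], x = col2[i]) once per row,
# so the resulting logical vector is always in row order.
keep <- dt2[, mapply(grepl, col1, col2, USE.NAMES = FALSE)]
print(dt2[keep])
```

This still runs one grepl call per row, so it is no faster than the grouped version; the point is only that the filter lines up with the rows when col1 contains duplicates.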

【Comments】: