Special symbols pattern search with str_detect: answers

【Question title】: Special symbols pattern search with str_detect
【Posted】: 2020-05-24 13:05:21
【Question description】:

Suppose I have the following df:

library(dplyr)
library(stringr)

input <- data.frame(
Id = c(1:6),
text = c("(714.4) (714) (714*)", "(714.33)", "(189) (1938.23)", "(714.93+) (714*)", "(719)", "(718.4)"))

I would like to obtain the following output:

Output <- data.frame(
Id = c(1:6),
text = c("(714.4) (714) (714*)", "(714.33)", "(189) (1938.23)",
 "(714.93+) (714*)", "(719) (299)", "(718.4)"),
first_match = c(1,0,0,0,1,0),
second_match = c(1,1,0,1,1,0))

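That is, for the first column I want a 1 if any of the literal strings (714), (719) or (718) appears. For the second column I want a 1 if any of (714.33), (714*) or (719) appears.

To check whether a pattern occurs in a string I normally use the str_detect function from the stringr package. In this case, however, the patterns contain special characters such as . + * and I do not get the expected output.

I tried the code below, which clearly fails: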

attempt_1 <- input %>%
  mutate(first_match = ifelse(str_detect(text, "(714)|(719)|(718)"), 1, 0), 
         second_match = ifelse(str_detect(text, "(714\\.33)|(714\\*)|(719)"), 1, 0))

attempt_2 <- input %>%
 mutate(first_match = ifelse(str_detect(text, fixed("(714)|(719)")), 1, 0), 
        second_match = ifelse(str_detect(text, "(714\\.33)|(714\\*)"), 1, 0))

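I tried escaping the special symbols, and I also tried an exact match with fixed() (which I guess failed because | is then not interpreted as OR).

Any ideas?

【Question comments】:

Tags: r regex text dplyr stringr

【Solution 1】: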

We can escape the parentheses:

library(dplyr)
library(stringr)
input %>%
    mutate(first_match = +(str_detect(text, "\\(714\\)|\\(719\\)")),
        second_match = +(str_detect(text, "\\(714\\.33\\)|\\(714\\*\\)|\\(719\\)")))
#   Id                 text first_match second_match
#1  1 (714.4) (714) (714*)           1            1
#2  2             (714.33)           0            1
#3  3      (189) (1938.23)           0            0
#4  4     (714.93+) (714*)           0            1
#5  5                (719)           1            1
#6  6              (718.4)           0            0

Comparing with the OP's expected output:

Output
#  Id                 text first_match second_match
#1  1 (714.4) (714) (714*)           1            1
#2  2             (714.33)           0            1
#3  3      (189) (1938.23)           0            0
#4  4     (714.93+) (714*)           0            1
#5  5          (719) (299)           1            1
#6  6              (718.4)           0            0

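In the OP's code, the first attempt does not work because ( is a regex metacharacter: unescaped parentheses form a group instead of matching literal parentheses, so "(714)" matches any text containing 714. In the second attempt, everything inside fixed() is taken literally, including the |, so it no longer acts as OR.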
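To make the two failure modes concrete, here is a small sketch that reuses the text column from the question. The helper detect_any_fixed is hypothetical (it is not part of stringr); it is just one possible way to keep fixed() matching while still getting OR semantics, by testing each literal separately and combining the results.

```r
library(stringr)

text <- c("(714.4) (714) (714*)", "(714.33)", "(189) (1938.23)",
          "(714.93+) (714*)", "(719)", "(718.4)")

# Unescaped parentheses only group, so this behaves like "714|719|718":
# "(714.33)" and "(718.4)" match even though "(714)" and "(718)" are absent.
unescaped <- str_detect(text, "(714)|(719)|(718)")

# fixed() disables regex syntax, so the whole string "(714)|(719)",
# including the "|", is searched for literally and is never found.
literal_pipe <- str_detect(text, fixed("(714)|(719)"))

# Hypothetical helper (an assumption, not a stringr function):
# TRUE where at least one of the literal strings occurs in x.
detect_any_fixed <- function(x, literals) {
  hits <- lapply(literals, function(p) str_detect(x, fixed(p)))
  Reduce(`|`, hits)
}

first_match  <- +detect_any_fixed(text, c("(714)", "(719)", "(718)"))
second_match <- +detect_any_fixed(text, c("(714.33)", "(714*)", "(719)"))
```

With this approach nothing needs escaping, at the cost of one str_detect call per literal; for a handful of patterns that is unlikely to matter.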

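【Comments】:

You're right, man. I was blind and didn't realize that a parenthesis is also a special character. One question, though: why do you put a "+" sign in front of the str_detect call?
@torakxkz It is just a hacky way to coerce a logical to binary. You can also use as.integer. Using ifelse for this is a bit of a stretch, since TRUE/FALSE already correspond to 1/0.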
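As a quick illustration of the coercion discussed in these comments, the sketch below compares the three options on a made-up logical vector (flags is an assumption for the example, not data from the question).

```r
flags <- c(TRUE, FALSE, TRUE)

# Unary plus coerces a logical vector to integer.
plus_trick <- +flags

# as.integer() gives the same result and states the intent explicitly.
as_int <- as.integer(flags)

# ifelse() yields the same values, but stored as double because 1 and 0
# are numeric literals, and it does extra work to get there.
via_ifelse <- ifelse(flags, 1, 0)
```

The unary plus and as.integer() return identical integer vectors, so choosing between them is mostly a matter of readability.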