Regex - filter with (1) hyphen or (2) end of sentence: answers

【Question title】: Regex - filter with (1) hyphen or (2) end of sentence
【Posted】: 2020-08-14 21:37:00
【Question description】:

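I need help with RegEx filtering!

I have a list of keywords and many lines that should be checked against it. In this example, the keyword "-book-" can appear (1) in the middle of a sentence or (2) at the end, in which case the trailing hyphen is missing.

I need a RegEx that identifies both "-book-" and "-book". I don't want similar keywords such as "-booking-" to be identified.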

library(dplyr)
keywords = c( "-album-",  "-book-", "-castle-")                 
search_terms = paste(keywords, collapse ="|")                
number = c(1:5)
sentences = c("the-best-album-in-shop", "this-book-is-fantastic", "that-is-the-best-book", "spacespacespace", "unwanted-sentence-with-booking")   
data = data.frame(number, sentences)

output = data %>% filter(., grepl( search_terms, sentences) )

# Current output:
  number              sentences
1      1 the-best-album-in-shop
2      2 this-book-is-fantastic

# DESIRED output:
  number              sentences
1      1 the-best-album-in-shop
2      2 this-book-is-fantastic
3      3  that-is-the-best-book

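【Question comments】:

Remove "-book-" and add "\\bbook-", "-book\\b" to search_terms
Thanks for this small example. But with my full keyword list it gives errors. Keywords starting with "W" or "S" fail, because \\W is the escape for non-word characters and \\S is a special escape as well.

Tags: r regex filtering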

【Solution 1】:

You can also do this:

subset(data, grepl(paste0(sprintf("%s?\\b",keywords),collapse = "|"), sentences))

  number              sentences
1      1 the-best-album-in-shop
2      2 this-book-is-fantastic
3      3  that-is-the-best-book

Note that this only checks for -book- (1) in the middle of a sentence or (2) at the end, not at the beginning.
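
To see why this works, it can help to look at the pattern that sprintf builds and to test it against the example sentences. The following is a minimal sketch in base R only; the names pattern and hits are illustrative and not part of the original answer.

```r
keywords  <- c("-album-", "-book-", "-castle-")
sentences <- c("the-best-album-in-shop", "this-book-is-fantastic",
               "that-is-the-best-book", "spacespacespace",
               "unwanted-sentence-with-booking")

# "%s?" makes the trailing hyphen of each keyword optional;
# "\\b" then requires a word boundary right after the match
pattern <- paste0(sprintf("%s?\\b", keywords), collapse = "|")
print(pattern)

# TRUE where a keyword occurs mid-sentence or at the end of a sentence
hits <- grepl(pattern, sentences)
print(sentences[hits])
```

In "-booking" the optional hyphen is absent and no word boundary sits between "k" and "i", so that sentence is not matched.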

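【Discussion】:

【Solution 2】:

The -book- pattern matches the whole word book only when it has a hyphen on both the left and the right.

To match the whole word book with a hyphen on either the left or the right, you need the alternation \bbook-|-book\b.

So, you can use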

keywords = c( "-album-",  "\\bbook-", "-book\\b", "-castle-" )
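
For a long keyword list, editing each entry by hand may be impractical. As a rough sketch, assuming every keyword has the form "-word-" and contains only letters and digits (no regex metacharacters), a similar pattern can be generated from the original list: strip the surrounding hyphens, then require a leading hyphen followed by either a hyphen or the end of the string. The names core, pattern and hits are illustrative. Unlike \\bbook-, this variant does not match a keyword at the very start of a sentence.

```r
keywords  <- c("-album-", "-book-", "-castle-")
sentences <- c("the-best-album-in-shop", "this-book-is-fantastic",
               "that-is-the-best-book", "spacespacespace",
               "unwanted-sentence-with-booking")

# Strip the leading and trailing hyphen: "-book-" becomes "book"
core <- gsub("^-|-$", "", keywords)

# A leading hyphen, one of the words, then a hyphen or the end of the string
pattern <- paste0("-(", paste(core, collapse = "|"), ")(-|$)")

hits <- grepl(pattern, sentences)
print(sentences[hits])
```

Because no backslash is placed directly in front of the keyword text, this form may also sidestep the escape problem reported in the comments for keywords starting with "W" or "S", though the exact cause of that error is not shown in the thread.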

【Discussion】:

【Solution 3】:

Another solution you can consider:

library(stringr)
data %>% 
  filter(str_detect(sentences, regex("-castle-|-album-|-book$|-book-\\w{1,}")))
#   number              sentences
# 1      1 the-best-album-in-shop
# 2      2 this-book-is-fantastic
# 3      3  that-is-the-best-book

【Discussion】: