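【Posted】:2021-02-16 16:55:21
【Question】:
Using R and regular expressions, I want to select the text between two known phrases while excluding the first word after the opening phrase. The format is as follows: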
"known phrase + unknown_word + target phrase + known_word + bla bla"
For example:
Tesco Plc sells coffee beans today in stores over the uk
Known phrase = "Tesco Plc"
Unknown word = "sells"
Target phrase = "coffee beans"
known word = "today"
bla bla (unrelated text) = "in stores over the uk"
Initial attempt
text = "Tesco Plc sells coffee beans today in stores over the uk"
known_phrase = "Tesco Plc"
known_word = "today"
# extract everything between known_phrase and known_word
library(stringr)
str_extract(text, paste0("(?<=", known_phrase, ").*(?=", known_word, ")"))
This selects both the unknown_word and the target phrase, but I only want the target phrase.
【Discussion】:
- stringr::str_match(x, "Tesco\\s+Plc\\s+\\w+\\s+(.*?)\\s+today")[,2]? See regex101.com/r/oztc5i/1. str_extract is not that flexible when your context is not static.
- Works even better combined with str_remove, thank you very much!!
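
For reference, the two suggestions in the comments could be sketched roughly as follows. This is only an illustration and makes some assumptions: known_phrase and known_word contain no regex metacharacters, exactly one unknown word follows the known phrase, and the str_remove variant is just one possible reading of the second comment (the names pattern, target_match, between and target_remove are made up for this sketch).

```r
library(stringr)

text <- "Tesco Plc sells coffee beans today in stores over the uk"
known_phrase <- "Tesco Plc"
known_word <- "today"

# Option 1: str_match with a capture group.
# Pattern = known phrase, one unknown word (\\w+), then a lazy group (.*?)
# that stops at the first occurrence of the known word.
pattern <- paste0(known_phrase, "\\s+\\w+\\s+(.*?)\\s+", known_word)
target_match <- str_match(text, pattern)[, 2]

# Option 2: keep the original lookaround extraction, then drop the
# leading unknown word with str_remove and trim leftover whitespace.
between <- str_extract(
  text,
  paste0("(?<=", known_phrase, ").*(?=", known_word, ")")
)
target_remove <- str_trim(str_remove(between, "^\\s*\\w+\\s+"))

print(target_match)   # "coffee beans"
print(target_remove)  # "coffee beans"
```

The capture-group version handles the "skip one word" requirement inside the pattern itself, which is awkward to express with a lookbehind because the skipped word has no fixed length.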