Split a string based on upper case letters: answers

【Question title】: Split a string based on upper case letters
【Posted】: 2016-01-14 00:05:51
【Question】:

How can I split a string at each upper case letter it contains? I could not find any help on the internet.

a<-"MiXeD"
b<-"ServiceEventId"

I would like to get

a<-c("Mi", "Xe", "D")
b<-c("Service", "Event", "Id")

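【Comments】:

See some of the options here (specifically the comments on the second answer): stackoverflow.com/questions/7988959/… - Oddly, when googling "R Split a string based on upper case letters", that is the first result, and it is essentially your question title. Hmm?
See: stackoverflow.com/questions/22528625/…
@thelatemail It might be worth highlighting (even though I know you know) that none of the answers there does what the OP is asking for. (As you hint, my comment, the third one under Ben Bolker's answer, does.)
@thelatemail, the SO post you referenced is different from the one I referenced.

Tags: regex r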

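【Solution 1】:

Here is one option. It uses one lookbehind and one lookahead assertion to find (and then split at) the inter-character positions that are immediately followed by an upper case letter. To learn why both the lookahead and the lookbehind assertion are needed (i.e. not just a lookahead assertion), see this question and its answers.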

f <- function(x) {
    strsplit(x, "(?<=.)(?=[[:upper:]])", perl=TRUE)
}

f(a)
# [[1]]
# [1] "Mi" "Xe" "D" 

f(b)
# [[1]]
# [1] "Service" "Event"   "Id"
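
As a small sketch of how the same pattern behaves on a whole character vector, the snippet below wraps it in a function named split_upper (a hypothetical name, not from the answer) and adds one extra input, "thisIsMixed", purely for illustration. Since strsplit returns a list with one element per input string, unlist() is used to get the plain character vector the question asks for.

```r
# Same regex as in the answer: split at any position that has a character
# before it (lookbehind) and an upper case letter right after it (lookahead).
# split_upper is an illustrative name, not part of the original answer.
split_upper <- function(x) {
  strsplit(x, "(?<=.)(?=[[:upper:]])", perl = TRUE)
}

# "thisIsMixed" is an extra test input assumed here for illustration.
inputs <- c("MiXeD", "ServiceEventId", "thisIsMixed")

# strsplit is vectorised: one list element per input string.
parts <- split_upper(inputs)
names(parts) <- inputs
print(parts)

# For a single string, unlist() drops the list wrapper.
a_parts <- unlist(split_upper("MiXeD"))
print(a_parts)
```

Naming the list elements after the inputs is only a convenience for reading the printed result.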

【Comments】:

Just for fun, here is a regmatches adaptation - regmatches(d,gregexpr("([[:upper:]]|^)([^[:upper:]]+|$)",d))

【Solution 2】:

Use str_extract_all from the stringr package:

library(stringr)
str_extract_all(x, "[A-Z][a-z]*")

or

str_extract_all(x, "[A-Z][a-z]*|[a-z]+")
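
To compare the two patterns without installing stringr, here is a rough base R sketch that swaps str_extract_all for regmatches with gregexpr. The helper name extract_words and the sample inputs are assumptions made for illustration, and perl = TRUE is set so that [A-Z] and [a-z] are read as plain ASCII ranges.

```r
# Base R stand-in for stringr::str_extract_all (a swapped-in technique,
# not the answer's own code). extract_words is a hypothetical helper name.
extract_words <- function(x, pattern) {
  regmatches(x, gregexpr(pattern, x, perl = TRUE))
}

x <- c("ServiceEventId", "thisIsMixed")  # sample inputs assumed for illustration

# Pattern 1: only pieces that start with an upper case letter are extracted,
# so a leading lowercase run such as "this" is dropped.
upper_only <- extract_words(x, "[A-Z][a-z]*")

# Pattern 2: the extra alternative also captures a lowercase-only run.
with_lower <- extract_words(x, "[A-Z][a-z]*|[a-z]+")

print(upper_only)
print(with_lower)
```

Both approaches extract the pieces rather than split between them, so characters that match neither alternative (digits, for instance) are silently left out of the result.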

【Comments】:

I think that if the string starts with a lowercase letter or something else, e.g. "thisIsMixed", you need:
str_extract_all(x, "[A-Z][a-z]*|[a-z]+")