【Posted】: 2017-08-08 19:19:15
【Question】:
I have a data frame:
x <- data.frame(a = letters[1:7], b = letters[2:8],
c = c("bla bla [ text1 ]", "bla bla [text2]", "how how [text3 ]",
"wow wow [ text4a ] [ text4b ]", "ba ba [ text5a ][ text5b]",
"my text A", "my text B"), stringsAsFactors = FALSE)
x
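I want to split column c based on the content between square brackets [...]. If column c contains only one set of square brackets, I want that string to go into the next (new) column. If column c contains two strings, each enclosed in [ and ], I want only the string inside the last [ ] to go into the new column.
Here is how I do it. It looks quite complicated, and I am using a loop. Is there a more concise way to do this?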
library(stringr)
# Counting number of square brackets "[" in column c:
sqrbrack_count <- str_count(x$c, pattern = '\\[')
# Creating a new column:
x$newcolumn <- NA
for (i in seq_len(nrow(x))) { # looping through rows of x
  if (sqrbrack_count[i] == 0) next # do nothing if there are 0 square brackets
  minilist <- str_split_fixed(x[i, "c"], pattern = '\\[', n = Inf) # split string
  if (sqrbrack_count[i] == 1) { # if there is only one square bracket "["
    x[i, "c"] <- minilist[1]
    x[i, "newcolumn"] <- minilist[2]
  } else { # if there is more than one square bracket "["
    x[i, "c"] <- paste(minilist[1:2], collapse = "+")
    x[i, "newcolumn"] <- minilist[3]
  }
}
# Removing the remaining square brackets we don't need anymore:
x$c <- str_replace(x$c, pattern = " \\]", replacement = "")
x$c <- str_replace(x$c, pattern = "\\]", replacement = "")
x$newcolumn <- str_replace(x$newcolumn, pattern = " \\]", replacement = "")
x$newcolumn <- str_replace(x$newcolumn, pattern = "\\]", replacement = "")
x
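For comparison, here is a minimal vectorized sketch of the same idea without a loop. It is only an illustration under some assumptions: the names `txt`, `last_pattern`, `last_bracket` and `rest` are made up for this example, brackets are assumed not to be nested, and the last bracket group is assumed to sit at the end of the string. Unlike the loop above, it leaves any earlier bracket group in the remaining text untouched instead of joining the pieces with "+".

```r
library(stringr)

# Hypothetical sample strings modelled on column c of the data frame above
txt <- c("bla bla [ text1 ]",
         "wow wow [ text4a ] [ text4b ]",
         "ba ba [ text5a ][ text5b]",
         "my text A")

# A bracket group that is followed only by optional whitespace, i.e. the
# last [...] in the string (assumption: no nested brackets)
last_pattern <- "\\[\\s*([^\\[\\]]*?)\\s*\\]\\s*$"

# Column 2 of the str_match() result holds the capture group, that is the
# trimmed text inside the last brackets; strings without brackets give NA
last_bracket <- str_match(txt, last_pattern)[, 2]

# Text that is left once the last bracket group is removed
rest <- str_trim(str_replace(txt, last_pattern, ""))

data.frame(c = rest, newcolumn = last_bracket, stringsAsFactors = FALSE)
```

The same two vectorized calls could be assigned to `x$newcolumn` and `x$c` directly, as long as the extraction runs before `x$c` is overwritten.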
【Comments】:
Tags: r regex stringr square-bracket