【Posted】: 2022-01-07 22:11:15
【Problem description】:
I have utterances with annotation symbols:
utt <- c("↑hey girls↑ can I <join yo:u>", "((v: grunts))", "!damn shit! got it",
"I mean /yeah we saw each other at a party:/↓ the other day"
)
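I need to split utt into separate words unless the words are enclosed by certain delimiters, including those in this character class: [(/≈↑£<>°!]. Using a double negative lookahead I do reasonably well for utts where only one such delimited string occurs; but where multiple delimited strings occur in the same utterance, I fail to split correctly: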
library(tidyr)
library(dplyr)
# Put the utterances in a column named utt2, then split that column on the regex
data.frame(utt2 = utt) %>%
separate_rows(utt2, sep = "(?!.*[(/≈↑£<>°!].*)\\s(?!.*[)/≈↑£<>°!])")
# A tibble: 9 × 1
utt2
<chr>
1 ↑hey girls↑ can I <join yo:u>
2 ((v: grunts))
3 !damn shit!
4 got
5 it
6 I mean /yeah we saw each other at a party:/↓
7 the
8 other
9 day
The expected result would be:
1 ↑hey girls↑
2 can
3 I
4 <join yo:u>
5 ((v: grunts))
6 !damn shit!
7 got
8 it
9 I
10 mean
11 /yeah we saw each other at a party:/↓
12 the
13 other
14 day
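One way to think about the expected result is as an extraction task rather than a split: instead of finding the whitespace to split on, match each token directly, where a token is either a delimited chunk or a plain run of non-space characters. The following is only a minimal sketch under stated assumptions, not a definitive solution: the name token_pattern is hypothetical, it covers only the delimiter pairs that actually appear in the sample (((...)), <...>, ↑...↑, !...!, /.../), and it assumes delimiters are balanced and not nested.

```r
# Sample utterances from the question
utt <- c("↑hey girls↑ can I <join yo:u>", "((v: grunts))", "!damn shit! got it",
         "I mean /yeah we saw each other at a party:/↓ the other day")

# Hypothetical pattern (an assumption, not from the original post).
# First branch: one delimited chunk, followed by any trailing non-space
# characters (e.g. the "↓" after the closing "/").
# Second branch: any other run of non-space characters.
delimited <- paste(
  "\\(\\([^)]*\\)\\)",  # ((...))
  "<[^>]*>",            # <...>
  "↑[^↑]*↑",            # ↑...↑
  "![^!]*!",            # !...!
  "/[^/]*/",            # /.../
  sep = "|"
)
token_pattern <- paste0("(?:", delimited, ")\\S*|\\S+")

# Extract every token from every utterance and flatten into one column
tokens <- unlist(regmatches(utt, gregexpr(token_pattern, utt, perl = TRUE)))
result <- data.frame(utt2 = tokens)
```

On the sample data this should yield one row per expected token. The other delimiters listed in the question (≈, £, °, single parentheses) would need their own alternatives added to delimited in the same style.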
【Discussion】: