【Posted】:2015-02-08 06:04:46
【Question】:
I have a character vector x and a data.frame y, as shown below.
x <- c("Pumpkin Helmet", "Warm Puppy", "Frisbee Sailing",
"Warm Puppy Frisbee Sailing", "Good Sport", "Masked Marvel",
"Spring Dance", "Spring Warm Dance Puppy", "Sock it to Me",
"Maskedspring Dancemarvel", "warm Puppy", "masked marvel",
"WARM PUPPY", " Spring Dance", "Warm Puppy Spring Dance",
"Warmspring Dancepuppy")
x
[1] "Pumpkin Helmet" "Warm Puppy" "Frisbee Sailing"
[4] "Warm Puppy Frisbee Sailing" "Good Sport" "Masked Marvel"
[7] "Spring Dance" "Spring Warm Dance Puppy" "Sock it to Me"
[10] "Maskedspring Dancemarvel" "warm Puppy" "masked marvel"
[13] "WARM PUPPY" " Spring Dance" "Warm Puppy Spring Dance"
[16] "Warmspring Dancepuppy"
a <- c("Masked", "Warm", "spring")
b <- c("Marvel", "Puppy", "dance")
y <- data.frame(a,b)
y
       a      b
1 Masked Marvel
2   Warm  Puppy
3 spring  dance
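I am trying to write a function that uses regex to merge the two words from a row of y wherever they occur together in x.
Before trying apply with x and y, I tried the following to work out the regex I need.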
gsub("Spring(\\s+)Dance.*", "SpringDance", x)
gsub("spring(\\s+)Dance.*", "SpringDance", x)
gsub("Warm(\\s+)Puppy.*", "WarmPuppy", x)
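A quick check on single elements of x suggests where these patterns fall short. This is only a sketch; the names r1, r2, and r3 are introduced here for illustration and are not part of the original attempts.

```r
# The trailing .* also consumes everything after the matched pair
r1 <- gsub("Warm(\\s+)Puppy.*", "WarmPuppy", "Warm Puppy Frisbee Sailing")
r1  # "WarmPuppy"

# gsub() is case-sensitive by default, so "warm" is not matched
r2 <- gsub("Warm(\\s+)Puppy.*", "WarmPuppy", "warm Puppy")
r2  # "warm Puppy"

# Without word boundaries the pattern also matches inside longer words
r3 <- gsub("spring(\\s+)Dance.*", "SpringDance", "Maskedspring Dancemarvel")
r3  # "MaskedSpringDance"
```

A fixed replacement string also overwrites the original capitalisation, which is why "warm Puppy" could never become "warmPuppy" with these calls.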
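I am still struggling with regex in R to get the desired output out. What would be the ideal regex in this case? It should match whole words only, ignore case, and remove the multiple spaces in between.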
out <- c("Pumpkin Helmet", "WarmPuppy", "Frisbee Sailing",
"WarmPuppy Frisbee Sailing", "Good Sport", "MaskedMarvel",
"SpringDance", "Spring Warm Dance Puppy", "Sock it to Me",
"Maskedspring Dancemarvel", "warmPuppy", "maskedmarvel",
"WARMPUPPY", " SpringDance", "WarmPuppy SpringDance",
"Warmspring Dancepuppy")
[1] "Pumpkin Helmet" "WarmPuppy" "Frisbee Sailing"
[4] "WarmPuppy Frisbee Sailing" "Good Sport" "MaskedMarvel"
[7] "SpringDance" "Spring Warm Dance Puppy" "Sock it to Me"
[10] "Maskedspring Dancemarvel" "warmPuppy" "maskedmarvel"
[13] "WARMPUPPY" " SpringDance" "WarmPuppy SpringDance"
[16] "Warmspring Dancepuppy"
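One possible way to meet the three requirements (whole words only, case-insensitive, any amount of whitespace removed) is sketched below. It assumes the words in y contain no regex metacharacters. The helper name merge_pairs is hypothetical, and y is rebuilt with stringsAsFactors = FALSE so that its columns are plain character vectors.

```r
x <- c("Pumpkin Helmet", "Warm Puppy", "Frisbee Sailing",
       "Warm Puppy Frisbee Sailing", "Good Sport", "Masked Marvel",
       "Spring Dance", "Spring Warm Dance Puppy", "Sock it to Me",
       "Maskedspring Dancemarvel", "warm Puppy", "masked marvel",
       "WARM PUPPY", " Spring Dance", "Warm Puppy Spring Dance",
       "Warmspring Dancepuppy")

y <- data.frame(a = c("Masked", "Warm", "spring"),
                b = c("Marvel", "Puppy", "dance"),
                stringsAsFactors = FALSE)

# Hypothetical helper: for each row of `pairs`, join word a and word b
# wherever they appear as adjacent whole words in x.
merge_pairs <- function(x, pairs) {
  for (i in seq_len(nrow(pairs))) {
    # \\b anchors both words at word boundaries; \\s+ matches one or
    # more whitespace characters between them
    pattern <- paste0("\\b(", as.character(pairs$a[i]), ")\\s+(",
                      as.character(pairs$b[i]), ")\\b")
    # The backreferences \\1\\2 keep the capitalisation found in x
    x <- gsub(pattern, "\\1\\2", x, ignore.case = TRUE, perl = TRUE)
  }
  x
}

res <- merge_pairs(x, y)
res
```

Because the replacement reuses the captured groups rather than a fixed string, "warm Puppy" becomes "warmPuppy" and "WARM PUPPY" becomes "WARMPUPPY". On the x shown above, res should be identical to the desired out.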
【Discussion】:
Tags: regex r string string-matching