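【Posted】: 2019-03-01 20:01:26
【Question】:
I have a question: I want to filter every column of the data frame "data01", Pair_1 through Pair_4, on each character pattern listed in "pexl07", dropping any row in which a pattern occurs.
The data frame data01 looks like this: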
Pair_1 Pair_2 Pair_3 Pair_4
453 lupinespringcereal grasscloverleyquinoa springcerealspringcereal camelinacamelina
1073 lupinespringcereal grasscloverleycamelina springcerealspringcereal quinoaquinoa
1330 lupinespringcereal grasscloverleycamelina quinoaspringcereal lupinequinoa
1373 lupinespringcereal grasscloverleycamelina quinoaquinoa lupinespringcereal
1698 lupinecamelina grasscloverleyspringcereal quinoaquinoa springcerealspringcereal
1910 lupinespringcereal springcerealcamelina grasscloverleyspringcereal lupinequinoa
1947 lupinespringcereal springcerealcamelina grasscloverleyquinoa lupinespringcereal
1979 lupinespringcereal springcerealquinoa grasscloverleyspringcereal lupinecamelina
2141 lupinequinoa springcerealspringcereal grasscloverleycamelina lupinespringcereal
2745 lupinecamelina springcerealspringcereal grasscloverleyquinoa springcerealspringcereal
pexl07 looks like this (shortened for the sake of the example):
V1
1 quinoaquinoa
2 springcerealspringcereal
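I have tried many different things, using for(), filter(), subset(), grepl.sub() and grepl(), but I did not manage to get it to work, probably because I do not understand indexing inside loops. Options without a loop are also welcome.
This piece works for a single column and a single pattern: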
data02 <- filter(data01, !grepl(paste(pexl07[1, 1]), paste(data01[, 1])))
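But how can I make it work automatically for all expressions in pexl07 and all columns of data01?
I tried some variations of the following, but it does not return what I want: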
for (j in ncol(data01)) {
for (i in 1:nrow(pexl07)) {
data02 <- filter(data01,
!grepl(paste(pexl07[j, ]), paste(data01[ ,i])))
}
}
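Three things in the loop above seem to get in the way: `for (j in ncol(data01))` visits only the single value 4 rather than 1 to 4, `j` and `i` are swapped (the column counter indexes the rows of pexl07 and vice versa), and every pass filters the original data01 again, so data02 only reflects the last pass. A minimal base R sketch of a corrected loop, assuming the patterns should be matched as literal substrings (hence `fixed = TRUE`, which is my choice and not part of the original attempt):

```r
# Data copied from the printed data01 above; the first field becomes the row name.
data01 <- read.table(text = "Pair_1 Pair_2 Pair_3 Pair_4
453 lupinespringcereal grasscloverleyquinoa springcerealspringcereal camelinacamelina
1073 lupinespringcereal grasscloverleycamelina springcerealspringcereal quinoaquinoa
1330 lupinespringcereal grasscloverleycamelina quinoaspringcereal lupinequinoa
1373 lupinespringcereal grasscloverleycamelina quinoaquinoa lupinespringcereal
1698 lupinecamelina grasscloverleyspringcereal quinoaquinoa springcerealspringcereal
1910 lupinespringcereal springcerealcamelina grasscloverleyspringcereal lupinequinoa
1947 lupinespringcereal springcerealcamelina grasscloverleyquinoa lupinespringcereal
1979 lupinespringcereal springcerealquinoa grasscloverleyspringcereal lupinecamelina
2141 lupinequinoa springcerealspringcereal grasscloverleycamelina lupinespringcereal
2745 lupinecamelina springcerealspringcereal grasscloverleyquinoa springcerealspringcereal
", header = TRUE, stringsAsFactors = TRUE)

pexl07 <- data.frame(V1 = c("quinoaquinoa", "springcerealspringcereal"),
                     stringsAsFactors = TRUE)

data02 <- data01                        # start from the full table and shrink it
for (j in seq_len(ncol(data01))) {      # j runs over the columns of the data
  for (i in seq_len(nrow(pexl07))) {    # i runs over the patterns
    # TRUE for every remaining row whose column j contains pattern i
    hit <- grepl(as.character(pexl07[i, 1]), as.character(data02[[j]]), fixed = TRUE)
    data02 <- data02[!hit, , drop = FALSE]   # keep filtering the already reduced table
  }
}
print(data02)
```

Because data02 is reassigned from itself on each pass, the removals accumulate across all columns and patterns.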
To be clear, I want it to end up like this:
Pair_1 Pair_2 Pair_3 Pair_4
1330 lupinespringcereal grasscloverleycamelina quinoaspringcereal lupinequinoa
1910 lupinespringcereal springcerealcamelina grasscloverleyspringcereal lupinequinoa
1947 lupinespringcereal springcerealcamelina grasscloverleyquinoa lupinespringcereal
1979 lupinespringcereal springcerealquinoa grasscloverleyspringcereal lupinecamelina
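One loop-free way to arrive at the table above, sketched in base R: join all patterns into a single alternation and test every column against it. This assumes the patterns contain no regular expression metacharacters (true for the two shown here); `pattern` and `hit` are names introduced only for this illustration.

```r
# Data copied from the printed data01 above; the first field becomes the row name.
data01 <- read.table(text = "Pair_1 Pair_2 Pair_3 Pair_4
453 lupinespringcereal grasscloverleyquinoa springcerealspringcereal camelinacamelina
1073 lupinespringcereal grasscloverleycamelina springcerealspringcereal quinoaquinoa
1330 lupinespringcereal grasscloverleycamelina quinoaspringcereal lupinequinoa
1373 lupinespringcereal grasscloverleycamelina quinoaquinoa lupinespringcereal
1698 lupinecamelina grasscloverleyspringcereal quinoaquinoa springcerealspringcereal
1910 lupinespringcereal springcerealcamelina grasscloverleyspringcereal lupinequinoa
1947 lupinespringcereal springcerealcamelina grasscloverleyquinoa lupinespringcereal
1979 lupinespringcereal springcerealquinoa grasscloverleyspringcereal lupinecamelina
2141 lupinequinoa springcerealspringcereal grasscloverleycamelina lupinespringcereal
2745 lupinecamelina springcerealspringcereal grasscloverleyquinoa springcerealspringcereal
", header = TRUE, stringsAsFactors = TRUE)

pexl07 <- data.frame(V1 = c("quinoaquinoa", "springcerealspringcereal"),
                     stringsAsFactors = TRUE)

# "quinoaquinoa|springcerealspringcereal": matches if either pattern occurs
pattern <- paste(pexl07$V1, collapse = "|")

# One logical vector per column, combined with OR: TRUE where any column matches
hit <- Reduce(`|`, lapply(data01, function(col) grepl(pattern, as.character(col))))

# Keep only the rows in which no column matched any pattern
data02 <- data01[!hit, , drop = FALSE]
print(data02)
```

`Reduce()` over the per-column results keeps `hit` a plain logical vector even when data01 has a single row, which a matrix-based `rowSums()` approach would need extra care for.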
The expected output as dput():
structure(list(Pair_1 = structure(c(6L, 6L, 6L, 6L), .Label = c("grasscloverleycamelina",
"grasscloverleyquinoa", "lupinecamelina", "lupinegrasscloverley",
"lupinequinoa", "lupinespringcereal"), class = "factor"), Pair_2 = structure(c(3L,
9L, 9L, 11L), .Label = c("camelinacamelina", "camelinagrasscloverley",
"grasscloverleycamelina", "grasscloverleyquinoa", "grasscloverleyspringcereal",
"quinoagrasscloverley", "quinoaquinoa", "quinoaspringcereal",
"springcerealcamelina", "springcerealgrasscloverley", "springcerealquinoa",
"springcerealspringcereal"), class = "factor"), Pair_3 = structure(c(11L,
7L, 6L, 7L), .Label = c("camelinacamelina", "camelinagrasscloverley",
"camelinaquinoa", "camelinaspringcereal", "grasscloverleycamelina",
"grasscloverleyquinoa", "grasscloverleyspringcereal", "quinoacamelina",
"quinoagrasscloverley", "quinoaquinoa", "quinoaspringcereal",
"springcerealcamelina", "springcerealquinoa", "springcerealspringcereal"
), class = "factor"), Pair_4 = structure(c(6L, 6L, 7L, 5L), .Label = c("camelinacamelina",
"camelinagrasscloverley", "grasscloverleycamelina", "grasscloverleyspringcereal",
"lupinecamelina", "lupinequinoa", "lupinespringcereal", "quinoagrasscloverley",
"quinoaquinoa", "quinoaspringcereal", "springcerealcamelina",
"springcerealquinoa", "springcerealspringcereal"), class = "factor")), row.names = c(1330L,
1910L, 1947L, 1979L), class = "data.frame")
Input pexl07:
structure(list(V1 = structure(1:2, .Label = c("quinoaquinoa",
"springcerealspringcereal"), class = "factor")), row.names = 1:2, class = "data.frame")
Input data01:
structure(list(Pair_1 = structure(c(6L, 6L, 6L, 6L, 3L, 6L), .Label = c("grasscloverleycamelina",
"grasscloverleyquinoa", "lupinecamelina", "lupinegrasscloverley",
"lupinequinoa", "lupinespringcereal"), class = "factor"), Pair_2 = structure(c(4L,
3L, 3L, 3L, 5L, 9L), .Label = c("camelinacamelina", "camelinagrasscloverley",
"grasscloverleycamelina", "grasscloverleyquinoa", "grasscloverleyspringcereal",
"quinoagrasscloverley", "quinoaquinoa", "quinoaspringcereal",
"springcerealcamelina", "springcerealgrasscloverley", "springcerealquinoa",
"springcerealspringcereal"), class = "factor"), Pair_3 = structure(c(14L,
14L, 11L, 10L, 10L, 7L), .Label = c("camelinacamelina", "camelinagrasscloverley",
"camelinaquinoa", "camelinaspringcereal", "grasscloverleycamelina",
"grasscloverleyquinoa", "grasscloverleyspringcereal", "quinoacamelina",
"quinoagrasscloverley", "quinoaquinoa", "quinoaspringcereal",
"springcerealcamelina", "springcerealquinoa", "springcerealspringcereal"
), class = "factor"), Pair_4 = structure(c(1L, 9L, 6L, 7L, 13L,
6L), .Label = c("camelinacamelina", "camelinagrasscloverley",
"grasscloverleycamelina", "grasscloverleyspringcereal", "lupinecamelina",
"lupinequinoa", "lupinespringcereal", "quinoagrasscloverley",
"quinoaquinoa", "quinoaspringcereal", "springcerealcamelina",
"springcerealquinoa", "springcerealspringcereal"), class = "factor")), row.names = c(453L,
1073L, 1330L, 1373L, 1698L, 1910L), class = "data.frame")
【Comments】: