【Question title】: Filter_at selected columns with multiple str_detect patterns
【Posted】: 2019-07-31 05:28:16
【Question】:

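What I am trying to do should be easy, but I am a beginner and have already spent too much time on it. With this script I am trying to remove from a data frame every observation that contains any of the patterns below.

The script is: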

df1 <- filter_at(df, vars(contains("Pair")), 
                 any_vars(str_detect(., pattern="quinoaquinoa|lupinelupine", negate=TRUE)))

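When I run it I get no error, but nothing changes: the rows containing those expressions are not removed from the data frame. As I understand these functions, I could also put a ! in front of str_detect instead of using negate=TRUE, but neither works.

Note that the real data frame is larger (it has other columns besides those containing "Pair"), and the patterns to filter out differ each time and are retrieved from another data frame.

The data frame looks like this: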

str(df)

'data.frame':   653 obs. of  6 variables:
 $ Pair_1: Factor w/ 7 levels "grasscloverleycamelina",..: 3 7 7 3 3 3 7 6 6 6 ...
 $ Pair_2: Factor w/ 20 levels "camelinacamelina",..: 10 6 6 8 8 10 6 8 8 10 ...
 $ Pair_3: Factor w/ 20 levels "camelinacamelina",..: 19 20 20 20 19 19 20 20 20 16 ...
 $ Pair_4: Factor w/ 23 levels "camelinacamelina",..: 9 8 8 8 9 9 4 1 1 5 ...
 $ Pair_5: Factor w/ 20 levels "camelinacamelina",..: 9 12 16 16 13 13 12 12 11 11 ...
 $ Pair_6: Factor w/ 20 levels "camelinacamelina",..: 20 13 9 17 20 20 5 7 8 8 ...

dput of the data frame (first six rows):

structure(list(Pair_1 = structure(c(3L, 7L, 7L, 3L, 3L, 3L), .Label = c("grasscloverleycamelina", 
"grasscloverleyquinoa", "lupinecamelina", "lupinegrasscloverley", 
"lupinelupine", "lupinequinoa", "lupinespringcereal"), class = "factor"), 
    Pair_2 = structure(c(10L, 6L, 6L, 8L, 8L, 10L), .Label = c("camelinacamelina", 
    "camelinagrasscloverley", "camelinalupine", "camelinaquinoa", 
    "camelinaspringcereal", "grasscloverleycamelina", "grasscloverleygrasscloverley", 
    "grasscloverleylupine", "grasscloverleyquinoa", "grasscloverleyspringcereal", 
    "quinoacamelina", "quinoagrasscloverley", "quinoalupine", 
    "quinoaquinoa", "quinoaspringcereal", "springcerealcamelina", 
    "springcerealgrasscloverley", "springcereallupine", "springcerealquinoa", 
    "springcerealspringcereal"), class = "factor"), Pair_3 = structure(c(19L, 
    20L, 20L, 20L, 19L, 19L), .Label = c("camelinacamelina", 
    "camelinagrasscloverley", "camelinalupine", "camelinaquinoa", 
    "camelinaspringcereal", "grasscloverleycamelina", "grasscloverleygrasscloverley", 
    "grasscloverleylupine", "grasscloverleyquinoa", "grasscloverleyspringcereal", 
    "quinoacamelina", "quinoagrasscloverley", "quinoalupine", 
    "quinoaquinoa", "quinoaspringcereal", "springcerealcamelina", 
    "springcerealgrasscloverley", "springcereallupine", "springcerealquinoa", 
    "springcerealspringcereal"), class = "factor"), Pair_4 = structure(c(9L, 
    8L, 8L, 8L, 9L, 9L), .Label = c("camelinacamelina", "camelinagrasscloverley", 
    "camelinalupine", "camelinaquinoa", "camelinaspringcereal", 
    "grasscloverleycamelina", "grasscloverleygrasscloverley", 
    "grasscloverleyquinoa", "grasscloverleyspringcereal", "lupinecamelina", 
    "lupinegrasscloverley", "lupinelupine", "lupinequinoa", "lupinespringcereal", 
    "quinoacamelina", "quinoagrasscloverley", "quinoaquinoa", 
    "quinoaspringcereal", "springcerealcamelina", "springcerealgrasscloverley", 
    "springcereallupine", "springcerealquinoa", "springcerealspringcereal"
    ), class = "factor"), Pair_5 = structure(c(9L, 12L, 16L, 
    16L, 13L, 13L), .Label = c("camelinacamelina", "camelinagrasscloverley", 
    "camelinaquinoa", "camelinaspringcereal", "grasscloverleycamelina", 
    "grasscloverleygrasscloverley", "grasscloverleyquinoa", "grasscloverleyspringcereal", 
    "lupinecamelina", "lupinegrasscloverley", "lupinequinoa", 
    "lupinespringcereal", "quinoacamelina", "quinoagrasscloverley", 
    "quinoaquinoa", "quinoaspringcereal", "springcerealcamelina", 
    "springcerealgrasscloverley", "springcerealquinoa", "springcerealspringcereal"
    ), class = "factor"), Pair_6 = structure(c(20L, 13L, 9L, 
    17L, 20L, 20L), .Label = c("camelinacamelina", "camelinagrasscloverley", 
    "camelinaquinoa", "camelinaspringcereal", "grasscloverleycamelina", 
    "grasscloverleygrasscloverley", "grasscloverleyquinoa", "grasscloverleyspringcereal", 
    "lupinecamelina", "lupinegrasscloverley", "lupinequinoa", 
    "lupinespringcereal", "quinoacamelina", "quinoagrasscloverley", 
    "quinoaquinoa", "quinoaspringcereal", "springcerealcamelina", 
    "springcerealgrasscloverley", "springcerealquinoa", "springcerealspringcereal"
    ), class = "factor")), row.names = c(NA, 6L), class = "data.frame")

【Comments】:

  • For readability, please also format inline code; I did it for you this time.

Tags: r regex filter dplyr


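【Solution 1】:

You can loop over the columns of the data frame whose names contain "Pair", check each one for the pattern, collect the results into a logical matrix, and keep only the rows in which the pattern never occurs.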

cols <- grep("Pair", names(df))
df[rowSums(sapply(df[cols],function(x) grepl("quinoaquinoa|lupinelupine", x)))== 0, ]

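【Comments】:

  • Thanks, it works! Would you mind explaining how this works?
  • @BellaLin Using grep we first find the columns that have "Pair" in their name. Then we loop over those columns with sapply and check which of them contain the pattern. Just run sapply(df[cols], function(x) grepl("quinoaquinoa|lupinelupine", x)) and you will get a matrix of TRUE/FALSE values indicating where the pattern is present. Finally we take rowSums and keep only the rows with no occurrence of the pattern at all (== 0).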
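
The steps described in the comment above can be traced on a small example. This is only a sketch: the `toy` data frame, its `id` column, and the `patterns` vector are made up for illustration and are not the asker's data. It also assumes the patterns arrive as a character vector (for instance a column of another data frame) and are joined with `|`.

```r
# Hypothetical toy data, not the asker's data frame
toy <- data.frame(
  id     = 1:4,
  Pair_1 = c("lupinequinoa", "quinoaquinoa", "lupinecamelina", "lupinequinoa"),
  Pair_2 = c("camelinalupine", "camelinaquinoa", "lupinelupine", "quinoacamelina"),
  stringsAsFactors = TRUE  # factors, as in the question
)

# Assumed: the patterns come from elsewhere as a character vector
patterns <- c("quinoaquinoa", "lupinelupine")
regex <- paste(patterns, collapse = "|")

# 1. positions of the columns whose name contains "Pair"
cols <- grep("Pair", names(toy))

# 2. one logical column per Pair column: TRUE where the pattern occurs
hits <- sapply(toy[cols], function(x) grepl(regex, x))

# 3. keep the rows that have no TRUE in any Pair column
keep <- rowSums(hits) == 0
result <- toy[keep, ]
result
```

Here rows 2 and 3 each contain one of the patterns in some Pair column, so only the rows with `id` 1 and 4 remain.
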
【Solution 2】:

There are no strings containing "quinoaquinoa" or "lupinelupine" in the data frame you posted. I think the pattern you are using is not right. This works: filter_at(df, vars(contains("Pair")), any_vars(str_detect(., pattern = "quinoa|lupine")))
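
As a side note on the original call, one possible reason it appears to change nothing is the combination of `any_vars()` with `negate = TRUE`: that keeps a row as soon as at least one Pair column lacks the pattern, which is true for almost every row. The sketch below contrasts it with `all_vars()` on made-up data. The `toy` data frame and its values are hypothetical, and `negate` assumes stringr 1.4.0 or later.

```r
library(dplyr)
library(stringr)

# Hypothetical toy data, not the asker's data frame
toy <- data.frame(
  id     = 1:4,
  Pair_1 = c("lupinequinoa", "quinoaquinoa", "lupinecamelina", "lupinequinoa"),
  Pair_2 = c("camelinalupine", "camelinaquinoa", "lupinelupine", "quinoacamelina"),
  stringsAsFactors = FALSE
)
regex <- "quinoaquinoa|lupinelupine"

# any_vars + negate: a row stays if AT LEAST ONE Pair column lacks the pattern
loose <- filter_at(toy, vars(contains("Pair")),
                   any_vars(str_detect(., pattern = regex, negate = TRUE)))

# all_vars + negate: a row stays only if NO Pair column has the pattern
strict <- filter_at(toy, vars(contains("Pair")),
                    all_vars(str_detect(., pattern = regex, negate = TRUE)))

nrow(loose)  # every row survives
strict$id    # only the rows without a match in any Pair column
```

On this toy data `loose` still has all four rows, while `strict` keeps only the rows with `id` 1 and 4, which matches the result of the rowSums approach in Solution 1.
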

【Comments】:
