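【Question title】: Filtering using dplyr with column names and conditions as strings
【Posted】: 2019-10-07 17:45:29
【Question description】:

I am trying to write a simple function that filters a data.frame. Both the column names and the filter conditions are stored as strings: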

vars <- c("manufacturer", "engine")
cond <- c("EMBRAER", "Turbo-fan")

The output should be identical to what the following returns:

library(dplyr)
library(nycflights13)

nycflights13::planes %>%
  filter(
    .data[[vars[[1]]]] == cond[[1]],
    .data[[vars[[2]]]] == cond[[2]]
  )

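What is the dplyr + purrr way to do this? In practice, both character vectors are much longer.

【Comments】:

  • Depending on your larger workflow, you could instead create a data frame of conditions and semi-join it with your data, so that you keep only the rows of your data that match rows in the condition data frame.

Tags: r dplyr purrr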


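【Solution 1】:

Another way to think about this is that you have a dataset of conditions that you want to use to filter your main data. Create a small data frame holding the conditions and their respective variable names, then reshape it so that the variable names become the column names. Then use semi_join to keep only the rows of the data whose combination of values matches a row in the condition data frame.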

vars <- c("manufacturer", "engine")
cond <- c("EMBRAER", "Turbo-fan")

library(dplyr)
library(nycflights13)

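# stringsAsFactors = FALSE keeps the values as character, so they match the
# character columns in planes (the default was TRUE before R 4.0)
cond_df <- data.frame(vars, cond, stringsAsFactors = FALSE) %>%
  tidyr::spread(key = vars, value = cond)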

nycflights13::planes %>%
  semi_join(cond_df, by = vars)
#> # A tibble: 298 x 9
#>    tailnum  year type        manufacturer model  engines seats speed engine
#>    <chr>   <int> <chr>       <chr>        <chr>    <int> <int> <int> <chr> 
#>  1 N10156   2004 Fixed wing… EMBRAER      EMB-1…       2    55    NA Turbo…
#>  2 N10575   2002 Fixed wing… EMBRAER      EMB-1…       2    55    NA Turbo…
#>  3 N11106   2002 Fixed wing… EMBRAER      EMB-1…       2    55    NA Turbo…
#>  4 N11107   2002 Fixed wing… EMBRAER      EMB-1…       2    55    NA Turbo…
#>  5 N11109   2002 Fixed wing… EMBRAER      EMB-1…       2    55    NA Turbo…
#>  6 N11113   2002 Fixed wing… EMBRAER      EMB-1…       2    55    NA Turbo…
#>  7 N11119   2002 Fixed wing… EMBRAER      EMB-1…       2    55    NA Turbo…
#>  8 N11121   2003 Fixed wing… EMBRAER      EMB-1…       2    55    NA Turbo…
#>  9 N11127   2003 Fixed wing… EMBRAER      EMB-1…       2    55    NA Turbo…
#> 10 N11137   2003 Fixed wing… EMBRAER      EMB-1…       2    55    NA Turbo…
#> # … with 288 more rows
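
The reshaping step can also be written with tidyr::pivot_wider, which tidyr documents as the successor to spread. The following is only a sketch: planes_toy is a hypothetical four-row stand-in for nycflights13::planes, and the column names name and value are chosen here for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for nycflights13::planes (invented rows, illustration only)
planes_toy <- tibble(
  tailnum      = c("T1", "T2", "T3", "T4"),
  manufacturer = c("EMBRAER", "EMBRAER", "BOEING", "BOEING"),
  engine       = c("Turbo-fan", "Turbo-jet", "Turbo-fan", "Turbo-jet")
)

vars <- c("manufacturer", "engine")
cond <- c("EMBRAER", "Turbo-fan")

# One row, with one column per variable name holding the value to match
cond_df <- tibble(name = vars, value = cond) %>%
  pivot_wider(names_from = name, values_from = value)

# Keep only rows whose (manufacturer, engine) pair appears in cond_df
result <- planes_toy %>%
  semi_join(cond_df, by = vars)
```

Because the condition data frame is built from the two vectors, the same code should work unchanged when vars and cond are longer, as long as every condition is an equality test.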


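    【Solution 2】:

    1) sym - We can convert the strings to symbols and evaluate them (!!). [[ is mainly for extracting list elements. Since the OP shows 'vars' and 'cond' as vectors, [ is enough to extract each element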

    nycflights13::planes %>%
      filter(
        !!rlang::sym(vars[1]) == cond[1],
        !!rlang::sym(vars[2]) == cond[2]
      )
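
When vars and cond are longer, the individual conditions can be generated instead of typed out. The sketch below builds one quoted condition per pair with purrr::map2 and splices the list into filter() with !!!. It uses the .data pronoun from the question rather than sym(); filter_by_strings and planes_toy are hypothetical names introduced for this example, not part of dplyr or nycflights13.

```r
library(dplyr)
library(purrr)

# Hypothetical stand-in for nycflights13::planes (invented rows, illustration only)
planes_toy <- tibble(
  tailnum      = c("T1", "T2", "T3", "T4"),
  manufacturer = c("EMBRAER", "EMBRAER", "BOEING", "BOEING"),
  engine       = c("Turbo-fan", "Turbo-jet", "Turbo-fan", "Turbo-jet")
)

# Hypothetical helper: keep rows where df[[vars[i]]] == cond[i] for every i
filter_by_strings <- function(df, vars, cond) {
  stopifnot(length(vars) == length(cond))
  # Build one quoted condition per (column, value) pair,
  # e.g. .data[["manufacturer"]] == "EMBRAER"
  conds <- map2(vars, cond, function(v, value) {
    rlang::expr(.data[[!!v]] == !!value)
  })
  # Splice the list so each condition becomes a separate filter() argument
  filter(df, !!!conds)
}

vars <- c("manufacturer", "engine")
cond <- c("EMBRAER", "Turbo-fan")

result <- filter_by_strings(planes_toy, vars, cond)
```

Calling filter_by_strings(nycflights13::planes, vars, cond) should behave like the two-condition filter() in the question, since filter() combines its arguments with a logical AND.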
    

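    2) parse_expr - Another option is to build the expression as a string with paste or str_c (from stringr) and then parse it

    library(stringr)

    # Builds the string: manufacturer=="EMBRAER" & engine=="Turbo-fan"
    expr1 <- str_c(vars, str_c('"', cond, '"'), sep="==", collapse=" & ")
    nycflights13::planes %>%
        filter(!! rlang::parse_expr(expr1))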
    # A tibble: 298 x 9
    #   tailnum  year type                    manufacturer model     engines seats speed engine   
    #   <chr>   <int> <chr>                   <chr>        <chr>       <int> <int> <int> <chr>    
    # 1 N10156   2004 Fixed wing multi engine EMBRAER      EMB-145XR       2    55    NA Turbo-fan
    # 2 N10575   2002 Fixed wing multi engine EMBRAER      EMB-145LR       2    55    NA Turbo-fan
    # 3 N11106   2002 Fixed wing multi engine EMBRAER      EMB-145XR       2    55    NA Turbo-fan
    # 4 N11107   2002 Fixed wing multi engine EMBRAER      EMB-145XR       2    55    NA Turbo-fan
    # 5 N11109   2002 Fixed wing multi engine EMBRAER      EMB-145XR       2    55    NA Turbo-fan
    # 6 N11113   2002 Fixed wing multi engine EMBRAER      EMB-145XR       2    55    NA Turbo-fan
    # 7 N11119   2002 Fixed wing multi engine EMBRAER      EMB-145XR       2    55    NA Turbo-fan
    # 8 N11121   2003 Fixed wing multi engine EMBRAER      EMB-145XR       2    55    NA Turbo-fan
    # 9 N11127   2003 Fixed wing multi engine EMBRAER      EMB-145XR       2    55    NA Turbo-fan
    #10 N11137   2003 Fixed wing multi engine EMBRAER      EMB-145XR       2    55    NA Turbo-fan
    # … with 288 more rows
    

    3) map2/reduce - If the same condition applied to several columns we could use filter_at, but here the 'cond' differs for each column. So, one option is map2

    library(purrr)
    library(dplyr)
    map2(vars, cond, ~ nycflights13::planes %>%
                           transmute(ind = !! rlang::sym(.x) == .y) %>%
                           pull(ind)) %>%
         reduce(`&`) %>%
         filter(nycflights13::planes, .)
    # A tibble: 298 x 9
    #   tailnum  year type                    manufacturer model     engines seats speed engine   
    #   <chr>   <int> <chr>                   <chr>        <chr>       <int> <int> <int> <chr>    
    # 1 N10156   2004 Fixed wing multi engine EMBRAER      EMB-145XR       2    55    NA Turbo-fan
    # 2 N10575   2002 Fixed wing multi engine EMBRAER      EMB-145LR       2    55    NA Turbo-fan
    # 3 N11106   2002 Fixed wing multi engine EMBRAER      EMB-145XR       2    55    NA Turbo-fan
    # 4 N11107   2002 Fixed wing multi engine EMBRAER      EMB-145XR       2    55    NA Turbo-fan
    # 5 N11109   2002 Fixed wing multi engine EMBRAER      EMB-145XR       2    55    NA Turbo-fan
    # 6 N11113   2002 Fixed wing multi engine EMBRAER      EMB-145XR       2    55    NA Turbo-fan
    # 7 N11119   2002 Fixed wing multi engine EMBRAER      EMB-145XR       2    55    NA Turbo-fan
    # 8 N11121   2003 Fixed wing multi engine EMBRAER      EMB-145XR       2    55    NA Turbo-fan
    # 9 N11127   2003 Fixed wing multi engine EMBRAER      EMB-145XR       2    55    NA Turbo-fan
    #10 N11137   2003 Fixed wing multi engine EMBRAER      EMB-145XR       2    55    NA Turbo-fan
    # … with 288 more rows
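
The same logical mask can be computed without the transmute/pull round trip by indexing the columns directly. This is a sketch of that variant, not the answer's original code; planes_toy is a hypothetical stand-in for nycflights13::planes and keep is a name chosen here.

```r
library(dplyr)
library(purrr)

# Hypothetical stand-in for nycflights13::planes (invented rows, illustration only)
planes_toy <- tibble(
  tailnum      = c("T1", "T2", "T3", "T4"),
  manufacturer = c("EMBRAER", "EMBRAER", "BOEING", "BOEING"),
  engine       = c("Turbo-fan", "Turbo-jet", "Turbo-fan", "Turbo-jet")
)

vars <- c("manufacturer", "engine")
cond <- c("EMBRAER", "Turbo-fan")

# One logical vector per (column, value) pair, combined element-wise with `&`
keep <- map2(vars, cond, ~ planes_toy[[.x]] == .y) %>%
  reduce(`&`)

# keep has one entry per row of planes_toy, so it can be passed to filter()
result <- filter(planes_toy, keep)
```

Note that this relies on the data frame having no column called keep; if it did, filter() would pick up the column instead of the mask.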
    
