【Question title】: If/else condition in dplyr 0.7 function
【Posted】: 2018-12-09 02:42:02
【Question】:

I want to create a simple if/else condition inside a dplyr function. I have looked at some helpful posts (e.g., How to parametrize function calls in dplyr 0.7?), but I am still running into problems.

Below is a toy example. It works when I call the function without a grouping variable, but fails as soon as I supply one.

library(dplyr)

# example dataset
test <- tibble(
  A = c(1:5,1:5),
  B = c(1,2,1,2,3,3,3,3,3,3),
  C = c(1,1,1,1,2,3,4,5,4,3)
)

# begin function, set default for group var to NULL.
prop_tab <- function(df, column, group = NULL) {

  col_name <- enquo(column)
  group_name <- enquo(group)

  # if group_by var is NOT null, then...
  if(!is.null(group)) {
      temp <- df %>%
        select(!!col_name, !!group_name) %>% 
        group_by(!!group_name) %>% 
        summarise(Percentages = 100 * length(!!col_name) / nrow(df))

  } else {
  # if group_by var is null, then...
      temp <- df %>%
        select(!!col_name) %>% 
        group_by(col_name = !!col_name) %>% 
        summarise(Percentages = 100 * length(!!col_name) / nrow(df)) 

  }

  temp
}

test %>% prop_tab(column = C)  # works

test %>% prop_tab(column = A, group = B)  # fails
# Error in prop_tab(., column = A, group = B) : object 'B' not found

【Comments】:

    Tags: r function if-statement dplyr


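    【Solution 1】:

    The problem here is that when you supply an unquoted argument, is.null(group) forces R to evaluate it. The code therefore tries to check whether an object named B is NULL, and it errors because B does not exist in that scope. Instead, you can use missing() to check whether an argument was supplied to the function, like so. There may be a cleaner way, but this at least works, as you can see at the bottom.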

    library(tidyverse)
    test <- tibble(
      A = c(1:5,1:5),
      B = c(1,2,1,2,3,3,3,3,3,3),
      C = c(1,1,1,1,2,3,4,5,4,3)
    )
    
    # begin function; group has no default, so missing() can detect when it is omitted
    prop_tab <- function(df, column, group) {
    
      col_name <- enquo(column)
      group_name <- enquo(group)
    
      # if a grouping variable was supplied, then:
      if(!missing(group)) {
        temp <- df %>%
          select(!!col_name, !!group_name) %>%
          group_by(!!group_name) %>%
          summarise(Percentages = 100 * length(!!col_name) / nrow(df))
    
      } else {
        # if no grouping variable was supplied, then...
        temp <- df %>%
          select(!!col_name) %>% 
          group_by(col_name = !!col_name) %>% 
          summarise(Percentages = 100 * length(!!col_name) / nrow(df)) 
    
      }
    
      temp
    }
    
    test %>% prop_tab(column = C)  # works
    #> # A tibble: 5 x 2
    #>   col_name Percentages
    #>      <dbl>       <dbl>
    #> 1        1          40
    #> 2        2          10
    #> 3        3          20
    #> 4        4          20
    #> 5        5          10
    
    test %>% prop_tab(column = A, group = B)
    #> # A tibble: 3 x 2
    #>       B Percentages
    #>   <dbl>       <dbl>
    #> 1     1          20
    #> 2     2          20
    #> 3     3          60
    

    Created on 2018-06-29 by the reprex package (v0.2.0).
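
The difference between the two checks can be seen without dplyr at all. The sketch below uses base R only; the helper names `check_with_is_null` and `check_with_missing` are hypothetical and exist purely for illustration. It assumes no object named `B` is defined in the calling scope.

```r
# Hypothetical helper names, used only to contrast the two checks.
check_with_is_null <- function(group = NULL) {
  # is.null() evaluates `group`, so R looks up whatever symbol the caller passed
  is.null(group)
}

check_with_missing <- function(group) {
  # missing() only asks whether the argument was supplied; nothing is evaluated
  missing(group)
}

check_with_missing()    # TRUE: no argument supplied
check_with_missing(B)   # FALSE: supplied, and B is never looked up

# Forcing the argument fails when no object named B exists in the calling scope;
# tryCatch() captures the error message instead of stopping the script.
result <- tryCatch(check_with_is_null(B), error = function(e) conditionMessage(e))
result
```

This is why the bare column name survives the `missing()` check and can still be captured by `enquo()` afterwards.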

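    【Comments】:

    • Beat you to it ;)
    • I don't think it is accurate to say "non-tidyverse functions don't know how to handle them". NSE was around before the tidyverse.
    • Yes, that's fair; changed to reflect that.
    【Solution 2】: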

    You can use missing instead of is.null, so that your argument is not evaluated (the evaluation is what causes the error):

    prop_tab <- function(df, column, group = NULL) {
    
      col_name <- enquo(column)
      group_name <- enquo(group)
    
      # if a grouping variable was supplied, then...
      if(!missing(group)) {
        temp <- df %>%
          select(!!col_name, !!group_name) %>% 
          group_by(!!group_name) %>% 
          summarise(Percentages = 100 * length(!!col_name) / nrow(df))
    
      } else {
        # if no grouping variable was supplied, then...
        temp <- df %>%
          select(!!col_name) %>% 
          group_by(col_name = !!col_name) %>% 
          summarise(Percentages = 100 * length(!!col_name) / nrow(df)) 
    
      }
    
      temp
    }
    
    test %>% prop_tab(column = C) 
    # # A tibble: 5 x 2
    #   col_name Percentages
    #      <dbl>       <dbl>
    # 1        1          40
    # 2        2          10
    # 3        3          20
    # 4        4          20
    # 5        5          10
    
    test %>% prop_tab(column = A, group = B)
    # # A tibble: 3 x 2
    #       B Percentages
    #   <dbl>       <dbl>
    # 1     1          20
    # 2     2          20
    # 3     3          60
    

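    You could also use length(substitute(group)) instead of !missing(group). It is more robust because it does not fail in the unlikely case that someone explicitly passes NULL for the group argument (the previous option breaks in that case).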
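
A minimal base R sketch of how the two checks differ when NULL is passed explicitly. The helper names `uses_group_missing` and `uses_group_substitute` are hypothetical; they only report which branch of `prop_tab` would run, and both assume the `group = NULL` default shown above.

```r
# Hypothetical helpers, for illustration only.
uses_group_missing <- function(group = NULL) {
  !missing(group)
}

uses_group_substitute <- function(group = NULL) {
  # substitute() returns the unevaluated argument expression:
  # NULL has length 0, while a bare symbol such as B has length 1
  length(substitute(group)) > 0
}

uses_group_missing()          # FALSE
uses_group_missing(B)         # TRUE
uses_group_missing(NULL)      # TRUE: an explicit NULL still counts as supplied

uses_group_substitute()       # FALSE
uses_group_substitute(B)      # TRUE
uses_group_substitute(NULL)   # FALSE: explicit NULL is treated like no group
```

Neither helper evaluates `B`, so both are safe to call with a bare column name.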

    【Comments】:

      【Solution 3】:

      One option is to check `group_name` (the captured quosure) rather than `group`:

      prop_tab <- function(df, column, group = NULL) {
      
        col_name <- enquo(column)
        group_name <- enquo(group)
      
        # if group_by var is NOT null, then...
        if(as.character(group_name)[2] != "NULL") {
            temp <- df %>%
              select(!!col_name, !!group_name) %>% 
              group_by(!!group_name) %>% 
              summarise(Percentages = 100 * length(!!col_name) / nrow(df))
      
        } else {
        # if group_by var is null, then...
            temp <- df %>%
              select(!!col_name) %>% 
              group_by(col_name = !!col_name) %>% 
              summarise(Percentages = 100 * length(!!col_name) / nrow(df)) 
      
        }
      
        temp
      }
      

      - Check

      prop_tab(test, column = C, group = B)
      # A tibble: 3 x 2
      #      B Percentages
      # <dbl>       <dbl>
      #1     1          20
      #2     2          20
      #3     3          60  
      
      
      
      prop_tab(test, column = C)
      # A tibble: 5 x 2
      #  col_name Percentages
      #     <dbl>       <dbl>
      #1        1          40
      #2        2          10
      #3        3          20
      #4        4          20
      #5        5          10
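
The `as.character(group_name)[2]` test depends on how a quosure happens to convert to a character vector, which may not hold across rlang versions. As a possible alternative, rlang exports `quo_is_null()`, which inspects the captured expression directly. The sketch below assumes the rlang package is installed; `has_group` is a hypothetical helper that stands in for the condition inside `prop_tab`.

```r
library(rlang)

# Hypothetical helper: reports whether a grouping variable was captured.
has_group <- function(group = NULL) {
  group_name <- enquo(group)
  # quo_is_null() is TRUE when the captured expression is NULL,
  # which covers both the default and an explicit group = NULL
  !quo_is_null(group_name)
}

has_group()      # FALSE
has_group(B)     # TRUE: B is captured, never evaluated
has_group(NULL)  # FALSE
```

Inside `prop_tab`, the condition would then read `if (!quo_is_null(group_name))`, with the rest of the function unchanged.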
      

      【Comments】: