[Question title]: Join and group_by tidy eval issue
[Posted]: 2021-12-01 09:34:48
[Question]:

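I've put together the following function. It works up until the last part (marked in a comment in the code), where it has to join the objects together. I can't figure out how to get that part to work. I believe my main issue is converting the colName argument to a string for the "by =" argument of the join functions. As for the group_by call, I'm not sure whether what I've put in curly braces is valid. If anyone can help, that would be great!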

emp_turnover_fun <- function(data, colName, year = "2015") {
  
  # Convert colName to symbol or check if symbol
  colName <- ensym(colName)
  
  # Terminations by year and variable in df
  term_test <- data %>%
    filter(year(DateofTermination) == year) %>%
    count(!!(colName)) %>%
    clean_names()
  
  # Start employees by var and year
  fun_year_job <- paste(year, "-01-01", sep = "")
  start_test <- data %>%
    select(DateofHire, DateofTermination, !!(colName)) %>%
    filter(
      DateofHire <= fun_year_job,
      DateofTermination > fun_year_job | is.na(DateofTermination)
    ) %>%
    count(!!(colName))
  
  # End employees by year and var
  year_pos <- year %>% as.character()
  year_num_plus_pos <- as.character(as.numeric(year_pos) + 1)
  fun_year2_pos <- paste(year_num_plus_pos, "-01-01", sep = "")
  
  end_test <- data %>%
    select(DateofHire, DateofTermination, !!(colName)) %>%
    filter(
      DateofHire <= fun_year2_pos,
      DateofTermination > fun_year2_pos | is.na(DateofTermination)
    ) %>%
    count(!!(colName))
  #### PROBLEM BEGINS HERE
  join_turnover_year <- full_join(start_test, end_test, by = str(colName)) %>%
    full_join(y = term_test, by = str(colName)) %>%
    setNames(c(str(colName), "Start_Headcount", "End_Headcount", "Terminations")) %>%
    group_by({{colName}}) %>%
    summarise(Turnover = ((Terminations) / (Start_Headcount + End_Headcount)) * 100)
  
  return(join_turnover_year)
}

[Question comments]:

    Tags: r dplyr tidyverse tidyeval


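    [Solution 1]:

    The problem is the use of str, which displays the structure of an object rather than converting it to a string. If colName were always passed as a string, no wrapping would be needed at all. Inside the function, however, it is converted to a symbol with ensym. So either capture the input before it is converted to a symbol (assuming it is a string), or use as_string from rlang to turn the symbol back into a string.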

    emp_turnover_fun <- function(data, colName, year = "2015") {
      
      # Convert colName to symbol or check if symbol
      colName <- ensym(colName)
      colName_str <- rlang::as_string(colName) ## converted to string
    
      
      # Terminations by year and variable in df
      term_test <- data %>%
        filter(year(DateofTermination) == year) %>%
        count(!!(colName)) # clean_names() dropped: it can rename the key column, so by = colName_str would no longer match
      
      # Start employees by var and year
      fun_year_job <- paste(year, "-01-01", sep = "")
      start_test <- data %>%
        select(DateofHire, DateofTermination, !!(colName)) %>%
        filter(
          DateofHire <= fun_year_job,
          DateofTermination > fun_year_job | is.na(DateofTermination)
        ) %>%
        count(!!(colName))
      
      # End employees by year and var
      year_pos <- year %>% as.character()
      year_num_plus_pos <- as.character(as.numeric(year_pos) + 1)
      fun_year2_pos <- paste(year_num_plus_pos, "-01-01", sep = "")
      
      end_test <- data %>%
        select(DateofHire, DateofTermination, !!(colName)) %>%
        filter(
          DateofHire <= fun_year2_pos,
          DateofTermination > fun_year2_pos | is.na(DateofTermination)
        ) %>%
        count(!!(colName))
      
      join_turnover_year <- full_join(start_test, end_test, 
                 by = colName_str) %>% # use the string
        full_join(y = term_test, by = colName_str) %>% # use the string
        setNames(c(colName_str, "Start_Headcount", "End_Headcount", 
                 "Terminations")) %>% # here as well
        group_by({{colName}}) %>%
        summarise(Turnover = ((Terminations) / (Start_Headcount + End_Headcount)) * 100)
      
      return(join_turnover_year)
    }
    
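To see why str fails where as_string works, here is a minimal sketch. The helper col_to_string is hypothetical and only illustrates the conversion; it is not part of the function above. Both calls should return "Department", while str only prints a summary and returns NULL, which dplyr's joins treat as "no by given" (a natural join on all shared columns).

```r
library(rlang)

# Hypothetical helper (illustration only): accepts a quoted or unquoted
# column name and returns it as a plain string.
col_to_string <- function(colName) {
  colName <- ensym(colName)  # a symbol, whether the input was Department or "Department"
  as_string(colName)         # back to a length-one character vector
}

unquoted <- col_to_string(Department)
quoted   <- col_to_string("Department")

# str() is a display function: it prints the structure and returns NULL invisibly,
# so by = str(colName) is effectively by = NULL.
str_result <- str(sym("Department"))
```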

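    Using as_string is safer than taking the input directly as a string. ensym accepts either an unquoted or a quoted value, so if an unquoted value is passed, grabbing the input directly does not work (it would need something like deparse(substitute(colName))). Instead, convert to a symbol first and then convert back to a string with as_string.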
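As a rough illustration of the same pattern on a smaller scale, the sketch below counts a key column in two data frames and joins the counts. The function count_both, the data frames a and b, and the column names n_a and n_b are all made up for this example. The symbol is used for the data-masking verb (count) and the string for the by argument, so quoted and unquoted input behave the same.

```r
library(dplyr)
library(rlang)

# Hypothetical function (illustration only): count a key column in two
# data frames and join the counts on that key.
count_both <- function(a, b, colName) {
  colName <- ensym(colName)       # symbol for data-masking verbs
  key     <- as_string(colName)   # string for the join's by argument

  left  <- count(a, !!colName, name = "n_a")
  right <- count(b, !!colName, name = "n_b")

  full_join(left, right, by = key)
}

# Made-up example data
a <- tibble(Department = c("IT", "IT", "Sales"))
b <- tibble(Department = c("IT", "HR"))

res_unquoted <- count_both(a, b, Department)
res_quoted   <- count_both(a, b, "Department")
```

Giving the count columns distinct names up front also avoids the n.x / n.y suffixes that the joins would otherwise add, so no setNames step is needed in this reduced version.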

    [Comments]:

    • This works great! I also found there was an issue with the clean_names call I had at the beginning. Thanks