Answers to: Join and group_by tidy eval issue

【Question title】: Join and group_by tidy eval issue
【Posted】: 2021-12-01 09:34:48
【Question】:

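I have put together the following function. It works up until the last part (marked with a comment in the code), where it has to join the objects together. I can't figure out how to get that part to work. I believe my main problem is converting the colName argument to a string for the "by =" argument of the join functions. As for the group_by function, I am not sure whether what I put inside the curly braces is valid. If anyone can help, that would be great!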

emp_turnover_fun <- function(data, colName, year = "2015") {
  
  # Convert colName to symbol or check if symbol
  colName <- ensym(colName)
  
  # Terminations by year and variable in df
  term_test <- data %>%
    filter(year(DateofTermination) == year) %>%
    count(!!(colName)) %>%
    clean_names()
  
  # Start employees by var and year
  fun_year_job <- paste(year, "-01-01", sep = "")
  start_test <- data %>%
    select(DateofHire, DateofTermination, !!(colName)) %>%
    filter(
      DateofHire <= fun_year_job,
      DateofTermination > fun_year_job | is.na(DateofTermination)
    ) %>%
    count(!!(colName))
  
  # End employees by year and var
  year_pos <- year %>% as.character()
  year_num_plus_pos <- as.character(as.numeric(year_pos) + 1)
  fun_year2_pos <- paste(year_num_plus_pos, "-01-01", sep = "")
  
  end_test <- data %>%
    select(DateofHire, DateofTermination, !!(colName)) %>%
    filter(
      DateofHire <= fun_year2_pos,
      DateofTermination > fun_year2_pos | is.na(DateofTermination)
    ) %>%
    count(!!(colName))
  #### PROBLEM BEGINS HERE
  join_turnover_year <- full_join(start_test, end_test, by = str(colName)) %>%
    full_join(y = term_test, by = str(colName)) %>%
    setNames(c(str(colName), "Start_Headcount", "End_Headcount", "Terminations")) %>%
    group_by({{colName}}) %>%
    summarise(Turnover = ((Terminations) / (Start_Headcount + End_Headcount)) * 100)
  
  return(join_turnover_year)
}

【Question comments】:

Tags: r dplyr tidyverse tidyeval

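【Solution 1】:

The problem is the use of str(), which displays the structure of an object and returns NULL, so it cannot supply a column name to "by =". If colName is passed as a string, it needs no wrapping at all. Inside the function, however, it is converted to a symbol with ensym(). So either capture the input before it is converted to a symbol (assuming it is a string), or convert the symbol back to a string with as_string() from rlang: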
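To make the difference concrete, here is a minimal sketch comparing the two calls on a symbol. The column name "Department" is a hypothetical example, not taken from the question's data.

```r
library(rlang)

# A symbol, like the one ensym() produces inside the function.
# "Department" is a made-up column name for illustration.
col_sym <- sym("Department")

# str() prints a description of the object as a side effect
# and returns NULL invisibly, so it is useless as a join key.
str_result <- str(col_sym)
is.null(str_result)

# as_string() turns the symbol into a length-one character vector.
col_str <- as_string(col_sym)
col_str
```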

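library(dplyr)      # filter(), count(), select(), full_join(), group_by(), summarise(), %>%
library(lubridate)  # year()
library(rlang)      # ensym(), as_string()

emp_turnover_fun <- function(data, colName, year = "2015") {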
  
  # Convert colName to symbol or check if symbol
  colName <- ensym(colName)
  colName_str <- rlang::as_string(colName) # symbol converted back to a string
  
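  # Terminations by year and variable in df
  # (no clean_names() here: it can rename the grouping column, so the
  # join key below would no longer match colName_str)
  term_test <- data %>%
    filter(year(DateofTermination) == year) %>%
    count(!!(colName))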
  
  # Start employees by var and year
  fun_year_job <- paste(year, "-01-01", sep = "")
  start_test <- data %>%
    select(DateofHire, DateofTermination, !!(colName)) %>%
    filter(
      DateofHire <= fun_year_job,
      DateofTermination > fun_year_job | is.na(DateofTermination)
    ) %>%
    count(!!(colName))
  
  # End employees by year and var
  year_pos <- year %>% as.character()
  year_num_plus_pos <- as.character(as.numeric(year_pos) + 1)
  fun_year2_pos <- paste(year_num_plus_pos, "-01-01", sep = "")
  
  end_test <- data %>%
    select(DateofHire, DateofTermination, !!(colName)) %>%
    filter(
      DateofHire <= fun_year2_pos,
      DateofTermination > fun_year2_pos | is.na(DateofTermination)
    ) %>%
    count(!!(colName))
  
  join_turnover_year <- full_join(start_test, end_test, 
             by = colName_str) %>% # use the string
    full_join(y = term_test, by = colName_str) %>% # use the string
    setNames(c(colName_str, "Start_Headcount", "End_Headcount", 
             "Terminations")) %>% # here as well
    group_by(!!colName) %>% # colName is already a symbol, so unquote it with !! as above
    summarise(Turnover = ((Terminations) / (Start_Headcount + End_Headcount)) * 100)
  
  return(join_turnover_year)
}
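The same pattern can be tried on toy data without the HR data set. The sketch below is not the function above: `join_counts`, the two tibbles, and the `Department` column are hypothetical stand-ins, and `suffix` is used instead of setNames() to name the count columns.

```r
library(dplyr)
library(rlang)

# Hypothetical helper: count rows per group in two data frames,
# join the counts on the grouping column, and add them per group.
join_counts <- function(a, b, colName) {
  colName <- ensym(colName)            # symbol, for data-masking verbs
  colName_str <- as_string(colName)    # string, for the join key

  left <- count(a, !!colName)
  right <- count(b, !!colName)

  full_join(left, right, by = colName_str, suffix = c("_a", "_b")) %>%
    group_by(!!colName) %>%
    summarise(total = n_a + n_b)
}

# Made-up data for illustration
a <- tibble(Department = c("IT", "IT", "Sales"))
b <- tibble(Department = c("IT", "Sales", "Sales"))

res <- join_counts(a, b, Department)
res
```

The symbol and the string are kept side by side because the data-masking verbs (count, group_by) take the unquoted symbol, while "by =" expects a character vector.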

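Using as_string() is safer than taking the input directly as a string. ensym() accepts both quoted and unquoted values, so if the caller passes an unquoted column name, grabbing the input directly does not work and something like deparse(substitute(colName)) would be needed. Converting to a symbol first and then back to a string with as_string() handles both cases.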
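A short sketch of that difference, using two hypothetical one-line helpers (`name_via_ensym` and `name_via_deparse` are illustration names, as is the `Department` column):

```r
library(rlang)

# Hypothetical helpers, for illustration only
name_via_ensym <- function(colName) as_string(ensym(colName))
name_via_deparse <- function(colName) deparse(substitute(colName))

# ensym() + as_string() gives the same string for both call styles
bare_ensym <- name_via_ensym(Department)
quoted_ensym <- name_via_ensym("Department")

# deparse(substitute()) handles the bare name, but for a quoted
# input it keeps the quote characters inside the result
bare_deparse <- name_via_deparse(Department)
quoted_deparse <- name_via_deparse("Department")

identical(bare_ensym, quoted_ensym)
identical(bare_deparse, quoted_deparse)
```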

【Comments】:

This works great! I also found that the clean_names call near the start of my function was causing problems. Thank you
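The comment does not say what went wrong with clean_names. One plausible reading, sketched below under that assumption, is that janitor's clean_names() renames the grouping column, so a join on the original name no longer finds it. The data frame and the "Department" column are made up for illustration.

```r
library(janitor)

# Made-up counts table with a capitalised grouping column
counts <- data.frame(Department = c("IT", "Sales"), n = c(2L, 1L))

# clean_names() converts column names to snake_case by default
cleaned <- clean_names(counts)

names(counts)
names(cleaned)

# The original name is gone, so by = "Department" would not match
"Department" %in% names(cleaned)
```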