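Posted: 2020-12-16 22:46:08
Question:
Big picture: I want my user-defined function to iterate over a list (or vector) of arguments, like a loop. (In this case, each argument is a string.)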
get_avg2 <- function(v_name) {
avg <- "_Average"
data_1 <- PFF_College_Defense_data %>%
dplyr::group_by(Name) %>%
dplyr::summarise("{{ v_name }}_{avg}" := mean({{ v_name }}, na.rm = TRUE))
PFF_NCAA_Average_grades <- merge(PFF_NCAA_Average_grades, data_1, by = "Name")
return(PFF_NCAA_Average_grades)
}
v_names <- list("hits", "tackles", "forced_fumbles")
for (i in v_names) {
get_avg2(i)
}
#didn't work
PFF_NCAA_Average_grades <- purrr::map_df(v_names, get_avg2)
#didn't work
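I'm trying to compute averages by group from one data frame and store them in another data frame. I wrote a UDF that takes a single argument, the name of a variable in the original data; the UDF runs the calculation and merges the result into a data frame I created beforehand and formatted to fit the UDF's output. I want to pass a list to my function and have it iterate over that list like a loop. I think purrr::map would solve this, but I can't seem to grasp the concept or how to use it.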
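Conceptually, a loop only accumulates results if each iteration assigns the returned data frame back to a variable. In the `for` loop above, the value returned by `get_avg2(i)` is never stored, and `get_avg2` assigns its merge to a local copy of `PFF_NCAA_Average_grades`, so nothing carries over between iterations. Passing strings such as `"hits"` through `{{ v_name }}` also does not refer to the column. A minimal sketch of an accumulating loop, assuming dplyr >= 1.0.0, the reprex data shown later in the question, and a hypothetical string-based helper `get_avg_str()` (not the asker's function) that looks the column up with the `.data[[ ]]` pronoun:

```r
library(dplyr)

# Reprex data from the question
data_sample <- data.frame(
  Name = c("Dalton Campbell", "Dalton Campbell", "Dalton Campbell",
           "Andre Walker", "Andre Walker", "Andre Walker"),
  Defense_Grade = c(88, 86, 92, 94, 97, 95),
  Tackle_Grade = c(66, 69, 72, 74, 76, 78),
  Coverage_Grade = c(44, 43, 44, 76, 73, 78)
)

# Hypothetical helper: v_name is a column name given as a string;
# .data[[v_name]] looks up the column named by that string
get_avg_str <- function(data, v_name) {
  data %>%
    group_by(Name) %>%
    summarise("{v_name}_Average" := mean(.data[[v_name]], na.rm = TRUE),
              .groups = "drop")
}

vars <- c("Defense_Grade", "Tackle_Grade", "Coverage_Grade")

# Start with one row per Name, then add one column per iteration
result <- distinct(data_sample, Name)
for (v in vars) {
  # Reassign on every pass: the helper returns a new data frame
  # and does not modify `result` in place
  result <- merge(result, get_avg_str(data_sample, v), by = "Name")
}
result
```

The key line is `result <- merge(result, ...)`: the loop itself carries the growing data frame, so the function no longer needs to know about a pre-built target.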
I know I can do this:
PFF_NCAA_Average_grades <- get_avg2(hits)
PFF_NCAA_Average_grades <- get_avg2(tackles)
PFF_NCAA_Average_grades <- get_avg2(forced_fumbles)
But that looks ugly and slow. Can someone help me understand, conceptually, the best way to do this?
Thanks in advance!!!
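The three repeated calls can be collapsed with purrr: `map()` produces one small summary per variable, and `reduce()` folds that list into a single data frame by joining pairwise on `Name`. A sketch, assuming dplyr >= 1.0.0 and purrr, with `get_avg_str()` again a hypothetical string-based helper rather than the asker's `get_avg2()`:

```r
library(dplyr)
library(purrr)

# Reprex data from the question
data_sample <- data.frame(
  Name = c("Dalton Campbell", "Dalton Campbell", "Dalton Campbell",
           "Andre Walker", "Andre Walker", "Andre Walker"),
  Defense_Grade = c(88, 86, 92, 94, 97, 95),
  Tackle_Grade = c(66, 69, 72, 74, 76, 78),
  Coverage_Grade = c(44, 43, 44, 76, 73, 78)
)

# Hypothetical helper: per-Name average of the column named by the string v_name
get_avg_str <- function(data, v_name) {
  data %>%
    group_by(Name) %>%
    summarise("{v_name}_Average" := mean(.data[[v_name]], na.rm = TRUE),
              .groups = "drop")
}

vars <- c("Defense_Grade", "Tackle_Grade", "Coverage_Grade")

averages <- vars %>%
  map(~ get_avg_str(data_sample, .x)) %>%  # list: one data frame per variable
  reduce(~ left_join(.x, .y, by = "Name"))  # join them side by side on Name
averages
```

Splitting the work this way keeps the function free of side effects: it only computes, and the combining step is explicit.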
*** Updated with a reprex ***
library(tidyverse)
data_sample <- data.frame(
Name = c("Dalton Campbell", "Dalton Campbell", "Dalton Campbell", "Andre Walker", "Andre Walker", "Andre Walker"),
Defense_Grade = c(88, 86, 92, 94, 97, 95),
Tackle_Grade = c(66, 69, 72, 74, 76, 78),
Coverage_Grade = c(44, 43, 44, 76, 73, 78)
)
#Here I set up the data frame that the function will merge into
data_sample_averages <- data_sample %>%
group_by(Name) %>%
dplyr::summarise(Defense_Grade_Average = mean(Defense_Grade))
#> `summarise()` ungrouping output (override with `.groups` argument)
#Function which computes average of variable (the only argument) and merges it back to data_sample_averages
get_avg2 <- function(v_name) {
avg <- "_Average"
data_1 <- data_sample %>%
dplyr::group_by(Name) %>%
dplyr::summarise("{{ v_name }}_{avg}" := mean({{ v_name }}, na.rm = TRUE))
data_sample_averages <- merge(data_sample_averages, data_1, by = "Name")
return(data_sample_averages)
}
#This works - it computes the average of Tackle_Grade and merges it into data_sample_averages
data_sample_averages <- get_avg2(Tackle_Grade)
#> `summarise()` ungrouping output (override with `.groups` argument)
#shows you the averages
print(data_sample_averages)
#> Name Defense_Grade_Average Tackle_Grade__Average
#> 1 Andre Walker 95.33333 76
#> 2 Dalton Campbell 88.66667 69
#Neither of these works - this is where I'm stuck
variable_list <- list("Defense_Grade", "Tackle_Grade", "Coverage Grade")
data_sample_averages <- lapply(variable_list, get_avg2)
#> Warning in mean.default(~"Defense_Grade", na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(~"Defense_Grade", na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> `summarise()` ungrouping output (override with `.groups` argument)
#> Warning in mean.default(~"Tackle_Grade", na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(~"Tackle_Grade", na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> `summarise()` ungrouping output (override with `.groups` argument)
#> Warning in mean.default(~"Coverage Grade", na.rm = TRUE): argument is not
#> numeric or logical: returning NA
#> Warning in mean.default(~"Coverage Grade", na.rm = TRUE): argument is not
#> numeric or logical: returning NA
#> `summarise()` ungrouping output (override with `.groups` argument)
data_sample_averages <- purrr::map(variable_list, get_avg2)
#> Warning in mean.default(~"Defense_Grade", na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(~"Defense_Grade", na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> `summarise()` ungrouping output (override with `.groups` argument)
#> Warning in mean.default(~"Tackle_Grade", na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(~"Tackle_Grade", na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> `summarise()` ungrouping output (override with `.groups` argument)
#> Warning in mean.default(~"Coverage Grade", na.rm = TRUE): argument is not
#> numeric or logical: returning NA
#> Warning in mean.default(~"Coverage Grade", na.rm = TRUE): argument is not
#> numeric or logical: returning NA
#> `summarise()` ungrouping output (override with `.groups` argument)
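The warnings show `mean()` receiving `~"Defense_Grade"`: when a string is passed, `{{ v_name }}` forwards the string literal itself rather than the column it names, so `mean()` gets a character value and returns `NA`. Tidy evaluation's `.data[[ ]]` pronoun is the documented way to look up a column from a string. Two further points: `lapply()` and `purrr::map()` return a list with one element per input rather than combining anything, and the third entry in `variable_list`, `"Coverage Grade"`, has a space where the column name `Coverage_Grade` has an underscore. A sketch, assuming dplyr >= 1.0.0, with `avg_by_name()` a hypothetical helper:

```r
library(dplyr)

# Reprex data from the question
data_sample <- data.frame(
  Name = c("Dalton Campbell", "Dalton Campbell", "Dalton Campbell",
           "Andre Walker", "Andre Walker", "Andre Walker"),
  Defense_Grade = c(88, 86, 92, 94, 97, 95),
  Tackle_Grade = c(66, 69, 72, 74, 76, 78),
  Coverage_Grade = c(44, 43, 44, 76, 73, 78)
)

# Hypothetical helper: takes a string, no merging, just the per-Name average
avg_by_name <- function(v_name) {
  data_sample %>%
    group_by(Name) %>%
    summarise("{v_name}_Average" := mean(.data[[v_name]], na.rm = TRUE),
              .groups = "drop")
}

# Names must match the columns exactly: "Coverage_Grade", not "Coverage Grade"
variable_list <- list("Defense_Grade", "Tackle_Grade", "Coverage_Grade")

# lapply() returns a list holding one two-row data frame per variable
results <- lapply(variable_list, avg_by_name)
length(results)
names(results[[2]])
```

The list still has to be combined (for example with `Reduce()` and `merge()`, or `purrr::reduce()` and `left_join()`) to get a single table.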
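This feels like a really simple operation (compute averages by group from one data frame and bind them to another), and that part isn't really what I'm struggling with. What I want is for my function to iterate over a series of arguments automatically. I'd like to be able to quickly build a list (or vector; I'm not set on using a list) of variables and pass it to the function, so that it builds a data frame from the variables I supply. But I'm open to the idea that I've got something conceptually wrong, and that I should be using a loop, purrr, map, etc., or change the way my function is written?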
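If the goal is just the table of group averages, the iteration may not need to live outside the function at all. dplyr's `across()` (dplyr >= 1.0.0) applies one summary to several columns inside a single `summarise()`, and its `.names` argument controls the output column names. This is a swapped-in approach, not a fix to `get_avg2()`; it removes the pre-built data frame and the merge step entirely:

```r
library(dplyr)

# Reprex data from the question
data_sample <- data.frame(
  Name = c("Dalton Campbell", "Dalton Campbell", "Dalton Campbell",
           "Andre Walker", "Andre Walker", "Andre Walker"),
  Defense_Grade = c(88, 86, 92, 94, 97, 95),
  Tackle_Grade = c(66, 69, 72, 74, 76, 78),
  Coverage_Grade = c(44, 43, 44, 76, 73, 78)
)

# Any character vector of column names works here
vars <- c("Defense_Grade", "Tackle_Grade", "Coverage_Grade")

# One grouped summarise; across() loops over the selected columns
data_sample_averages <- data_sample %>%
  group_by(Name) %>%
  summarise(across(all_of(vars), ~ mean(.x, na.rm = TRUE),
                   .names = "{.col}_Average"),
            .groups = "drop")
data_sample_averages
```

`all_of()` makes the selection fail loudly if a name in `vars` does not exist, which would catch a typo such as `"Coverage Grade"`.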
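Comments:
- Have you tried unlist?
- What do you mean? Where should I try it?
- Your function seems to be a rather convoluted tidyverse way of doing what the base function ave does.
- I've never used that function, but it looks like it's designed to work with factors, whereas my grouping categories are characters. If I just convert my character columns to factors and run it, will it work?
- @Spence_p Yes. It should work directly with characters.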
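To illustrate the last comment: base R's `ave()` groups through `interaction()`, which converts its grouping arguments with `as.factor()`, so a character `Name` column works without manual conversion. `ave()` returns a vector as long as its input, with each value replaced by its group's mean; taking the unique rows afterwards gives one row per name. A base-R-only sketch on the reprex data (the `_Average` naming is an assumption chosen to match the question):

```r
# Reprex data from the question (base R only, no packages)
data_sample <- data.frame(
  Name = c("Dalton Campbell", "Dalton Campbell", "Dalton Campbell",
           "Andre Walker", "Andre Walker", "Andre Walker"),
  Defense_Grade = c(88, 86, 92, 94, 97, 95),
  Tackle_Grade = c(66, 69, 72, 74, 76, 78),
  Coverage_Grade = c(44, 43, 44, 76, 73, 78)
)

vars <- c("Defense_Grade", "Tackle_Grade", "Coverage_Grade")

# Add one per-row group-mean column per variable; Name stays character
for (v in vars) {
  data_sample[[paste0(v, "_Average")]] <-
    ave(data_sample[[v]], data_sample$Name, FUN = mean)
}

# Keep one row per Name to get a summary table
averages <- unique(data_sample[c("Name", paste0(vars, "_Average"))])
averages
```

Unlike the dplyr versions, `unique()` keeps rows in their original order (Dalton Campbell first) rather than sorting by name.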