【Posted】:2021-05-16 09:27:58
【Question】:
I am trying to write an R function that imputes mean values into a specific column of a data frame.
impute_means <- function(df, group_by, column){
  vals_to_impute <- df %>%
    group_by_at(group_by) %>%
    summarise(x = mean(get(column), na.rm = TRUE))
  df %>%
    filter(is.na(get(column))) %>%
    select(group_by, column) %>%
    left_join(vals_to_impute, by=group_by)
}
impute_means(df = weather_data, group_by = c("year","month","code","type"), column = "temperature")
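However, I now want to check for NA values in the "temperature" column and replace them with the values from column x.
I tried to do this by adding a mutate statement at the end, but it doesn't seem to work: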
impute_means <- function(df, group_by, column){
  vals_to_impute <- df %>%
    group_by_at(group_by) %>%
    summarise(x = mean(get(column), na.rm = TRUE))
  df %>%
    filter(is.na(get(column))) %>%
    select(group_by, column) %>%
    left_join(vals_to_impute, by=group_by) %>%
    mutate(column = case_when(is.na(get(column))~x,
                              TRUE~get(column)))
}
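One likely reason the attempt above does not behave as intended: `mutate(column = ...)` creates a new column literally named `column` instead of overwriting the column whose name is stored in the `column` argument. The following is a minimal sketch of one way to write to the named column, assuming dplyr 1.0 or later. The function name `impute_means_sketch`, the argument name `group_cols`, and the `toy` data are hypothetical and not part of the original post; it also swaps the join for a grouped `mutate()`, which is a different technique from the one in the question.

```r
library(dplyr)

# Hypothetical helper (name and arguments are illustrative only).
impute_means_sketch <- function(df, group_cols, column) {
  df %>%
    # Group by the columns whose names are given as strings.
    group_by(across(all_of(group_cols))) %>%
    mutate(
      # "{column}" := ... assigns to the column named by the string in `column`.
      # coalesce() keeps existing values and fills NAs with the group mean.
      "{column}" := coalesce(.data[[column]],
                             mean(.data[[column]], na.rm = TRUE))
    ) %>%
    ungroup()
}

# Toy data (an assumption for illustration; not the poster's weather_data).
toy <- tibble(
  year = c("2000", "2000", "2000", "2001", "2001"),
  temperature = c(10, NA, 20, NA, 4)
)

result <- impute_means_sketch(toy, "year", "temperature")
print(result)
```

Note that a group containing only NA values has no observed values to average, so its mean is NaN and there is nothing meaningful to impute for those rows.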
Minimal data to reproduce:
weather_data
structure(list(year = structure(c(8L, 8L, 1L, 1L, 2L, 2L, 3L,
3L, 5L, 6L), .Label = c("2000", "2001", "2002", "2003", "2004",
"2005", "2006", "2007", "2008", "2009", "2010", "2011", "2012",
"2013", "2014", "2015", "2016", "2017", "2018", "2019"), class = "factor"),
month = structure(c(12L, 12L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L), .Label = c("1", "2", "3", "4", "5", "6", "7", "8", "9",
"10", "11", "12"), class = "factor"), code = structure(c(1L,
2L, 6L, 1L, 6L, 2L, 2L, 2L, 6L, 2L), .Label = c("1", "2",
"3", "4", "5", "6"), class = "factor"), type = structure(c(2L,
2L, 6L, 2L, 6L, 2L, 2L, 3L, 6L, 3L), .Label = c("1", "2",
"3", "4", "5", "6"), class = "factor"), temperature = c(NA,
NA, 20.8, 19.5, 1.4, 3.1, 27.3, 25.4, 20.2, 26.6)), row.names = c(NA,
-10L), class = c("tbl_df", "tbl", "data.frame"))
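【Comments】:
-
I'm not quite sure what you are trying to do, but I think you would have an easier time going step by step rather than trying to do everything in one line. Just start with a line like stuff_to_calculate_mean <- df[,columns] and continue from there.
-
@RonakShah Added.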
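The step-by-step approach suggested in the first comment might look something like the following base R sketch. The `toy` data frame and the variable names `group_mean` and `missing` are assumptions made for illustration; none of them come from the original post.

```r
# Toy data (hypothetical; stands in for the poster's weather_data).
toy <- data.frame(
  year = c("2000", "2000", "2000", "2001", "2001"),
  temperature = c(10, NA, 20, NA, 4)
)

# Step 1: compute the group mean for every row, ignoring NAs.
# ave() returns a vector the same length as its first argument.
group_mean <- ave(toy$temperature, toy$year,
                  FUN = function(v) mean(v, na.rm = TRUE))

# Step 2: find the rows where the value is missing.
missing <- is.na(toy$temperature)

# Step 3: overwrite only the missing entries with their group mean.
toy$temperature[missing] <- group_mean[missing]

print(toy)
```

Keeping each intermediate result in its own variable makes it easy to inspect `group_mean` and `missing` before anything is overwritten.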