Evaluation Error: Need at least one column for 'n_distinct()' - Answers

【Question title】: Evaluation Error: Need at least one column for 'n_distinct()'
【Posted】: 2021-08-03 10:06:00
【Question】:

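I am using the R programming language. I have a data frame (my_file) with 2 columns: my_date (e.g. 2000-01-15, stored as a factor) and blood_type (also a factor). I am trying to use the dplyr library to produce distinct counts by group (by month).

I figured out how to do non-distinct counts: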

library(dplyr)

new_file <- my_file %>%
  mutate(date = as.Date(my_date)) %>%
  group_by(blood_type, month = format(date, "%Y-%m")) %>%
  summarise(count = n())

But this does not work for distinct counts:

new_file <- my_file %>%
  mutate(date = as.Date(my_date)) %>%
  group_by(blood_type, month = format(date, "%Y-%m")) %>%
  summarise(count = n_distinct())

Evaluation Error : Need at least one column for 'n_distinct()'

I tried referencing the column explicitly, but this produces an empty file:

new_file <- my_file %>%
  mutate(date = as.Date(my_date)) %>%
  group_by(blood_type, month = format(date, "%Y-%m")) %>%
  summarise(count = n_distinct(my_file$blood_type))

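Can someone tell me what I am doing wrong?

Thanks

【Comments】:

Tags: r dplyr group-by count distinct

【Solution 1】:

If you want to count the distinct blood_type values in each month, do not include blood_type in group_by, and pass the column to n_distinct(). Try: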

library(dplyr)

new_file <- my_file %>%
  mutate(date = as.Date(my_date)) %>%
  group_by(month = format(date, "%Y-%m")) %>%
  summarise(count = n_distinct(blood_type))
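
To illustrate why the original attempts misbehave, here is a minimal sketch on made-up data (the rows of `my_file` below are invented for illustration and are not from the question). Grouping by `blood_type` leaves exactly one blood type in every group, and `my_file$blood_type` bypasses the grouping and refers to the whole column. A bare column name inside `summarise()` is what gets evaluated once per group.

```r
library(dplyr)

# Toy data, invented for illustration; both columns are factors as in the question
my_file <- data.frame(
  my_date    = factor(c("2000-01-15", "2000-01-20", "2000-01-25",
                        "2000-02-03", "2000-02-10")),
  blood_type = factor(c("A", "B", "A", "O", "O"))
)

by_month <- my_file %>%
  mutate(date = as.Date(my_date)) %>%
  group_by(month = format(date, "%Y-%m")) %>%
  summarise(
    count        = n_distinct(blood_type),         # evaluated within each month
    count_global = n_distinct(my_file$blood_type)  # whole column, ignores the grouping
  )

print(by_month)
```

On this toy data, `count` is 2 for 2000-01 and 1 for 2000-02, while `count_global` is 3 in both rows because it always sees all five values.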

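【Comments】:

Thanks, this solved the problem! If I want to change the grouping to "week", should I do this? %>% group_by(month = format(date, "%W-%y")) %>% ?
If I want to change the grouping to "day", should I do this? %>% group_by(month = format(date, "%Y-%m-%d")) %>%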
For day, you can use date directly in group_by. Weeks can be computed in different ways, using %W, %V, %U (check ?strptime) or lubridate::week.
If you have time, could you take a look at this question (it has a bounty)? stackoverflow.com/questions/67764577/… Thanks
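
The day and week variants discussed in the comments above could look roughly like the sketch below. The data is invented for illustration, and the `"%Y-W%W"` label format is just one possible choice (`%W` numbers weeks starting on Monday; see `?strptime` for `%U` and `%V`).

```r
library(dplyr)

# Toy data, invented for illustration
my_file <- data.frame(
  my_date    = factor(c("2000-01-03", "2000-01-04", "2000-01-04", "2000-01-12")),
  blood_type = factor(c("A", "B", "B", "O"))
)

dated <- my_file %>%
  mutate(date = as.Date(my_date))

# Per day: the Date column itself can serve as the grouping variable
by_day <- dated %>%
  group_by(date) %>%
  summarise(count = n_distinct(blood_type))

# Per week: year plus Monday-based week number (%W)
by_week <- dated %>%
  group_by(week = format(date, "%Y-W%W")) %>%
  summarise(count = n_distinct(blood_type))

print(by_day)
print(by_week)
```

Putting the year first in the label (rather than `"%W-%y"`) keeps the groups in chronological order when the result is sorted as text.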

【Solution 2】:

Using data.table

library(data.table)
# j: distinct blood types per group; by: year-month derived from my_date
setDT(my_file)[, .(count = uniqueN(blood_type)),
               .(month = format(as.IDate(my_date), '%Y-%m'))]
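
For reference, a self-contained sketch of the same data.table approach on invented data, with the `by` argument named explicitly. The rows of `my_file` are made up for illustration.

```r
library(data.table)

# Toy data, invented for illustration; both columns are factors as in the question
my_file <- data.frame(
  my_date    = factor(c("2000-01-15", "2000-01-20", "2000-01-25",
                        "2000-02-03", "2000-02-10")),
  blood_type = factor(c("A", "B", "A", "O", "O"))
)

# setDT() converts the data frame to a data.table by reference;
# uniqueN() counts the distinct values within each `by` group
result <- setDT(my_file)[
  , .(count = uniqueN(blood_type)),
  by = .(month = format(as.IDate(my_date), "%Y-%m"))
]

print(result)
```

On this toy data the result has two rows: 2000-01 with a count of 2 and 2000-02 with a count of 1.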

【Comments】: