R - Count observations per ID and dates [duplicate]: answers

【Question title】: R - Count observations per ID and dates [duplicate]
【Posted】: 2018-03-05 11:39:07
【Question description】:

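I have a claims file with 2 columns: "Customer ID" and "Declaration date".

I want to check (and count) whether a customer had more than one claim within a period X (say, one year).

My data looks like this: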

Customer_Id     Declaration_date   
001             12/10/2017
001             12/10/2017
002             24/10/2017
003             25/10/2017
004             25/10/2017
001             05/12/2017
006             07/12/2017

Here it is as R code:

D <- data.frame(Customer_Id = c(001, 001, 002, 003, 004, 001, 006),
            Declaration_date = as.Date(c("12/10/2017", "12/10/2017", "24/10/2017", "25/10/2017", "25/10/2017", "05/12/2017", "07/12/2017"), format = "%d/%m/%Y"))

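Here we can see that customer "001" has two claims on 12/10, but also one claim on 05/12. What I want is a third column that counts the number of distinct claims by date for each customer, restricted to a period, for example all dates since 01/01/2016. The output should look like this: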

Customer_Id     Declaration_date     Number of claims 
001             12/10/2017           2
001             12/10/2017           2
002             24/10/2017           1
003             25/10/2017           1
004             25/10/2017           1
001             05/12/2017           2
006             07/12/2017           1

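Note that a customer ID appearing several times on the same date should not add to the "Number of claims". In my example, customer 001 has "2" claims because he had one (or more) claims on 12/10 and one (or more) on 05/12.

Any help is much appreciated.

Many thanks,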

【Question discussion】:

Tags: r filter dplyr group-by count

【Solution 1】:

We can use ave from base R to create the column by taking the length of the unique elements of 'Declaration_date' within each 'Customer_Id':

with(D, ave(as.numeric(Declaration_date), Customer_Id, FUN = function(x) length(unique(x))))
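The call above returns a plain vector of counts, one per row. A minimal sketch of attaching it to the data frame as the requested third column follows; the column name Number_of_claims is an assumption taken from the desired output in the question. The as.numeric() conversion appears to matter here: as far as I understand ave(), it writes the results back into a copy of its input, so passing the Date column directly would leave the counts with class Date.

```r
# Sample data from the question (the literals 001, 002, ... are just 1, 2, ...).
D <- data.frame(
  Customer_Id = c(1, 1, 2, 3, 4, 1, 6),
  Declaration_date = as.Date(
    c("12/10/2017", "12/10/2017", "24/10/2017", "25/10/2017",
      "25/10/2017", "05/12/2017", "07/12/2017"),
    format = "%d/%m/%Y"
  )
)

# Count the distinct declaration dates per customer and store the
# result as a new column (one value per row, repeated within a customer).
D$Number_of_claims <- with(
  D,
  ave(as.numeric(Declaration_date), Customer_Id,
      FUN = function(x) length(unique(x)))
)

print(D)
```

Customer 1 gets 2 on all three of its rows (two claims on 12/10 count once, plus 05/12), and every other customer gets 1.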

Or with dplyr:

library(dplyr)
D %>%
  group_by(Customer_Id) %>%
  mutate(Number_of_claims = n_distinct(Declaration_date))

Or with data.table:

library(data.table)
setDT(D)[,  Number_of_claims := uniqueN(Declaration_date), Customer_Id]
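None of the three variants restricts the count to a time period, which the question also asks for ("since 01/01/2016"). A minimal base R sketch of one way to do that is shown here; the names cutoff, in_window and Claims_since_cutoff are hypothetical, and the approach (masking out-of-window dates with NA before counting) is an assumption about what the asker wants, not part of the original answer.

```r
# Sample data from the question.
D <- data.frame(
  Customer_Id = c(1, 1, 2, 3, 4, 1, 6),
  Declaration_date = as.Date(
    c("12/10/2017", "12/10/2017", "24/10/2017", "25/10/2017",
      "25/10/2017", "05/12/2017", "07/12/2017"),
    format = "%d/%m/%Y"
  )
)

# Hypothetical start of the period of interest.
cutoff <- as.Date("2016-01-01")

# Flag rows inside the window; dates outside it are replaced by NA
# so that they are ignored by the distinct count.
in_window <- D$Declaration_date >= cutoff
masked <- ifelse(in_window, as.numeric(D$Declaration_date), NA)

# Per customer: number of distinct in-window declaration dates.
D$Claims_since_cutoff <- ave(
  masked, D$Customer_Id,
  FUN = function(x) length(unique(x[!is.na(x)]))
)

print(D)
```

With the sample data every date falls after the cutoff, so the result matches the unfiltered count. Moving cutoff to, say, as.Date("2017-11-01") would drop customer 1 to a single claim and customers 2, 3 and 4 to zero.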

【Comments】:

Declaration_Date in the dplyr chain needs a lowercase d in "date" :)