Cumulative count of unique values per group: answers

【Question title】：Cumulative count of unique values per group
【Posted】：2014-01-03 23:29:24
【Question】：

I have a df containing names and eligibility status dates. I want to create an indicator of how many unique elig_end_dates a person has had over time. Here is my df:

   names date_of_claim elig_end_date
1    tom    2010-01-01    2010-07-01
2    tom    2010-05-04    2010-07-01
3    tom    2010-06-01    2014-01-01
4    tom    2010-10-10    2014-01-01
5   mary    2010-03-01    2014-06-14
6   mary    2010-05-01    2014-06-14
7   mary    2010-08-01    2014-06-14
8   mary    2010-11-01    2014-06-14
9   mary    2011-01-01    2014-06-14
10  john    2010-03-27    2011-03-01
11  john    2010-07-01    2011-03-01
12  john    2010-11-01    2011-03-01
13  john    2011-02-01    2011-03-01
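
For anyone who wants to reproduce the example, the table above can be rebuilt roughly as follows. This is only a sketch: storing the dates as character strings (rather than Date or factor) is an assumption, since the post does not say how the columns are typed.

```r
# Rebuild the example data frame from the table in the question.
# Assumption: dates are kept as character strings (stringsAsFactors = FALSE).
df <- data.frame(
  names = rep(c("tom", "mary", "john"), times = c(4, 5, 4)),
  date_of_claim = c("2010-01-01", "2010-05-04", "2010-06-01", "2010-10-10",
                    "2010-03-01", "2010-05-01", "2010-08-01", "2010-11-01",
                    "2011-01-01",
                    "2010-03-27", "2010-07-01", "2010-11-01", "2011-02-01"),
  elig_end_date = rep(c("2010-07-01", "2014-01-01", "2014-06-14", "2011-03-01"),
                      times = c(2, 2, 5, 4)),
  stringsAsFactors = FALSE
)
# R numbers the rows 1..13 automatically, matching the row labels above.
```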

This is my desired output:

   names date_of_claim elig_end_date obs
1    tom    2010-01-01    2010-07-01   1
2    tom    2010-05-04    2010-07-01   1
3    tom    2010-06-01    2014-01-01   2
4    tom    2010-10-10    2014-01-01   2
5   mary    2010-03-01    2014-06-14   1
6   mary    2010-05-01    2014-06-14   1
7   mary    2010-08-01    2014-06-14   1
8   mary    2010-11-01    2014-06-14   1
9   mary    2011-01-01    2014-06-14   1
10  john    2010-03-27    2011-03-01   1
11  john    2010-07-01    2011-03-01   1
12  john    2010-11-01    2011-03-01   1
13  john    2011-02-01    2011-03-01   1

I found this post useful, R: Count unique values by category, but its answers return a separate table rather than a column in the df.

I also tried this:

df$ob = ave(df$elig_end_date, df$elig_end_date, FUN=seq_along)

But this produces a count, and I really just want an indicator.
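
A small sketch of what that attempt does, using only Tom's four rows and assuming the dates are character strings: grouping by elig_end_date itself and applying seq_along numbers the rows within each date, so the value restarts at 1 whenever the date changes.

```r
# Tom's elig_end_date values (assumed to be character strings)
elig_end_date <- c("2010-07-01", "2010-07-01", "2014-01-01", "2014-01-01")

# ave() writes the per-group results back into a character vector,
# so convert to integer for readability
ob <- as.integer(ave(elig_end_date, elig_end_date, FUN = seq_along))
ob
# 1 2 1 2  (a row counter within each date, not the 1 1 2 2 that is wanted)
```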

Thanks in advance.

Output of STEPHEN's code (this is not the correct code; it is posted only as a learning point)

   names date_of_claim elig_end_date ob
1    tom    2010-01-01    2010-07-01  2
2    tom    2010-05-04    2010-07-01  2
3    tom    2010-06-01    2014-01-01  2
4    tom    2010-10-10    2014-01-01  2
5   mary    2010-03-01    2014-06-14  5
6   mary    2010-05-01    2014-06-14  5
7   mary    2010-08-01    2014-06-14  5
8   mary    2010-11-01    2014-06-14  5
9   mary    2011-01-01    2014-06-14  5
10  john    2010-03-27    2011-03-01  4
11  john    2010-07-01    2011-03-01  4
12  john    2010-11-01    2011-03-01  4
13  john    2011-02-01    2011-03-01  4

【Comments】：

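Hi, I posted a quick answer, but I'm confused by your example because the count of unique elig_end_date values doesn't look right. Am I misunderstanding something?
I'll post the output of your code above so you can see it. Thanks again for your input! ;)
So why does Tom get 1,1,2,2 in your desired output example?
@StephenHenderson When the table is sorted by date_of_claim, it looks like a running count of unique end dates.
I want an indicator that marks a change in the eligibility date, which is what Mattrition described: a running count of unique dates. Thanks again for your help :)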

Tags: r count unique cumulative-frequency

【Solution 1】：

Another possibility, using ave:

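# Running count of distinct elig_end_date values within each name.
# as.character() stops ave() from coercing the counts back into a factor or
# Date column; as.integer() makes obs a plain integer column.
df$obs <- with(df, as.integer(ave(as.character(elig_end_date), names,
                       FUN = function(x) cumsum(!duplicated(x)))))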

#    names date_of_claim elig_end_date obs
# 1    tom    2010-01-01    2010-07-01   1
# 2    tom    2010-05-04    2010-07-01   1
# 3    tom    2010-06-01    2014-01-01   2
# 4    tom    2010-10-10    2014-01-01   2
# 5   mary    2010-03-01    2014-06-14   1
# 6   mary    2010-05-01    2014-06-14   1
# 7   mary    2010-08-01    2014-06-14   1
# 8   mary    2010-11-01    2014-06-14   1
# 9   mary    2011-01-01    2014-06-14   1
# 10  john    2010-03-27    2011-03-01   1
# 11  john    2010-07-01    2011-03-01   1
# 12  john    2010-11-01    2011-03-01   1
# 13  john    2011-02-01    2011-03-01   1
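
To make the idiom less opaque, here is a minimal sketch of cumsum(!duplicated(x)) on Tom's values alone, assuming character dates. duplicated() flags every repeat of a value that has already been seen, so its negation marks first appearances, and cumsum() turns those marks into a running count of distinct values.

```r
# Tom's elig_end_date values, in date_of_claim order (assumed character)
x <- c("2010-07-01", "2010-07-01", "2014-01-01", "2014-01-01")

first_seen <- !duplicated(x)   # TRUE only at the first appearance of each value
first_seen
# TRUE FALSE TRUE FALSE

obs <- cumsum(first_seen)      # TRUE counts as 1, FALSE as 0
obs
# 1 1 2 2

# Caveat: this counts distinct values seen so far, not changes.
# A value that comes back after a different one does not raise the count.
back_again <- cumsum(!duplicated(c("A", "B", "A")))
back_again
# 1 2 2
```

Because the count follows row order, it only reads as "over time" if the rows are already sorted by date_of_claim, as they are in the example. For unsorted data, something like df <- df[order(df$date_of_claim), ] beforehand would likely be needed.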

【Discussion】：