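[Posted]: 2020-08-13 16:41:01
[Question]:
I have a df containing user IDs, subscription start and current month, activity dates, and activity numbers. A user can appear multiple times if they have multiple activities. Here is a short toy example: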
USER_ID  SUB_START   CURRENT_MONTH  ACTIVITY_DATE  ACTIVITY_NUMBER
0102     2020-04-01  2020-08-01     2020-02-05     1
0102     2020-04-01  2020-08-01     2020-03-10     2
0102     2020-04-01  2020-08-01     2020-07-01     3
2190     2019-05-10  2020-08-01     2017-01-02     1
2190     2019-05-10  2020-08-01     2017-10-02     2
0121     2020-07-13  2020-08-01     2018-01-04     1
0121     2020-07-13  2020-08-01     2019-02-10     2
0121     2020-07-13  2020-08-01     2020-01-02     3
0121     2020-07-13  2020-08-01     2020-04-10     4
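What I want to accomplish is to group by month and then show, for each month, the number of unique IDs with an active subscription in that month and the number of unique IDs with an activity date within the 13 months before that month. So the output for this toy dataset would look like: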
MONTH ACTIVE_COUNT ACTIVITY_COUNT
2019-05-01 1 0 *user 2190 active with no activity within past 13 mo
2019-06-01 1 0 *user 2190 active with no activity within past 13 mo
2019-07-01 1 0 *user 2190 active with no activity within past 13 mo
2019-08-01 1 0 *user 2190 active with no activity within past 13 mo
2019-09-01 1 0 *user 2190 active with no activity within past 13 mo
2019-10-01 1 0 *user 2190 active with no activity within past 13 mo
2019-11-01 1 0 *user 2190 active with no activity within past 13 mo
2019-12-01 1 0 *user 2190 active with no activity within past 13 mo
2020-01-01 1 0 *user 2190 active with no activity within past 13 mo
2020-02-01 1 0 *user 2190 active with no activity within past 13 mo
2020-03-01 1 0 *user 2190 active with no activity within past 13 mo
2020-04-01 2 1 *user 2190 and 0102 active and 0102 has a qualifying activity
2020-05-01 2 1 *user 2190 and 0102 active and 0102 has a qualifying activity
2020-06-01 2 1 *user 2190 and 0102 active and 0102 has a qualifying activity
2020-07-01 3 2 *user 2190,0102,0121 all active and 0102 and 0121 have qualifying activities
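So far, I have put together the following code based on a previous project. It gives me one row per user for each month between their SUB_START and CURRENT_MONTH. The problem is that it repeats the process for every ACTIVITY_DATE, so each USER_ID ends up with multiple sets of active months. I would like to get one row for each month in which each user is active, and then add a column indicating whether that user has an ACTIVITY_DATE within 13 months of that month.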
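library(dplyr)      # select, mutate, across, rowwise
library(tidyr)      # unnest
library(lubridate)  # floor_date

df_monthly <- df %>%
  select(USER_ID, SUB_START, CURRENT_MONTH, ACTIVITY_DATE) %>%
  # For every selected character column: parse as Date, subtract one day,
  # floor to the month, then add one day.
  mutate(across(where(is.character), ~ floor_date(as.Date(.x) - 1, "months") + 1)) %>%
  rowwise() %>%
  # One sequence of monthly dates per input row (so once per ACTIVITY_DATE).
  mutate(MONTH = list(seq(SUB_START, CURRENT_MONTH, by = "+1 month"))) %>%
  unnest(MONTH) %>%
  mutate(MONTH2 = floor_date(MONTH, unit = "month"))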
[Discussion]:
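One possible way to collapse the repeated rows, offered as a minimal sketch rather than a definitive answer: expand each row to months as in the question, flag whether that row's ACTIVITY_DATE qualifies, then reduce to one row per USER_ID and MONTH with `any()` before counting per month. The sketch rests on several assumptions that are not stated in the question: the 13-month window is taken as `[MONTH - 13 months, MONTH)`; CURRENT_MONTH is treated as exclusive, because the expected output stops at 2020-07-01; SUB_START is floored to the first of its month; and the names `user_months`, `monthly`, and `HAS_ACTIVITY` are made up for illustration.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Toy data from the question. Everything is read as character so that
# USER_ID keeps its leading zeros; the date columns are converted below.
df <- read.table(header = TRUE, colClasses = "character", text = "
USER_ID SUB_START CURRENT_MONTH ACTIVITY_DATE ACTIVITY_NUMBER
0102 2020-04-01 2020-08-01 2020-02-05 1
0102 2020-04-01 2020-08-01 2020-03-10 2
0102 2020-04-01 2020-08-01 2020-07-01 3
2190 2019-05-10 2020-08-01 2017-01-02 1
2190 2019-05-10 2020-08-01 2017-10-02 2
0121 2020-07-13 2020-08-01 2018-01-04 1
0121 2020-07-13 2020-08-01 2019-02-10 2
0121 2020-07-13 2020-08-01 2020-01-02 3
0121 2020-07-13 2020-08-01 2020-04-10 4
") %>%
  mutate(across(c(SUB_START, CURRENT_MONTH, ACTIVITY_DATE), as.Date))

# Step 1: one row per user per subscribed month, with an activity flag.
user_months <- df %>%
  # Assumption: a subscription counts from the first day of its start month.
  mutate(across(c(SUB_START, CURRENT_MONTH), ~ floor_date(.x, "month"))) %>%
  rowwise() %>%
  mutate(MONTH = list(seq(SUB_START, CURRENT_MONTH, by = "month"))) %>%
  ungroup() %>%
  unnest(MONTH) %>%
  # Assumption: CURRENT_MONTH itself is excluded, as in the expected output.
  filter(MONTH < CURRENT_MONTH) %>%
  # Collapse the per-activity duplicates into a single row per user-month.
  group_by(USER_ID, MONTH) %>%
  summarise(
    # Assumption: the window is [MONTH - 13 months, MONTH).
    HAS_ACTIVITY = any(ACTIVITY_DATE >= MONTH %m-% months(13) &
                         ACTIVITY_DATE < MONTH),
    .groups = "drop"
  )

# Step 2: count unique users per month.
monthly <- user_months %>%
  group_by(MONTH) %>%
  summarise(
    ACTIVE_COUNT   = n_distinct(USER_ID),
    ACTIVITY_COUNT = n_distinct(USER_ID[HAS_ACTIVITY]),
    .groups = "drop"
  )

print(monthly, n = Inf)
```

Summarising within each `USER_ID`/`MONTH` group avoids a separate join between months and activities. If CURRENT_MONTH should be included, or if the window should also cover the month itself, only the `filter()` line and the comparison inside `any()` need to change.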