【Question title】: Calculate the time since the last purchase of the same product
【Posted】: 2019-09-14 11:34:48
【Question description】:
Problem statement:
You are given the following data:
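- a list of customer_id values
- a list of products
- the purchase time
- the total number of purchases of the same product to date
To find:
- time_from_last_purchase: the number of days since the same customer's previous purchase of the same product
Expected output (last column):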
customer_id product purchase_time total_to_date time_from_last_purchase
1 A 2014-11-24 1 0
1 A 2018-02-21 2 1185
1 E 2014-01-08 1 0
2 J 2016-04-18 1 0
3 F 2017-06-12 1 0
3 G 2017-06-23 1 0
4 F 2017-09-27 1 0
4 F 2018-01-08 2 103
4 F 2018-02-08 3 31
4 F 2018-02-09 4 1
4 F 2018-04-10 5 60
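The answers below operate on a data frame called `df` that the question never constructs. A minimal way to rebuild it from the table above, assuming `purchase_time` starts out as character strings (the answers' `as.Date()` calls suggest this, but the question does not say so):

```r
# Sample data transcribed from the expected-output table above.
# time_from_last_purchase is the target column, kept here for checking.
df <- data.frame(
  customer_id   = c(1L, 1L, 1L, 2L, 3L, 3L, 4L, 4L, 4L, 4L, 4L),
  product       = c("A", "A", "E", "J", "F", "G", "F", "F", "F", "F", "F"),
  purchase_time = c("2014-11-24", "2018-02-21", "2014-01-08", "2016-04-18",
                    "2017-06-12", "2017-06-23", "2017-09-27", "2018-01-08",
                    "2018-02-08", "2018-02-09", "2018-04-10"),
  total_to_date = c(1L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 3L, 4L, 5L),
  time_from_last_purchase = c(0L, 1185L, 0L, 0L, 0L, 0L, 0L, 103L, 31L, 1L, 60L),
  stringsAsFactors = FALSE
)
str(df)
```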
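My approach:
- Doing it by hand: the first time a customer buys a particular product, time_from_last_purchase is 0.
- From a customer's second purchase of a product onward, time_from_last_purchase equals the purchase_time of the current purchase minus the purchase_time of the previous purchase.
I'm very new to R, so any help is much appreciated. Thanks!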
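The manual rule can be written out directly as a loop. This is a sketch for illustration, not the approach either answer uses; it assumes rows are already in chronological order within each customer/product pair, and `last_seen` is a hypothetical name introduced here:

```r
# Subset of the question's data (customers 1 and 4), already sorted by date
# within each (customer_id, product) pair.
df <- data.frame(
  customer_id   = c(1, 1, 1, 4, 4, 4),
  product       = c("A", "A", "E", "F", "F", "F"),
  purchase_time = as.Date(c("2014-11-24", "2018-02-21", "2014-01-08",
                            "2017-09-27", "2018-01-08", "2018-02-08")),
  stringsAsFactors = FALSE
)

# last_seen maps "customer_id|product" to the date of the previous purchase
last_seen <- list()
df$time_from_last_purchase <- 0  # first purchase of a product defaults to 0
for (i in seq_len(nrow(df))) {
  key  <- paste(df$customer_id[i], df$product[i], sep = "|")
  prev <- last_seen[[key]]       # NULL if this pair has not been seen yet
  if (!is.null(prev)) {
    # second or later purchase: current date minus previous date, in days
    df$time_from_last_purchase[i] <-
      as.numeric(df$purchase_time[i] - prev, units = "days")
  }
  last_seen[[key]] <- df$purchase_time[i]
}
print(df)
```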
【问题讨论】:
Tags:
r
dataframe
dplyr
lag
【Solution 1】:
With dplyr, you could try:
library(dplyr)

df %>%
  group_by(customer_id, product) %>%
  mutate(purchase_time = as.Date(purchase_time, format = "%Y-%m-%d"),
         # the first row of each group is subtracted from itself, giving 0
         res = purchase_time - lag(purchase_time, default = first(purchase_time)))
customer_id product purchase_time total_to_date res
<int> <chr> <date> <int> <time>
1 1 A 2014-11-24 1 0 days
2 1 A 2018-02-21 2 1185 days
3 1 E 2014-01-08 1 0 days
4 2 J 2016-04-18 1 0 days
5 3 F 2017-06-12 1 0 days
6 3 G 2017-06-23 1 0 days
7 4 F 2017-09-27 1 0 days
8 4 F 2018-01-08 2 103 days
9 4 F 2018-02-08 3 31 days
10 4 F 2018-02-09 4 1 days
11 4 F 2018-04-10 5 60 days
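`lag(purchase_time, default = first(purchase_time))` shifts each group's dates down by one row and fills the vacated first slot with the group's own first date, which is why every first purchase comes out as 0 days. The base-R sketch below mimics that for a single group; `lag_with_first` is a hypothetical helper written for illustration, not part of dplyr:

```r
# Hypothetical helper: base-R equivalent of dplyr's lag(x, default = first(x))
# for one group. Shifts x down by one position and puts x[1] in the first slot.
lag_with_first <- function(x) {
  if (length(x) == 0) return(x)
  c(x[1], x[-length(x)])  # c() keeps the Date class because x[1] is a Date
}

# One group from the question's data: customer 4, product F
dates <- as.Date(c("2017-09-27", "2018-01-08", "2018-02-08",
                   "2018-02-09", "2018-04-10"))
gaps <- as.numeric(dates - lag_with_first(dates), units = "days")
print(gaps)  # 0 103 31 1 60
```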
Or, if you need the result as a numeric variable:
df %>%
  group_by(customer_id, product) %>%
  mutate(purchase_time = as.Date(purchase_time, format = "%Y-%m-%d"),
         res = as.numeric(purchase_time - lag(purchase_time, default = first(purchase_time))))
customer_id product purchase_time total_to_date res
<int> <chr> <date> <int> <dbl>
1 1 A 2014-11-24 1 0
2 1 A 2018-02-21 2 1185
3 1 E 2014-01-08 1 0
4 2 J 2016-04-18 1 0
5 3 F 2017-06-12 1 0
6 3 G 2017-06-23 1 0
7 4 F 2017-09-27 1 0
8 4 F 2018-01-08 2 103
9 4 F 2018-02-08 3 31
10 4 F 2018-02-09 4 1
11 4 F 2018-04-10 5 60
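The `as.numeric()` wrapper is what turns the `<time>` (difftime) column into `<dbl>`. Subtracting two `Date` objects yields a `difftime` measured in days; a small sketch of the conversion, with the `units` argument spelled out so the unit is explicit rather than implied:

```r
# Subtracting two Dates gives a "difftime" object, measured in days
d <- as.Date("2018-02-21") - as.Date("2014-11-24")
print(class(d))  # "difftime"
print(units(d))  # "days"

# as.numeric() drops the difftime class; naming units = "days" documents
# the intended unit and would convert if the difftime used another unit
n <- as.numeric(d, units = "days")
print(n)  # 1185
```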
【Solution 2】:
Another approach, using diff:
library(dplyr)

df %>%
  mutate(purchase_time = as.Date(purchase_time)) %>%
  group_by(customer_id, product) %>%
  mutate(diff = c(0, diff(purchase_time)))
# customer_id product purchase_time total_to_date time_from_last_purchase diff
# <int> <fct> <date> <int> <int> <dbl>
# 1 1 A 2014-11-24 1 0 0
# 2 1 A 2018-02-21 2 1185 1185
# 3 1 E 2014-01-08 1 0 0
# 4 2 J 2016-04-18 1 0 0
# 5 3 F 2017-06-12 1 0 0
# 6 3 G 2017-06-23 1 0 0
# 7 4 F 2017-09-27 1 0 0
# 8 4 F 2018-01-08 2 103 103
# 9 4 F 2018-02-08 3 31 31
#10 4 F 2018-02-09 4 1 1
#11 4 F 2018-04-10 5 60 60
Similarly, with base R's ave we can do:
df$diff <- with(df, ave(as.numeric(as.Date(purchase_time)), customer_id, product,
                        FUN = function(x) c(0, diff(x))))
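`ave()` applies `FUN` separately to each group defined by the grouping arguments and returns the results in the original row positions; with `c(0, diff(x))`, each group's first element becomes 0 and every later element becomes the gap to the one before it. A toy illustration with made-up numbers (not from the question):

```r
x <- c(10, 15, 3, 7, 20)
g <- c("a", "a", "b", "b", "b")
# group "a": c(0, 15 - 10); group "b": c(0, 7 - 3, 20 - 7)
res <- ave(x, g, FUN = function(v) c(0, diff(v)))
print(res)  # 0 5 0 4 13
```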
If your purchase_time column is already of class Date, you can skip the as.Date part in both approaches.
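Both answers take the row order as given: `lag()` and `diff()` compare each row with the row above it in the same group. If the data is not already sorted by date within each customer/product pair, sorting first is a sensible precaution. A base-R sketch with deliberately shuffled rows (the shuffled order is made up for illustration):

```r
# Customer 4 / product F, rows deliberately out of chronological order
df <- data.frame(
  customer_id   = c(4, 4, 4),
  product       = c("F", "F", "F"),
  purchase_time = as.Date(c("2018-02-08", "2017-09-27", "2018-01-08")),
  stringsAsFactors = FALSE
)

# Sort by group, then by date, so each row's predecessor is the previous purchase
df <- df[order(df$customer_id, df$product, df$purchase_time), ]

# Same ave() idea as above; purchase_time is already a Date, so no as.Date()
df$diff <- ave(as.numeric(df$purchase_time), df$customer_id, df$product,
               FUN = function(x) c(0, diff(x)))
print(df)
```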