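[Posted]: 2015-02-15 20:40:18
[Question]:
I'm using Redshift (Postgres) and Pandas for my work. I'm trying to count user actions; let's say purchases, to make it easier to understand. I have a table, purchases, that contains the following data: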
user_id, timestamp, price
1, 2015-02-01, 200
1, 2015-02-02, 50
1, 2015-02-10, 75
In the end, I want the number of purchases per user within certain time windows, for example:
userid, 28-14_days, 14-7_days, 7
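To make the desired output concrete, here is a minimal sketch of one way to produce such a row with conditional aggregation. It runs against an in-memory SQLite database rather than Redshift, so the date arithmetic (`julianday`) is SQLite-specific. The function name `purchases_by_window`, the column names `days_28_14`, `days_14_7`, `days_7`, the fixed reference date, and the half-open window boundaries are all assumptions made for illustration.

```python
import sqlite3


def purchases_by_window(rows, today):
    """Count purchases per user in three look-back windows ending at `today`.

    Assumed boundaries (not stated in the question):
    14 <= days_ago < 28, 7 <= days_ago < 14, 0 <= days_ago < 7.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE purchases (user_id INTEGER, timestamp TEXT, price INTEGER)"
    )
    conn.executemany("INSERT INTO purchases VALUES (?, ?, ?)", rows)
    query = """
        SELECT user_id,
               SUM(CASE WHEN days_ago >= 14 AND days_ago < 28 THEN 1 ELSE 0 END) AS days_28_14,
               SUM(CASE WHEN days_ago >= 7  AND days_ago < 14 THEN 1 ELSE 0 END) AS days_14_7,
               SUM(CASE WHEN days_ago >= 0  AND days_ago < 7  THEN 1 ELSE 0 END) AS days_7
        FROM (
            -- whole days between the reference date and each purchase
            SELECT user_id,
                   CAST(julianday(?) - julianday(timestamp) AS INTEGER) AS days_ago
            FROM purchases
        )
        GROUP BY user_id
        ORDER BY user_id
    """
    result = conn.execute(query, (today,)).fetchall()
    conn.close()
    return result


# Sample rows from the question, evaluated as of the posting date.
sample = [(1, "2015-02-01", 200), (1, "2015-02-02", 50), (1, "2015-02-10", 75)]
print(purchases_by_window(sample, "2015-02-15"))  # [(1, 1, 1, 1)]
```

Each purchase row is scanned once and falls into at most one window, so the counts cannot overlap. On Redshift the same `SUM(CASE ...)` pattern should apply, with the day difference computed by that engine's own date functions.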
Here is what I have so far; I know it has no upper bound on the date:
SELECT DISTINCT x_days.user_id, SUM(x_days.purchases) AS x_num, SUM(y_days.purchases) AS y_num,
x_days.x_date, y_days.y_date
FROM
(
SELECT purchases.user_id, COUNT(purchases.user_id) as purchases,
DATE(purchases.timestamp) as x_date
FROM purchases
WHERE purchases.timestamp > (current_date - INTERVAL '%(x_days_ago)s day') AND
purchases.max_value > 200
GROUP BY DATE(purchases.timestamp), purchases.user_id
) AS x_days
JOIN
(
SELECT purchases.user_id, COUNT(purchases.user_id) as purchases,
DATE(purchases.timestamp) as y_date
FROM purchases
WHERE purchases.timestamp > (current_date - INTERVAL '%(y_days_ago)s day') AND
purchases.max_value > 200
GROUP BY DATE(purchases.timestamp), purchases.user_id) AS y_days
ON
x_days.user_id = y_days.user_id
GROUP BY
x_days.user_id, x_days.x_date, y_days.y_date
params={'x_days_ago':x_days_ago, 'y_days_ago':y_days_ago}
where these are set in Python/pandas:
x_days_ago = 14
y_days_ago = 7
But this doesn't quite work as planned:
user_id x_num y_num x_date y_date
0 5451772 1 1 2015-02-10 2015-02-10
1 5026678 1 1 2015-02-09 2015-02-09
2 6337993 2 1 2015-02-14 2015-02-13
3 6204432 1 3 2015-02-10 2015-02-11
4 3417539 1 1 2015-02-11 2015-02-11
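Even though I have no upper date bound (so x effectively searches from 14 days ago until now and y from 7 days ago until now, which means the two overlap), in some cases y is higher.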
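A possible explanation, sketched here under assumptions: the two subqueries are joined on user_id alone, so every daily row from x_days pairs with every daily row from y_days, and the outer GROUP BY keeps each (x_date, y_date) pair as its own row. In that case x_num and y_num would be per-day counts for two different days, not window totals. The sketch below reproduces the query shape in an in-memory SQLite database (not Redshift). The function name `paired_daily_counts`, the sample user 7, and the fixed "today" are made up for illustration, and the max_value filter is omitted because the sample table has no such column.

```python
import sqlite3
from datetime import date, timedelta


def paired_daily_counts(rows, today, x_days_ago, y_days_ago):
    """Run a query shaped like the one in the question against sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE purchases (user_id INTEGER, timestamp TEXT)")
    conn.executemany("INSERT INTO purchases VALUES (?, ?)", rows)
    # ISO date strings compare correctly as text, so cutoffs are passed as strings.
    params = {
        "x_cutoff": (today - timedelta(days=x_days_ago)).isoformat(),
        "y_cutoff": (today - timedelta(days=y_days_ago)).isoformat(),
    }
    query = """
        SELECT x_days.user_id,
               SUM(x_days.purchases) AS x_num,
               SUM(y_days.purchases) AS y_num,
               x_days.x_date, y_days.y_date
        FROM (
            SELECT user_id, COUNT(user_id) AS purchases, DATE(timestamp) AS x_date
            FROM purchases
            WHERE timestamp > :x_cutoff
            GROUP BY DATE(timestamp), user_id
        ) AS x_days
        JOIN (
            SELECT user_id, COUNT(user_id) AS purchases, DATE(timestamp) AS y_date
            FROM purchases
            WHERE timestamp > :y_cutoff
            GROUP BY DATE(timestamp), user_id
        ) AS y_days
        ON x_days.user_id = y_days.user_id
        GROUP BY x_days.user_id, x_days.x_date, y_days.y_date
        ORDER BY x_days.x_date, y_days.y_date
    """
    result = conn.execute(query, params).fetchall()
    conn.close()
    return result


# Hypothetical user 7: one purchase on 2015-02-10, three on 2015-02-11.
rows = [(7, "2015-02-10")] + [(7, "2015-02-11")] * 3
for row in paired_daily_counts(rows, date(2015, 2, 15), 14, 7):
    print(row)
# (7, 1, 1, '2015-02-10', '2015-02-10')
# (7, 1, 3, '2015-02-10', '2015-02-11')
# (7, 3, 1, '2015-02-11', '2015-02-10')
# (7, 3, 3, '2015-02-11', '2015-02-11')
```

The second row has the same shape as the row with x_num = 1 and y_num = 3 in the output above: y_num is larger only because it counts a different day, not because the 7-day window holds more purchases than the 14-day window.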
Can anyone help me fix this or suggest a better approach?
Thanks!
[Comments]:
Tags: sql postgresql pandas amazon-redshift