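【Posted】: 2015-03-01 04:20:45
【Question】:
I'm trying to look at user activity by date. The first step is to build a table with one row per user for every day since that user's account was created, using a cross join and a where clause. My first attempt looked like this: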
SELECT
  u.user_id as user_id,
  date(u.created) as signup_date,
  cal.date as date,
from rsdw.user u
cross join (
  select date(dt) as date
  from [rsdw.calendar]
  where date(dt) < CURRENT_DATE()
) cal
where
  date(u.created) <= cal.date
(The calendar table is just a list of every date since 2006 (3288 rows). The user table has about 1m rows.)
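For a rough sense of scale, here is a back-of-the-envelope sketch of how many row pairs a cross join of these two tables produces before the where clause filters anything. It uses the approximate row counts quoted above and treats the user table as exactly 1,000,000 rows, which is an assumption; the variable names are illustrative only.

```python
# Rough size of the unfiltered cross join, using the figures quoted above.
calendar_rows = 3288        # one row per date since 2006
user_rows = 1_000_000       # assumption: "about 1m" taken as exactly 1,000,000

# A cross join pairs every calendar row with every user row.
candidate_pairs = calendar_rows * user_rows
print(f"{candidate_pairs:,} candidate pairs")  # prints "3,288,000,000 candidate pairs"
```

That is on the order of three billion pairs for the first query to check against the where clause.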
This query takes a very long time... so long that I gave up on it after about 1000 seconds. I tried tweaking the query a bit. If I add EACH to the cross join:
SELECT
  u.user_id as user_id,
  date(u.created) as signup_date,
  cal.date as date,
from rsdw.user u
cross join each (
  select date(dt) as date
  from [rsdw.calendar]
  where date(dt) < CURRENT_DATE()
) cal
where
  date(u.created) <= cal.date
I get an error:
Error: Cannot CROSS JOIN two tables with EACH qualifiers.
Finally, if I keep the EACH but swap the tables, it finishes in just 90 seconds!
SELECT
  u.user_id as user_id,
  date(u.created) as signup_date,
  cal.date as date,
from (
  select date(dt) as date
  from [rsdw.calendar]
  where date(dt) < CURRENT_DATE()
) cal
cross join each rsdw.user u
where
  date(u.created) <= cal.date
Can anyone explain why the third iteration is so much faster, and why the second one errors out?
【Comments】:
Tags: google-bigquery cross-join