【Posted】: 2018-02-26 06:48:30
【Question】:
Suppose I have a schema that stores the intervals during which each item is active:
item_active
- item_id -- id, foreign_key to item.id
- date_from -- timestamp
- date_to -- timestamp
I want to count the active items for each day from date1 to date2. I can do this by joining against a subquery that generates the dates:
with sq as (
select generate_series(date1, date2, '1 day'::interval)::date dt
)
select sq.dt, count(distinct item_id)
from sq
join item_active
on item_active.date_from::date <= sq.dt
and item_active.date_to::date >= sq.dt
group by sq.dt;
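This works, but the execution time grows linearly with the number of days in (date2 - date1), i.e. O(N), so the more days I group over, the slower the query runs. In the plan below, for a 5-day range, the nested loop rescans all 4,356,957 rows of item_active once per day (loops=5), and the sort spills to disk: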
GroupAggregate (cost=213338137.82..216968937.32 rows=200 width=8) (actual time=7220.689..8938.530 rows=5 loops=1)
Group Key: sq.dt
CTE sq
-> Result (cost=0.00..5.01 rows=1000 width=0) (actual time=0.011..0.029 rows=5 loops=1)
-> Sort (cost=213338132.81..214548398.65 rows=484106333 width=8) (actual time=6745.165..7054.655 rows=4623322 loops=1)
Sort Key: sq.dt
Sort Method: external sort Disk: 81352kB
-> Nested Loop (cost=0.00..123648051.46 rows=484106333 width=8) (actual time=0.035..5994.225 rows=4623322 loops=1)
Join Filter: (((item_active.date_from)::date <= sq.dt) AND ((item_active.date_to)::date >= sq.dt))
Rows Removed by Join Filter: 17161463
-> CTE Scan on sq (cost=0.00..20.00 rows=1000 width=4) (actual time=0.014..0.039 rows=5 loops=1)
-> Materialize (cost=0.00..122921.36 rows=4356957 width=20) (actual time=0.005..415.443 rows=4356957 loops=5)
-> Seq Scan on item_active (cost=0.00..75606.57 rows=4356957 width=20) (actual time=0.011..382.122 rows=4356957 loops=1)
Planning time: 0.165 ms
Execution time: 8963.670 ms
Is there a more efficient way to get the same result?
【Discussion】:
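One possible direction, offered only as a sketch and not as a measured fix: instead of joining every day against every interval, turn each interval into a +1 event on its start date and a -1 event on the day after its end date, then take a running total. This is a different technique from the join in the question, and it rests on assumptions that are not stated there: intervals belonging to the same item_id never overlap (otherwise the result differs from count(distinct item_id)), and date_to is never null. The literal dates stand in for date1 and date2.

```sql
-- Sketch only. Assumes intervals of one item_id never overlap and date_to is not null.
with bounds as (
    -- placeholder values for date1 and date2
    select date '2018-02-01' as date1, date '2018-02-05' as date2
),
events as (
    -- +1 on the first active day, -1 on the day after the last active day
    select date_from::date as dt, 1 as delta from item_active
    union all
    select date_to::date + 1, -1 from item_active
),
daily as (
    -- net change in the number of active items per calendar day
    select dt, sum(delta) as delta from events group by dt
),
running as (
    -- running total = number of items active on each day that has an event
    select dt, sum(delta) over (order by dt) as active_items from daily
),
days as (
    select generate_series(date1, date2, interval '1 day')::date as dt from bounds
)
select d.dt,
       coalesce((select r.active_items
                 from running r
                 where r.dt <= d.dt
                 order by r.dt desc
                 limit 1), 0) as active_items
from days d
order by d.dt;
```

Here item_active is read a fixed number of times regardless of how many days are requested; only the final lookup runs once per day, against the much smaller per-day totals. Unlike the inner join in the question, this version also returns days with no active items, with a count of 0.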
-
Please edit your question and add the execution plan generated using
explain (analyze, buffers). Formatted text please, no screen shots -
@a_horse_with_no_name added the explain analyze output
-
If you
SET enable_seqscan = off, does the query execution time improve? -
@LaurenzAlbe no
-
If you increase
work_mem and get rid of the disk sort, you can improve performance slightly. Do you have an index on date_from, date_to?
Tags: sql postgresql performance group-by range
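The last comment's two suggestions could be tried roughly as follows. This is a sketch under assumptions: the work_mem value is a guess (the plan shows about 80 MB spilled to disk, and an in-memory sort usually needs more than the on-disk size), the index name is hypothetical, and the literal dates stand in for date1 and date2. The join condition is also rewritten without the ::date casts, since a plain index on the timestamp columns cannot serve a predicate on the cast expressions; the rewrite is equivalent only if the columns are timestamp without time zone.

```sql
-- Session-level setting; the value is an assumption, tune it to the available memory.
set work_mem = '256MB';

-- Hypothetical index name; covers both interval bounds.
create index item_active_from_to_idx on item_active (date_from, date_to);

-- Re-run the query and compare plans; look for "Sort Method: quicksort"
-- instead of "external sort Disk".
explain (analyze, buffers)
with sq as (
    select generate_series(date '2018-02-01', date '2018-02-05', '1 day'::interval)::date dt
)
select sq.dt, count(distinct item_id)
from sq
join item_active
  on item_active.date_from < sq.dt + 1   -- same as date_from::date <= sq.dt
 and item_active.date_to >= sq.dt        -- same as date_to::date >= sq.dt
group by sq.dt;
```

Whether the planner actually uses the index depends on how selective the date range is; with most rows matching every day, as in the plan above, a sequential scan may still be chosen.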