【Question title】: Postgres: how to group by item in range efficiently?
【Posted】: 2018-02-26 06:48:30
【Question description】:

Suppose I have a schema where each row records an interval during which an item is active:

item_active
- item_id    -- id, foreign_key to item.id
- date_from  -- timestamp
- date_to    -- timestamp

I want to count the active items for each day from date1 to date2. I can do this by joining against a subquery of dates:

with sq as (
    select generate_series(date1, date2, '1 day'::interval)::date dt
)
select sq.dt, count(distinct item_id)
from sq
join item_active 
     on item_active.date_from::date <= sq.dt
        and item_active.date_to::date >= sq.dt
group by sq.dt;

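This works fine, but the execution time grows linearly with the number of days in (date2 - date1), i.e. O(N). So the more days I group over, the slower the query runs.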

GroupAggregate  (cost=213338137.82..216968937.32 rows=200 width=8) (actual time=7220.689..8938.530 rows=5 loops=1)
  Group Key: sq.dt
  CTE sq
    ->  Result  (cost=0.00..5.01 rows=1000 width=0) (actual time=0.011..0.029 rows=5 loops=1)
  ->  Sort  (cost=213338132.81..214548398.65 rows=484106333 width=8) (actual time=6745.165..7054.655 rows=4623322 loops=1)
    Sort Key: sq.dt
    Sort Method: external sort  Disk: 81352kB
    ->  Nested Loop  (cost=0.00..123648051.46 rows=484106333 width=8) (actual time=0.035..5994.225 rows=4623322 loops=1)
          Join Filter: (((item_active.date_from)::date <= sq.dt) AND ((item_active.date_to)::date >= sq.dt))
          Rows Removed by Join Filter: 17161463
          ->  CTE Scan on sq  (cost=0.00..20.00 rows=1000 width=4) (actual time=0.014..0.039 rows=5 loops=1)
          ->  Materialize  (cost=0.00..122921.36 rows=4356957 width=20) (actual time=0.005..415.443 rows=4356957 loops=5)
                ->  Seq Scan on item_active  (cost=0.00..75606.57 rows=4356957 width=20) (actual time=0.011..382.122 rows=4356957 loops=1)
Planning time: 0.165 ms
Execution time: 8963.670 ms

Is there a more efficient way to get the same result?

【Comments】:

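  • Please edit your question and add the execution plan generated with explain (analyze, buffers). Formatted text please, no screen shots.
  • @a_horse_with_no_name Added the explain analyze output.
  • Does the query execution time improve if you SET enable_seqscan = off?
  • @LaurenzAlbe No.
  • If you increase work_mem and get rid of the disk sort, you can improve performance a little. Do you have an index on date_from, date_to?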
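
The two suggestions in the last comment could be sketched roughly as follows. The 256MB value and the index name are illustrative assumptions, not measured recommendations. The expression index also assumes date_from and date_to are timestamp without time zone, because a cast can only be indexed when it is immutable.

```sql
-- Raise sort memory for the current session only, so the external sort
-- reported in the plan (Disk: 81352kB) has a chance to run in memory.
-- 256MB is an assumed value; size it to the machine.
SET work_mem = '256MB';

-- Index the same expressions the join condition compares.
-- item_active_dates_idx is a hypothetical name.
CREATE INDEX item_active_dates_idx
    ON item_active ((date_from::date), (date_to::date));

-- Refresh planner statistics so the new index is costed sensibly.
ANALYZE item_active;
```

Whether the planner actually uses the index depends on how selective the date range is, so it is worth re-running explain (analyze, buffers) afterwards.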

Tags: sql postgresql performance group-by range


【Solution 1】:

Try narrowing item_active before the join:

with sq as (
    select generate_series(date1, date2, '1 day'::interval)::date dt
)
select sq.dt, count(distinct item_id)
from sq
join (select * from item_active 
        where item_active.date_from::date <= (select max (sq.dt) from sq)
        and item_active.date_to::date >= (select min (sq.dt) from sq)) item_active -- alias so the ON clause below can reference the derived table
     on item_active.date_from::date <= sq.dt
        and item_active.date_to::date >= sq.dt
group by sq.dt;

Second, try disabling nested loops:

set local enable_nestloop to false;

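The planner should then pick a hash or merge join instead, which may be faster in some cases. Both of those need an equality condition in the join, though, so with the pure range comparison above the nested loop may remain.

If that does not help, consider materializing this query into a view: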
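
SET LOCAL only lasts until the end of the current transaction, so one way to compare plans is to wrap the setting and the query in the same transaction. A minimal sketch, with two literal timestamps standing in for date1 and date2 (assumed values) and a hypothetical alias ia:

```sql
BEGIN;

-- Applies to this transaction only; it is reset on COMMIT or ROLLBACK.
SET LOCAL enable_nestloop TO false;

EXPLAIN (ANALYZE, BUFFERS)
WITH sq AS (
    SELECT generate_series(timestamp '2018-02-01',
                           timestamp '2018-02-05',
                           interval '1 day')::date AS dt
)
SELECT sq.dt, count(DISTINCT ia.item_id)
FROM sq
JOIN item_active ia
  ON ia.date_from::date <= sq.dt
 AND ia.date_to::date   >= sq.dt
GROUP BY sq.dt;

ROLLBACK;
```

Comparing this output with the plan in the question shows whether the join method actually changed.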

create materialized view
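
The statement above gives only the command name. A fuller sketch might look like the following, assuming a hypothetical view name item_active_daily, a hypothetical column name active_items, and a fixed date window chosen purely for illustration:

```sql
-- Precompute the per-day counts once instead of on every request.
CREATE MATERIALIZED VIEW item_active_daily AS
WITH sq AS (
    SELECT generate_series(timestamp '2018-01-01',
                           timestamp '2018-12-31',
                           interval '1 day')::date AS dt
)
SELECT sq.dt, count(DISTINCT ia.item_id) AS active_items
FROM sq
JOIN item_active ia
  ON ia.date_from::date <= sq.dt
 AND ia.date_to::date   >= sq.dt
GROUP BY sq.dt;

-- A unique index is required for REFRESH ... CONCURRENTLY
-- and also speeds up lookups by day.
CREATE UNIQUE INDEX item_active_daily_dt_idx ON item_active_daily (dt);

-- Re-run the expensive aggregation when item_active has changed.
REFRESH MATERIALIZED VIEW CONCURRENTLY item_active_daily;

-- Reading a range of days is then a cheap index lookup.
SELECT dt, active_items
FROM item_active_daily
WHERE dt BETWEEN date '2018-02-01' AND date '2018-02-05';
```

The trade-off is staleness: the counts reflect item_active as of the last refresh, so this fits best when the data changes rarely or slightly outdated numbers are acceptable.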

【Comments】:
