【Question title】: Last and first value from group
【Posted】: 2016-09-07 11:03:58
【Question】:

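I have a task: get the first, last, maximum, and minimum values from each group of data (grouped by time). My solution works, but it is extremely slow because the table has about 50 million rows.

How can I improve the performance of this query: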

SELECT
   date_trunc('minute', t_ordered."timestamp"),
   MIN (t_ordered.price),
   MAX (t_ordered.price),
   FIRST (t_ordered.price),
   LAST (t_ordered.price)
FROM(
    SELECT t.price, t."timestamp"
    FROM trade t
    WHERE  t."timestamp" >= '2016-01-01' AND t."timestamp" < '2016-09-01'
    ORDER BY t."timestamp" ASC
) t_ordered
GROUP BY 1
ORDER BY 1

FIRST and LAST are the aggregate functions from the PostgreSQL wiki.
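For readers who do not have those aggregates installed, they are usually defined along the following lines. This is a sketch from memory of the wiki snippet, not a verbatim copy; the helper names first_agg and last_agg are assumptions here. Both aggregates simply return the first or last non-null value in whatever order rows reach them, which is why the query relies on an ordered subquery.

```sql
-- State function: keep the value already held (the first non-null input).
CREATE OR REPLACE FUNCTION first_agg(anyelement, anyelement)
RETURNS anyelement
LANGUAGE sql IMMUTABLE STRICT AS $$
    SELECT $1;
$$;

CREATE AGGREGATE first(anyelement) (
    sfunc = first_agg,
    stype = anyelement
);

-- State function: always replace the state with the newest non-null input.
CREATE OR REPLACE FUNCTION last_agg(anyelement, anyelement)
RETURNS anyelement
LANGUAGE sql IMMUTABLE STRICT AS $$
    SELECT $2;
$$;

CREATE AGGREGATE last(anyelement) (
    sfunc = last_agg,
    stype = anyelement
);
```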

The timestamp column is indexed. EXPLAIN (ANALYZE, VERBOSE) output:

GroupAggregate  (cost=13112830.84..33300949.59 rows=351556 width=14) (actual time=229538.092..468212.450 rows=351138 loops=1)
   Output: (date_trunc('minute'::text, t_ordered."timestamp")), min(t_ordered.price), max(t_ordered.price), first(t_ordered.price), last(t_ordered.price)
      Group Key: (date_trunc('minute'::text, t_ordered."timestamp"))
      ->  Sort  (cost=13112830.84..13211770.66 rows=39575930 width=14) (actual time=229515.281..242472.677 rows=38721704 loops=1)
         Output: (date_trunc('minute'::text, t_ordered."timestamp")), t_ordered.price
         Sort Key: (date_trunc('minute'::text, t_ordered."timestamp"))
         Sort Method: external sort  Disk: 932656kB
         ->  Subquery Scan on t_ordered  (cost=6848734.55..7442373.50 rows=39575930 width=14) (actual time=102166.368..155540.492 rows=38721704 loops=1)
             Output: date_trunc('minute'::text, t_ordered."timestamp"), t_ordered.price
             ->  Sort  (cost=6848734.55..6947674.38 rows=39575930 width=14) (actual time=102165.836..130971.804 rows=38721704 loops=1)
                Output: t.price, t."timestamp"
                Sort Key: t."timestamp"
                Sort Method: external merge  Disk: 993480kB
                ->  Seq Scan on public.trade t  (cost=0.00..1178277.21 rows=39575930 width=14) (actual time=0.055..25726.038 rows=38721704 loops=1)
                      Output: t.price, t."timestamp"
                      Filter: ((t."timestamp" >= '2016-01-01 00:00:00'::timestamp without time zone) AND (t."timestamp" < '2016-09-01 00:00:00'::timestamp without time zone))
                      Rows Removed by Filter: 9666450
Planning time: 1.663 ms
Execution time: 468949.753 ms

Maybe this can be done with window functions? I have tried, but I don't know them well enough to use them.
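For reference, a window-function formulation of the same query could look like the sketch below. It has not been benchmarked against the table in question and is not claimed to be faster; the window names w and w_ord and the output column aliases are illustrative. If several rows in a minute share the same timestamp, which of them counts as first or last is arbitrary.

```sql
SELECT DISTINCT
    date_trunc('minute', t."timestamp")  AS minute,
    min(t.price)         OVER w          AS price_min,
    max(t.price)         OVER w          AS price_max,
    first_value(t.price) OVER w_ord      AS price_first,
    last_value(t.price)  OVER w_ord      AS price_last
FROM trade t
WHERE t."timestamp" >= '2016-01-01' AND t."timestamp" < '2016-09-01'
WINDOW
    -- Unordered partition: the frame is the whole minute.
    w AS (PARTITION BY date_trunc('minute', t."timestamp")),
    -- Ordered partition with an explicit full frame, so that last_value
    -- sees the final row of the minute rather than the current row.
    w_ord AS (PARTITION BY date_trunc('minute', t."timestamp")
              ORDER BY t."timestamp"
              ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
ORDER BY 1;
```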

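【Comments】:

  • What are the performance and row count of the inner query?
  • How does it perform if you remove first(), last(), and the ORDER BY in the subquery?
  • The subquery returns about 6 million rows
  • I need the ORDER BY in the subquery; without it the whole query loses its meaning
  • Just wondering, is the timestamp column indexed?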

Tags: sql postgresql aggregate postgresql-performance


【Solution 1】:

Creating a composite type and matching aggregates should hopefully work better:

create type tp as (timestamp timestamp, price int);

create or replace function min_tp (tp, tp)
returns tp as $$
    select least($1, $2);
$$ language sql immutable;

create aggregate min (tp) (
    sfunc = min_tp,
    stype = tp
);
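The answer does not list the matching max aggregate. One plausible sketch, assuming it mirrors min_tp with greatest in place of least (the function name max_tp is an assumption):

```sql
-- Counterpart of min_tp: composite values compare field by field,
-- so the row with the latest timestamp wins (price breaks ties).
create or replace function max_tp (tp, tp)
returns tp as $$
    select greatest($1, $2);
$$ language sql immutable;

create aggregate max (tp) (
    sfunc = max_tp,
    stype = tp
);
```

Because tp puts the timestamp field first, min over (timestamp, price) picks the earliest row in each group and max picks the latest, so no ordered subquery is needed.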

The min and max (max not listed) aggregate functions reduce the query to a single pass over the data:

select
    date_trunc('minute', timestamp) as minute,
    min (price) as price_min,
    max (price) as price_max,
    (min ((timestamp, price)::tp)).price as first,
    (max ((timestamp, price)::tp)).price as last
from t
where timestamp >= '2016-01-01' and timestamp < '2016-09-01'
group by 1
order by 1

EXPLAIN (ANALYZE, VERBOSE) output:

GroupAggregate  (cost=6954022.61..27159050.82 rows=287533 width=14) (actual time=129286.817..510119.582 rows=351138 loops=1)
   Output: (date_trunc('minute'::text, "timestamp")), min(price), max(price), (min(ROW("timestamp", price)::tp)).price, (max(ROW("timestamp", price)::tp)).price
   Group Key: (date_trunc('minute'::text, trade."timestamp"))
   ->  Sort  (cost=6954022.61..7053049.25 rows=39610655 width=14) (actual time=129232.165..156277.718 rows=38721704 loops=1)
      Output: (date_trunc('minute'::text, "timestamp")), price, "timestamp"
      Sort Key: (date_trunc('minute'::text, trade."timestamp"))
      Sort Method: external merge  Disk: 1296392kB
      ->  Seq Scan on public.trade  (cost=0.00..1278337.71 rows=39610655 width=14) (actual time=0.035..45335.947 rows=38721704 loops=1)
          Output: date_trunc('minute'::text, "timestamp"), price, "timestamp"
          Filter: ((trade."timestamp" >= '2016-01-01 00:00:00'::timestamp without time zone) AND (trade."timestamp" < '2016-09-01 00:00:00'::timestamp without time zone))
          Rows Removed by Filter: 9708857
Planning time: 0.286 ms
Execution time: 510648.395 ms

【Comments】:

  • Thanks, it looks much better. How can I get Postgres to do the sort in memory? Sort Method: external merge Disk: 1296392kB
  • @user2500146 Check the work_mem value in postgresql.conf. Also test an index: create index minute on t (date_trunc('minute', timestamp))
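The two suggestions in the last comment might be tried as follows. This is a sketch only: the '2GB' value and the index name trade_minute_idx are illustrative assumptions, and the table is written as trade to match the EXPLAIN output rather than the t used in the answer. An in-memory sort generally needs more memory than the size reported for the on-disk sort, so the right value has to be found by testing.

```sql
-- Raise the sort memory for the current session only,
-- instead of editing postgresql.conf for the whole server.
SET work_mem = '2GB';

-- Expression index on the grouping key, so the planner can
-- consider reading groups in order instead of sorting.
CREATE INDEX trade_minute_idx
    ON trade (date_trunc('minute', "timestamp"));

-- Collect statistics for the new expression index.
ANALYZE trade;
```

Re-running EXPLAIN (ANALYZE, VERBOSE) afterwards shows whether the Sort Method line changes or the Sort node disappears.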