group by 的有效实现是通过在内部对数据进行排序来执行分组。这就是为什么一些 RDBMS 在分组时返回排序输出的原因。然而,SQL 规范并没有强制要求这种行为,所以除非 RDBMS 供应商明确记录,否则我不会打赌它会起作用(明天)。 OTOH,如果 RDBMS 隐式地进行排序,它也可能足够聪明,然后优化(远离)冗余顺序。 @jimmyb
一个使用 PostgreSQL 的例子来证明这个概念
创建一个包含 100 万条记录的表,随机日期从今天到 90 天,并按日期索引
CREATE TABLE WITHDRAW AS
SELECT (random()*1000000)::integer AS IDT_WITHDRAW,
md5(random()::text) AS NAM_PERSON,
(NOW() - ( random() * (NOW() + '90 days' - NOW()) ))::timestamp AS DAT_CREATION, -- de hoje a 90 dias atras
(random() * 1000)::decimal(12, 2) AS NUM_VALUE
FROM generate_series(1,1000000);
CREATE INDEX WITHDRAW_DAT_CREATION ON WITHDRAW(DAT_CREATION);
按日期分组,按日期截断,限制在两天范围内按日期选择
EXPLAIN
SELECT
DATE_TRUNC('DAY', W.dat_creation), COUNT(1), SUM(W.NUM_VALUE)
FROM WITHDRAW W
WHERE W.dat_creation >= (NOW() - INTERVAL '2 DAY')::timestamp
AND W.dat_creation < (NOW() - INTERVAL '1 DAY')::timestamp
GROUP BY 1
HashAggregate (cost=11428.33..11594.13 rows=11053 width=48)
Group Key: date_trunc('DAY'::text, dat_creation)
-> Bitmap Heap Scan on withdraw w (cost=237.73..11345.44 rows=11053 width=14)
Recheck Cond: ((dat_creation >= ((now() - '2 days'::interval))::timestamp without time zone) AND (dat_creation < ((now() - '1 day'::interval))::timestamp without time zone))
-> Bitmap Index Scan on withdraw_dat_creation (cost=0.00..234.97 rows=11053 width=0)
Index Cond: ((dat_creation >= ((now() - '2 days'::interval))::timestamp without time zone) AND (dat_creation < ((now() - '1 day'::interval))::timestamp without time zone))
使用更大的限制日期范围,它选择应用 SORT
EXPLAIN
SELECT
DATE_TRUNC('DAY', W.dat_creation), COUNT(1), SUM(W.NUM_VALUE)
FROM WITHDRAW W
WHERE W.dat_creation >= (NOW() - INTERVAL '60 DAY')::timestamp
AND W.dat_creation < (NOW() - INTERVAL '1 DAY')::timestamp
GROUP BY 1
GroupAggregate (cost=116522.65..132918.32 rows=655827 width=48)
Group Key: (date_trunc('DAY'::text, dat_creation))
-> Sort (cost=116522.65..118162.22 rows=655827 width=14)
Sort Key: (date_trunc('DAY'::text, dat_creation))
-> Seq Scan on withdraw w (cost=0.00..41949.57 rows=655827 width=14)
Filter: ((dat_creation >= ((now() - '60 days'::interval))::timestamp without time zone) AND (dat_creation < ((now() - '1 day'::interval))::timestamp without time zone))
只需在最后加上ORDER BY 1即可(没有显着差异)
GroupAggregate (cost=116522.44..132918.06 rows=655825 width=48)
Group Key: (date_trunc('DAY'::text, dat_creation))
-> Sort (cost=116522.44..118162.00 rows=655825 width=14)
Sort Key: (date_trunc('DAY'::text, dat_creation))
-> Seq Scan on withdraw w (cost=0.00..41949.56 rows=655825 width=14)
Filter: ((dat_creation >= ((now() - '60 days'::interval))::timestamp without time zone) AND (dat_creation < ((now() - '1 day'::interval))::timestamp without time zone))
PostgreSQL 10.3