[Posted]: 2019-11-06 17:10:37
[Problem description]:
I have a query whose average execution time is 170 seconds. Going through the PostgreSQL documentation, I read that increasing work_mem should improve performance. I raised work_mem to 1000 MB, but performance did not improve.
Note: I have indexed every column that appears in the query.
Below I paste the number of records in the database, the query, the results, and the query execution plans.
- Number of records in the database:
event_logs=> select count(*) from events;
count
----------
18706734
(1 row)
- Query:
select raw->'request_payload'->'source'->0 as file,
count(raw->'request_payload'->>'status') as count,
raw->'request_payload'->>'status' as status
from events
where client = 'NTT'
and to_char(datetime, 'YYYY-MM-DD') = '2019-10-31'
and event_name = 'wbs_indexing'
group by raw->'request_payload'->'source'->0,
raw->'request_payload'->>'status';
- Result:
file | count | status
-----------------------------+--------+---------
"xyz.csv" | 91878 | failure
"abc.csv" | 91816 | failure
"efg.csv" | 398196 | failure
(3 rows)
- Query execution plan with the default work_mem (4 MB):
event_logs=> SHOW work_mem;
work_mem
----------
4MB
(1 row)
event_logs=> explain analyze select raw->'request_payload'->'source'->0 as file, count(raw->'request_payload'->>'status') as count, raw->'request_payload'->>'status' as status from events where to_char(datetime, 'YYYY-MM-DD') = '2019-10-31' and client = 'NTT' and event_name = 'wbs_indexing' group by raw->'request_payload'->'source'->0, raw->'request_payload'->>'status';
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize GroupAggregate (cost=3256017.54..3267087.56 rows=78474 width=72) (actual time=172547.598..172965.581 rows=3 loops=1)
Group Key: ((((raw -> 'request_payload'::text) -> 'source'::text) -> 0)), (((raw -> 'request_payload'::text) ->> 'status'::text))
-> Gather Merge (cost=3256017.54..3264829.34 rows=65674 width=72) (actual time=172295.204..172965.630 rows=9 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial GroupAggregate (cost=3255017.52..3256248.91 rows=32837 width=72) (actual time=172258.342..172737.534 rows=3 loops=3)
Group Key: ((((raw -> 'request_payload'::text) -> 'source'::text) -> 0)), (((raw -> 'request_payload'::text) ->> 'status'::text))
-> Sort (cost=3255017.52..3255099.61 rows=32837 width=533) (actual time=171794.584..172639.670 rows=193963 loops=3)
Sort Key: ((((raw -> 'request_payload'::text) -> 'source'::text) -> 0)), (((raw -> 'request_payload'::text) ->> 'status'::text))
Sort Method: external merge Disk: 131856kB
-> Parallel Seq Scan on events (cost=0.00..3244696.75 rows=32837 width=533) (actual time=98846.155..169311.063 rows=193963 loops=3)
Filter: ((client = 'NTT'::text) AND (event_name = 'wbs_indexing'::text) AND (to_char(datetime, 'YYYY-MM-DD'::text) = '2019-10-31'::text))
Rows Removed by Filter: 6041677
Planning time: 0.953 ms
Execution time: 172983.273 ms
(15 rows)
- Query execution plan with increased work_mem (1000 MB):
event_logs=> SHOW work_mem;
work_mem
----------
1000MB
(1 row)
event_logs=> explain analyze select raw->'request_payload'->'source'->0 as file, count(raw->'request_payload'->>'status') as count, raw->'request_payload'->>'status' as status from events where to_char(datetime, 'YYYY-MM-DD') = '2019-10-31' and client = 'NTT' and event_name = 'wbs_indexing' group by raw->'request_payload'->'source'->0, raw->'request_payload'->>'status';
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize GroupAggregate (cost=3248160.04..3259230.06 rows=78474 width=72) (actual time=167979.419..168189.228 rows=3 loops=1)
Group Key: ((((raw -> 'request_payload'::text) -> 'source'::text) -> 0)), (((raw -> 'request_payload'::text) ->> 'status'::text))
-> Gather Merge (cost=3248160.04..3256971.84 rows=65674 width=72) (actual time=167949.951..168189.282 rows=9 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial GroupAggregate (cost=3247160.02..3248391.41 rows=32837 width=72) (actual time=167945.607..168083.707 rows=3 loops=3)
Group Key: ((((raw -> 'request_payload'::text) -> 'source'::text) -> 0)), (((raw -> 'request_payload'::text) ->> 'status'::text))
-> Sort (cost=3247160.02..3247242.11 rows=32837 width=533) (actual time=167917.891..167975.549 rows=193963 loops=3)
Sort Key: ((((raw -> 'request_payload'::text) -> 'source'::text) -> 0)), (((raw -> 'request_payload'::text) ->> 'status'::text))
Sort Method: quicksort Memory: 191822kB
-> Parallel Seq Scan on events (cost=0.00..3244696.75 rows=32837 width=533) (actual time=98849.936..167570.669 rows=193963 loops=3)
Filter: ((client = 'NTT'::text) AND (event_name = 'wbs_indexing'::text) AND (to_char(datetime, 'YYYY-MM-DD'::text) = '2019-10-31'::text))
Rows Removed by Filter: 6041677
Planning time: 0.238 ms
Execution time: 168199.046 ms
(15 rows)
- Can anyone help me improve the performance of this query?
[Discussion]:
- The increase in work_mem got you out of the disk sort, but the seq scan still takes most of the time. Have you created an index on the events table for the columns (client, event_name, to_char(datetime, 'YYYY-MM-DD'::text))?
- Yes, I have indexed all the columns used in the query.
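A likely reason those indexes go unused: the filter wraps the column in `to_char(datetime, 'YYYY-MM-DD')`, which makes the predicate non-sargable, and since `to_char(timestamp, text)` is only STABLE (not IMMUTABLE) it cannot appear in an index expression either. A minimal sketch of a possible fix, assuming `datetime` is a plain timestamp column (the index name here is hypothetical):

```sql
-- Hypothetical composite index on the filtered columns; the planner can
-- only use the datetime part when datetime is compared directly,
-- not through to_char().
CREATE INDEX events_client_event_datetime_idx
    ON events (client, event_name, datetime);

-- Same day expressed as a half-open range, so the index is usable.
SELECT raw->'request_payload'->'source'->0      AS file,
       count(raw->'request_payload'->>'status') AS count,
       raw->'request_payload'->>'status'        AS status
FROM events
WHERE client = 'NTT'
  AND event_name = 'wbs_indexing'
  AND datetime >= DATE '2019-10-31'
  AND datetime <  DATE '2019-11-01'
GROUP BY raw->'request_payload'->'source'->0,
         raw->'request_payload'->>'status';
```

With the range predicate the planner should be able to replace the parallel seq scan (which currently reads and filters all 18.7 M rows) with an index scan on the matching rows, which is where nearly all of the ~170 s is going in both plans.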
Tags: postgresql query-performance