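[Posted]: 2020-08-24 05:56:15
[Question]:
A few weeks ago our team ran into trouble with a SQL query because the data volume had grown considerably.
We would appreciate any advice on how to update the schema or optimize the query while keeping the status filtering logic the same.
In short:
We have two tables, a and b. b has a many-to-one (M-1) FK to a.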
a
id | processed
1    TRUE
2    TRUE
b
a_id | status | type_id | l_id
1      '1'      5         105
1      '3'      6         105
2      '2'      7         105
For each unique combination of (l_id, type_id, a_id) there can be only one status.
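For readers who want to reproduce the setup, the two tables could be declared roughly as follows. This is only a sketch: the column types are assumptions inferred from the query and the plans in this question (the `"char"[]` cast suggests that status is of type `"char"`, and the query reads `bt.processed`, so b apparently has a processed column as well). The real DDL and the definition of the existing index are not shown in the question.

```sql
-- Hypothetical DDL; column types and constraints are guesses, not the real schema.
CREATE TABLE a (
    id        integer PRIMARY KEY,
    processed boolean NOT NULL
);

CREATE TABLE b (
    a_id      integer NOT NULL REFERENCES a (id),  -- many b rows per a row
    status    "char"  NOT NULL,
    type_id   integer NOT NULL,
    l_id      integer NOT NULL,
    processed boolean,                             -- assumed: the query reads bt.processed
    UNIQUE (l_id, type_id, a_id)                   -- one status per combination
);

-- Sample rows from the question.
INSERT INTO a (id, processed) VALUES (1, TRUE), (2, TRUE);

INSERT INTO b (a_id, status, type_id, l_id) VALUES
    (1, '1', 5, 105),
    (1, '3', 6, 105),
    (2, '2', 7, 105);
```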
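We need to count the rows of a, filtered by the status in b, with the b rows grouped by a_id.
Table a has 5,300,000 rows.
Table b has 750,000,000 rows.
So we need to compute the status of each a row using the following rules:
For a given a_id there are x rows in b:
1) If at least one of the x rows has status '3', the status of a_id is '3'.
2) If all x rows have status '1', the status is '1'.
And so on.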
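The two rules that are spelled out above can be sketched as a conditional aggregate. This is an illustration only: it inlines the three sample rows so that it runs on its own, the column name derived_status is made up, and the rules hidden behind "and so on" are not known, so they are left as NULL.

```sql
-- Sketch of rules 1) and 2); the remaining rules are not given in the question.
WITH b (a_id, status, type_id, l_id) AS (
    VALUES (1, '1', 5, 105),
           (1, '3', 6, 105),
           (2, '2', 7, 105)
)
SELECT a_id,
       CASE
           WHEN bool_or(status = '3')  THEN '3'   -- rule 1: at least one '3'
           WHEN bool_and(status = '1') THEN '1'   -- rule 2: every status is '1'
           ELSE NULL                              -- unspecified rules
       END AS derived_status
FROM b
GROUP BY a_id
ORDER BY a_id;
```

With the sample rows, a_id 1 yields '3' because one of its rows has status '3', and a_id 2 yields NULL because the listed rules do not cover status '2'.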
In our current approach we use the array_agg() function and filter on its result in a sub-select. Our query looks like this:
SELECT COUNT(*)
FROM (
    SELECT
    FROM (
        SELECT at.id AS id,
               BOOL_AND(bt.processed) AS not_pending,
               ARRAY_AGG(DISTINCT bt.status) AS status
        FROM a AS at
        LEFT OUTER JOIN b AS bt
            ON (at.id = bt.a_id AND bt.l_id = 105 AND
                bt.type_id IN (2,10,18,1,4,5,6))
        WHERE at.processed = True
        GROUP BY at.id) sub
    WHERE not_pending = True
      AND status <@ ARRAY ['1']::"char"[]
) counter
;
The plan for this query is as follows:
Aggregate (cost=14665999.33..14665999.34 rows=1 width=8) (actual time=1875987.846..1875987.846 rows=1 loops=1)
  ->  GroupAggregate (cost=14166691.70..14599096.58 rows=5352220 width=37) (actual time=1875987.844..1875987.844 rows=0 loops=1)
        Group Key: at.id
        Filter: (bool_and(bt.processed) AND (array_agg(DISTINCT bt.status) <@ '{1}'::"char"[]))
        Rows Removed by Filter: 5353930
        ->  Sort (cost=14166691.70..14258067.23 rows=36550213 width=6) (actual time=1860315.593..1864175.762 rows=37430745 loops=1)
              Sort Key: at.id
              Sort Method: external merge Disk: 586000kB
              ->  Hash Right Join (cost=1135654.48..8076230.39 rows=36550213 width=6) (actual time=55665.584..1846965.271 rows=37430745 loops=1)
                    Hash Cond: (bt.a_id = at.id)
                    ->  Bitmap Heap Scan on b bt (cost=882095.79..7418660.65 rows=36704370 width=6) (actual time=51871.658..1826058.186 rows=37430378 loops=1)
                          Recheck Cond: ((l_id = 105) AND (type_id = ANY ('{2,10,18,1,4,5,6}'::integer[])))
                          Rows Removed by Index Recheck: 574462752
                          Heap Blocks: exact=28898 lossy=5726508
                          ->  Bitmap Index Scan on db_page_index_atableobjects (cost=0.00..872919.69 rows=36704370 width=0) (actual time=51861.815..51861.815 rows=37586483 loops=1)
                                Index Cond: ((l_id = 105) AND (type_id = ANY ('{2,10,18,1,4,5,6}'::integer[])))
                    ->  Hash (cost=165747.94..165747.94 rows=5352220 width=4) (actual time=3791.710..3791.710 rows=5353930 loops=1)
                          Buckets: 131072 Batches: 128 Memory Usage: 2507kB
                          ->  Seq Scan on a at (cost=0.00..165747.94 rows=5352220 width=4) (actual time=0.528..2958.004 rows=5353930 loops=1)
                                Filter: processed
                                Rows Removed by Filter: 18659
Planning time: 0.328 ms
Execution time: 1876066.242 ms
As you can see, the query takes a very long time to execute, and we would like it to be at least ...
Plan with track_io_timing enabled:
Aggregate (cost=14665999.33..14665999.34 rows=1 width=8) (actual time=2820945.285..2820945.285 rows=1 loops=1)
  Buffers: shared hit=23 read=5998844, temp read=414465 written=414880
  I/O Timings: read=2655805.505
  ->  GroupAggregate (cost=14166691.70..14599096.58 rows=5352220 width=930) (actual time=2820945.283..2820945.283 rows=0 loops=1)
        Group Key: at.id
        Filter: (bool_and(bt.processed) AND (array_agg(DISTINCT bt.status) <@ '{1}'::"char"[]))
        Rows Removed by Filter: 5353930
        Buffers: shared hit=23 read=5998844, temp read=414465 written=414880
        I/O Timings: read=2655805.505
        ->  Sort (cost=14166691.70..14258067.23 rows=36550213 width=6) (actual time=2804900.123..2808826.358 rows=37430745 loops=1)
              Sort Key: at.id
              Sort Method: external merge Disk: 586000kB
              Buffers: shared hit=18 read=5998840, temp read=414465 written=414880
              I/O Timings: read=2655805.491
              ->  Hash Right Join (cost=1135654.48..8076230.39 rows=36550213 width=6) (actual time=55370.788..2791441.542 rows=37430745 loops=1)
                    Hash Cond: (bt.a_id = at.id)
                    Buffers: shared hit=15 read=5998840, temp read=142879 written=142625
                    I/O Timings: read=2655805.491
                    ->  Bitmap Heap Scan on b bt (cost=882095.79..7418660.65 rows=36704370 width=6) (actual time=51059.047..2769127.810 rows=37430378 loops=1)
                          Recheck Cond: ((l_id = 105) AND (type_id = ANY ('{2,10,18,1,4,5,6}'::integer[])))
                          Rows Removed by Index Recheck: 574462752
                          Heap Blocks: exact=28898 lossy=5726508
                          Buffers: shared hit=13 read=5886842
                          I/O Timings: read=2653254.939
                          ->  Bitmap Index Scan on db_page_index_atableobjects (cost=0.00..872919.69 rows=36704370 width=0) (actual time=51049.365..51049.365 rows=37586483 loops=1)
                                Index Cond: ((l_id = 105) AND (type_id = ANY ('{2,10,18,1,4,5,6}'::integer[])))
                                Buffers: shared hit=12 read=131437
                                I/O Timings: read=49031.671
                    ->  Hash (cost=165747.94..165747.94 rows=5352220 width=4) (actual time=4309.761..4309.761 rows=5353930 loops=1)
                          Buckets: 131072 Batches: 128 Memory Usage: 2507kB
                          Buffers: shared hit=2 read=111998, temp written=15500
                          I/O Timings: read=2550.551
                          ->  Seq Scan on a at (cost=0.00..165747.94 rows=5352220 width=4) (actual time=0.515..3457.040 rows=5353930 loops=1)
                                Filter: processed
                                Rows Removed by Filter: 18659
                                Buffers: shared hit=2 read=111998
                                I/O Timings: read=2550.551
Planning time: 0.347 ms
Execution time: 2821022.622 ms
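One direction that could be explored is expressing the same filter without ARRAY_AGG(DISTINCT ...). The sketch below is untested on this data and is not a confirmed fix. It relies on two observations about the original query: a rows without matching b rows get a NULL from BOOL_AND and are dropped anyway, so an inner join should select the same groups, and `status <@ ARRAY['1']` holds only when every aggregated status is '1'. It runs against the a and b tables from the question.

```sql
-- Untested sketch of an equivalent count; verify the result against the original query.
SELECT count(*)
FROM (
    SELECT bt.a_id
    FROM b AS bt
    JOIN a AS at ON at.id = bt.a_id
    WHERE at.processed
      AND bt.l_id = 105
      AND bt.type_id IN (2, 10, 18, 1, 4, 5, 6)
    GROUP BY bt.a_id
    HAVING bool_and(bt.processed)
       -- no row may carry a status other than '1' (a NULL status also disqualifies the group)
       AND count(*) FILTER (WHERE bt.status IS DISTINCT FROM '1') = 0
) AS counter;
```

In the plan with track_io_timing, about 2,653 s of the 2,821 s total is read time inside the Bitmap Heap Scan on b, so a rewrite of the aggregate alone would probably not change the runtime much. It mainly removes the DISTINCT aggregation step.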
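[Comments]:
-
What is the current value of work_mem? You could try increasing it a lot, but only in the current session, to reduce the recheck condition step.
-
How much has the data volume grown since performance was last acceptable? 2-fold? 10,000-fold? Do you have an execution plan for the query on the old data?
-
@pifor, at the moment we are thinking more about optimization than about the possibility of scaling.
-
@jjanes Hello! Sorry for the late feedback. 1) At the moment this is not the real data volume from production. We decided to generate data to test how our current infrastructure and application would behave. Currently we use a db.r5.xlarge AWS RDS instance with 2 cores, 32GB RAM and 4 vCPUs.
-
@jjanes The plan with track_io_timing enabled is attached in the updated question body. Thanks!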
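The work_mem suggestion from the first comment could be tried roughly as follows. This is a sketch: the value 256MB is a placeholder rather than a recommendation, and the statement under EXPLAIN is a cut-down version that only exercises the scan on b. The lossy heap blocks in the plans (lossy=5726508) indicate that the bitmap did not fit into the current work_mem, which is why the recheck step discards so many rows.

```sql
-- Check the current setting first.
SHOW work_mem;

-- Raise it for this session only; 256MB is an arbitrary placeholder value.
SET work_mem = '256MB';

-- Re-run the scan and compare the "Heap Blocks: exact=... lossy=..." line.
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM b AS bt
WHERE bt.l_id = 105
  AND bt.type_id IN (2, 10, 18, 1, 4, 5, 6);

-- Return to the server default for the rest of the session.
RESET work_mem;
```

If the lossy count drops to zero, the same session setting can be applied before running the full query to see how much of the runtime it recovers.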
Tags: sql postgresql relational-database postgresql-11