【问题标题】：Fast tuple lookup in large indexed table (Postgres)大型索引表中的快速元组查找（Postgres）
【发布时间】：2021-05-26 10:38:48
【问题描述】：

表结构

我有一个 Postgres 表 T 有 3 列 (w1, w2, occurrences)，所有三个都是整数，并且 w1 和 w2 正在引用另一个表.所以基本上，有两个外键和一个值。
w1 和 w2 可以在每一行中变成大约 1500 万个唯一值（索引）。目前，T 包含大约 18 亿行，它们或多或少是具有限制的索引的排列，即 - 如果您考虑对称矩阵 - 可能只存在一个值对。例如，可能有(252, 13, x)，但没有(13, 252, x)。但是，没有排序，所以(5, 900, x) 也可能在 T 中（排序在插入期间完成，取决于引用表中的值）。这些元组 (w1, w2) 是唯一的。目前，该表上有 3 个不同的索引，UNIQUE INDEX tdc_idx_1 on T (w1, w2)、INDEX tdc_idx_w1 on T (w1) 和 INDEX tdc_idx_w2 on T (w2)。

问题陈述

基本上，我想运行两个不同的查询。对我来说，困难是找出糟糕的性能来自哪里，或者在考虑到查询的“复杂性”和表大小的情况下接受运行时（我猜情况并非如此......）。简而言之，我可能需要查询结构方面的帮助，并且可能需要处理表索引（或者可能是一般的表设计）。
我想要的结果最直接的查询是

-- A (the OR-query)
SELECT w1, w2, occurrences FROM public.T
WHERE (w1 in (123, 555, 999) OR w2 in (123, 555, 999) AND occurrences > 1;

和

-- B (the AND-query)
SELECT w1, w2, occurrences FROM public.T
WHERE (w1 in (123, 555, 999) AND w2 in (123, 555, 999) AND occurrences > 1;

分别（注意 OR/AND）。

现在我将给出一些性能指标和一些我使用ANALYZE EXPLAIN 提出的替代查询。

性能

查询 A

-- A (OR)
EXPLAIN ANALYSE 
SELECT w1, w2, occurrences
FROM public.T
WHERE (w1 in (123, 555, 999) OR w2 in (123, 555, 999)) AND occurrences > 1

成功运行。总查询运行时间：29 分 19 秒。
173071 行受到影响。（没有解释分析的实际结果）

Gather  (cost=3378.13..13984524.32 rows=50262 width=12) (actual time=144.749..1751127.665 rows=173071 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  ->  Parallel Bitmap Heap Scan on T  (cost=2378.13..13978498.12 rows=20942 width=12) (actual time=154.140..1750873.360 rows=57690 loops=3)
        Recheck Cond: ((w1 = ANY ('{123,555,999}'::integer[])) OR (w2 = ANY ('{123,555,999}'::integer[])))
        Rows Removed by Index Recheck: 20069348
        Filter: (occurences > 1)
        Rows Removed by Filter: 110878
        Heap Blocks: exact=902 lossy=107950
        ->  BitmapOr  (cost=2378.13..2378.13 rows=214069 width=0) (actual time=114.320..114.335 rows=0 loops=1)
              ->  Bitmap Index Scan on tdc_idx_w1  (cost=0.00..1631.03 rows=148439 width=0) (actual time=17.100..17.103 rows=166301 loops=1)
                    Index Cond: (w1 = ANY ('{123,555,999}'::integer[]))
              ->  Bitmap Index Scan on tdc_idx_w2  (cost=0.00..721.96 rows=65630 width=0) (actual time=97.208..97.211 rows=339406 loops=1)
                    Index Cond: (w2 = ANY ('{123,555,999}'::integer[]))
Planning Time: 0.280 ms
JIT:
  Functions: 12
  Options: Inlining true, Optimization true, Expressions true, Deforming true
  Timing: Generation 2.689 ms, Inlining 117.210 ms, Optimization 36.646 ms, Emission 28.227 ms, Total 184.772 ms
Execution Time: 1751571.992 ms

查询 B

-- B (AND)
EXPLAIN ANALYSE 
SELECT w1, w2, occurrences
FROM public.T
WHERE (w1 in (123, 555, 999) AND w2 in (123, 555, 999)) AND occurrences > 1

成功运行。总查询运行时间：1 秒 716 毫秒。
3 行受影响。（没有解释分析的实际结果）

Index Scan using tdc_idx_1 on T  (cost=0.58..61.13 rows=1 width=12) (actual time=88.895..239.394 rows=3 loops=1)
  Index Cond: ((w1 = ANY ('{123,555,999}'::integer[])) AND (w2 = ANY ('{123,555,999}'::integer[])))
  Filter: (occurences > 1)
Planning Time: 51.256 ms
Execution Time: 239.518 ms

查询备选方案 1

-- A1 (OR)
EXPLAIN ANALYSE 
SELECT w1, w2, occurences
FROM public.token_doc_cooccurrences
INNER JOIN (
  VALUES (123), (555), (999)
) vals(v)
ON (w1 = v)
WHERE occurences > 1
UNION
SELECT w1, w2, occurences
FROM public.token_doc_cooccurrences
INNER JOIN (
  VALUES (123), (555), (999)
) vals2(w)
ON (w2 = w)
WHERE occurences > 1;

成功运行。总查询运行时间：29 分 28 秒。
173071 行受到影响。（没有解释分析的实际结果）

HashAggregate  (cost=841679.67..842197.01 rows=51734 width=12) (actual time=1755066.279..1755315.668 rows=173071 loops=1)
  Group Key: T.w1, T.w2, T.occurences
  Batches: 5  Memory Usage: 4145kB  Disk Usage: 6920kB
  ->  Append  (cost=561.79..841291.66 rows=51734 width=12) (actual time=1088.188..1753999.477 rows=173074 loops=1)
        ->  Nested Loop  (cost=561.79..579176.31 rows=35898 width=12) (actual time=1088.181..75981.142 rows=55822 loops=1)
              ->  Values Scan on *VALUES*  (cost=0.00..0.04 rows=3 width=4) (actual time=0.006..0.032 rows=3 loops=1)
              ->  Bitmap Heap Scan on T  (cost=561.79..192939.10 rows=11966 width=12) (actual time=407.806..25231.892 rows=18607 loops=3)
                    Recheck Cond: (w1 = *VALUES*.column1)
                    Filter: (occurences > 1)
                    Rows Removed by Filter: 36826
                    Heap Blocks: exact=10630
                    ->  Bitmap Index Scan on tdc_idx_w1  (cost=0.00..558.80 rows=50963 width=0) (actual time=74.175..74.175 rows=55434 loops=3)
                          Index Cond: (w1 = *VALUES*.column1)
        ->  Nested Loop  (cost=250.51..261339.35 rows=15836 width=12) (actual time=217.234..1677133.291 rows=117252 loops=1)
              ->  Values Scan on *VALUES*_1  (cost=0.00..0.04 rows=3 width=4) (actual time=0.020..0.042 rows=3 loops=1)
              ->  Bitmap Heap Scan on T T_1  (cost=250.51..87060.31 rows=5279 width=12) (actual time=143.454..558815.229 rows=39084 loops=3)
                    Recheck Cond: (w2 = *VALUES*_1.column1)
                    Rows Removed by Index Recheck: 19099252
                    Filter: (occurences > 1)
                    Rows Removed by Filter: 74051
                    Heap Blocks: exact=18304 lossy=311451
                    ->  Bitmap Index Scan on tdc_idx_w2  (cost=0.00..249.19 rows=22482 width=0) (actual time=122.725..122.725 rows=113135 loops=3)
                          Index Cond: (w2 = *VALUES*_1.column1)
Planning Time: 68.676 ms
JIT:
  Functions: 21
  Options: Inlining true, Optimization true, Expressions true, Deforming true
  Timing: Generation 5.749 ms, Inlining 219.374 ms, Optimization 370.011 ms, Emission 360.728 ms, Total 955.862 ms
Execution Time: 1756917.366 ms

查询 B 备选方案 1

-- B1 (AND)
EXPLAIN ANALYSE 
SELECT w1, w2, occurences
FROM public.token_doc_cooccurrences
INNER JOIN (
  VALUES (123), (555), (999)
) vals(v)
ON (w1 = v)
INNER JOIN (
  VALUES (123), (555), (999)
) vals2(w)
ON (w2 = w)
WHERE occurences > 1;

成功运行。总查询运行时间：1 秒 5 毫秒。
3 行受影响。（没有解释分析的实际结果）

Nested Loop  (cost=0.58..77.73 rows=1 width=12) (actual time=130.157..295.939 rows=3 loops=1)
  ->  Values Scan on ""*VALUES*_1""  (cost=0.00..0.04 rows=3 width=4) (actual time=0.005..0.019 rows=3 loops=1)
  ->  Nested Loop  (cost=0.58..25.87 rows=3 width=12) (actual time=54.355..98.622 rows=1 loops=3)
        ->  Values Scan on ""*VALUES*""  (cost=0.00..0.04 rows=3 width=4) (actual time=0.003..0.017 rows=3 loops=3)
        ->  Index Scan using tdc_idx_1 on T  (cost=0.58..8.60 rows=1 width=12) (actual time=32.853..32.855 rows=0 loops=9)
              Index Cond: ((w1 = ""*VALUES*"".column1) AND (w2 = ""*VALUES*_1"".column1))
              Filter: (occurences > 1)
Planning Time: 59.447 ms
Execution Time: 296.042 ms

信息
在所有测量之间，我这样做了to clear cached queries：

systemctl stop postgresql
sync
echo 3 > /proc/sys/vm/drop_caches
systemctl start postgresql

== 编辑 ==

按照@jjanes 的建议，我添加了额外的索引(w1, occurrences, w2) 和(w2, occurrences, w1) 并运行VACUUM。
另外，我设置了work_mem = 16MB。之后我再次清除缓存并再次运行 Query A Alternative 1。结果如下：

HashAggregate  (cost=3535.61..4052.95 rows=51734 width=12) (actual time=3240.753..3471.160 rows=173071 loops=1)
  Group Key: token_doc_cooccurrences.w1, token_doc_cooccurrences.w2, token_doc_cooccurrences.occurences
  Batches: 1  Memory Usage: 14353kB
  Buffers: shared hit=152549 read=993
  ->  Append  (cost=0.58..3147.60 rows=51734 width=12) (actual time=196.966..2927.069 rows=173074 loops=1)
        Buffers: shared hit=152549 read=993
        ->  Nested Loop  (cost=0.58..1642.71 rows=35898 width=12) (actual time=196.960..1948.256 rows=55822 loops=1)
              Buffers: shared hit=46960 read=533
              ->  Values Scan on ""*VALUES*""  (cost=0.00..0.04 rows=3 width=4) (actual time=0.007..0.024 rows=3 loops=1)
              ->  Index Only Scan using w1_occ_w2 on token_doc_cooccurrences  (cost=0.58..427.90 rows=11966 width=12) (actual time=85.026..603.453 rows=18607 loops=3)
                    Index Cond: ((w1 = ""*VALUES*"".column1) AND (occurences > 1))
                    Heap Fetches: 0
                    Buffers: shared hit=46960 read=533
        ->  Nested Loop  (cost=0.58..728.88 rows=15836 width=12) (actual time=40.764..573.493 rows=117252 loops=1)
              Buffers: shared hit=105589 read=460
              ->  Values Scan on ""*VALUES*_1""  (cost=0.00..0.04 rows=3 width=4) (actual time=0.008..0.027 rows=3 loops=1)
              ->  Index Only Scan using w2_occ_w1 on token_doc_cooccurrences token_doc_cooccurrences_1  (cost=0.58..190.16 rows=5279 width=12) (actual time=30.809..99.922 rows=39084 loops=3)
                    Index Cond: ((w2 = ""*VALUES*_1"".column1) AND (occurences > 1))
                    Heap Fetches: 0
                    Buffers: shared hit=105589 read=460
                    Planning:
  Buffers: shared hit=121 read=13
  Planning Time: 165.009 ms
  Execution Time: 3672.053 ms

就个人而言，我不知道设置或仅索引扫描分别有什么影响，但结果很棒！

Successfully run. Total query runtime: 6 secs 122 msec.
173071 rows affected.

~6 秒，而将近 30 分钟。

【问题讨论】：

标签： postgresql indexing

【解决方案1】：

   Rows Removed by Index Recheck: 20069348
   Heap Blocks: exact=902 lossy=107950

您的work_mem 太低，无法进行此类查询。这会导致位图“有损”，这意味着它只记住哪些块包含感兴趣的行，而不是块中的哪些行。这就是为什么它必须重新检查每个 107950 块中的所有行，甚至认为它曾经“知道”其中大多数不符合条件。所以增加work_mem。但我敢打赌，大部分时间都是在执行 IO 来读取这些块，而不是 CPU 时间进行重新检查。因此，虽然增加work_mem 是一个好主意，但它可能不会很神奇。您可以尝试通过增加effective_io_concurrency 来改进 IO，这样您就可以同时处理许多未完成的 IO 请求。这将取决于您的存储系统和驱动程序的效果。

您应该set track_io_timing = on，并通过EXPLAIN (ANALYZE, BUFFERS) 制定您的计划。这样我们就不必猜测时间花在了 IO 上，您实际上会得到测量结果。

对于查询 A 的替代公式，应该通过在 (w1, occurrences, w2) 和 (w2, occurrences, w1) 上建立索引来大大改进。这样，您可以获得仅索引扫描，这将生成更少的随机 IO，并且还可以从使用索引来满足occurrences > 1 的限制中受益。如果您将“发生”放在最后而不是中间列，则生成的索引对于其他查询也可能更通用，同时仍符合仅索引扫描的条件，但无法有效过滤 occurrences > 1 .但是您确实需要对表进行充分清空才能使仅索引扫描有效。

购买更好的存储空间是一种完全不同的方法。你目前似乎有旋转驱动器，而且速度相当慢。改用固态硬盘应该会给您带来巨大的进步。

【讨论】：

非常感谢！我编辑了我的问题以显示EXPLAIN (ANALYZE, BUFFERS)。似乎仅索引扫描有所不同，而 work_mem 设置并没有太大影响。