【Question title】: How can I create a map-style index in postgres?
【Posted】: 2018-07-08 16:45:43
【Question description】:

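I would like to create a map-style index, like a map in Golang or an associative array in JavaScript. The key of the map would be account_id, and the value would be an ordered list of records. Is this possible? I found that Postgres has expression indexes, but I don't know how to assemble a map from an expression that contains an OR condition.

My real-world example:

I have a table of value transfers between accounts, and I currently use this query to get the latest balance of an account: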

SELECT
        valtr_id,
        from_id,
        to_id,
        from_balance,
        to_balance
FROM value_transfer v
WHERE
        (v.block_num<=2435013) AND
        (
                (v.to_id = 22479) OR
                (v.from_id = 22479) 
        )
ORDER BY v.block_num DESC,v.valtr_id DESC LIMIT 1

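The OR is required because an account can have outgoing transfers (from_id is set) or incoming transfers (to_id is set). If I had an associative-array index, it would be keyed by account_id (derived from the condition from_id = account_id OR to_id = account_id), and Postgres could look up that index by account_id to get an already sorted list of records. Because the index would already account for the OR condition, I would not need to build one list of records with from_id = 22479 and another with to_id = 22479 and then compare them to find the record with the latest timestamp, which is what my current query does to get the account's latest balance. (block_num is the blockchain block in which the transfer occurred.)

Currently this query takes a lot of time because the table is huge, with 100 million records. Here is its EXPLAIN ANALYZE: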

postgres-> \g
                                                                        QUERY PLAN                                                                         
-----------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1592973.24..1592973.24 rows=1 width=31) (actual time=86448.709..86448.710 rows=1 loops=1)
   ->  Sort  (cost=1592973.24..1595439.02 rows=986312 width=31) (actual time=86448.707..86448.707 rows=1 loops=1)
         Sort Key: block_num DESC, valtr_id DESC
         Sort Method: top-N heapsort  Memory: 25kB
         ->  Bitmap Heap Scan on value_transfer v  (cost=35340.86..1588041.68 rows=986312 width=31) (actual time=851.598..85082.223 rows=1387411 loops=1)
               Recheck Cond: ((to_id = 22479) OR (from_id = 22479))
               Filter: (block_num <= 2435013)
               Rows Removed by Filter: 298923
               Heap Blocks: exact=274549
               ->  BitmapOr  (cost=35340.86..35340.86 rows=1291543 width=0) (actual time=729.917..729.917 rows=0 loops=1)
                     ->  Bitmap Index Scan on vt_to_id_idx  (cost=0.00..27233.03 rows=1004862 width=0) (actual time=575.558..575.558 rows=1364039 loops=1)
                           Index Cond: (to_id = 22479)
                     ->  Bitmap Index Scan on vt_from_id_idx  (cost=0.00..7614.68 rows=286681 width=0) (actual time=154.356..154.356 rows=352366 loops=1)
                           Index Cond: (from_id = 22479)
 Planning time: 0.367 ms
 Execution time: 86448.817 ms
(16 rows)

postgres=>

The table is defined like this:

CREATE TABLE value_transfer (
    valtr_id            BIGSERIAL       PRIMARY KEY,
    tx_id               BIGINT          REFERENCES transaction(tx_id) ON DELETE CASCADE ON UPDATE CASCADE,
    block_id            INT             REFERENCES block(block_id) ON DELETE CASCADE ON UPDATE CASCADE,
    block_num           INT             NOT NULL,
    from_id             INT             NOT NULL,
    to_id               INT             NOT NULL,
    value               NUMERIC         DEFAULT 0,
    from_balance        NUMERIC         DEFAULT 0,
    to_balance          NUMERIC         DEFAULT 0,
    kind                CHAR            NOT NULL,
    depth               INT             DEFAULT 0,
    error               TEXT            NOT NULL
);
CREATE INDEX vt_tx_idx          ON  value_transfer  USING   btree   ("tx_id");
CREATE INDEX vt_block_num_idx   ON  value_transfer      USING   btree   ("block_num");
CREATE INDEX vt_block_id_idx    ON  value_transfer      USING   btree   ("block_id");
CREATE INDEX vt_from_id_idx     ON  value_transfer  USING   btree   ("from_id");
CREATE INDEX vt_to_id_idx       ON  value_transfer  USING   btree   ("to_id");

from_id and to_id are foreign keys into the account table:

CREATE TABLE account (
    account_id          SERIAL          PRIMARY KEY,
    owner_id            INT             NOT NULL DEFAULT 0,
    last_balance        NUMERIC         DEFAULT 0,
    num_tx              BIGINT          DEFAULT 0,
    ts_created          INT             DEFAULT 0,
    block_created       INT             DEFAULT 0,
    deleted             SMALLINT        DEFAULT 0,
    block_sd            INT             DEFAULT 0,
    address             TEXT            NOT NULL UNIQUE
);

EDIT:

Execution plan of the UNION query proposed by Lukasz, compared with the old query

The UNION query:

 Limit  (cost=1668089.09..1668089.10 rows=1 width=32) (actual time=6115.484..6115.485 rows=1 loops=1)
   ->  Sort  (cost=1668089.09..1671668.88 rows=1431916 width=32) (actual time=6115.483..6115.483 rows=1 loops=1)
         Sort Key: v.block_num DESC, v.valtr_id DESC
         Sort Method: top-N heapsort  Memory: 25kB
         ->  Append  (cost=21229.61..1660929.51 rows=1431916 width=32) (actual time=255.166..5446.818 rows=1413507 loops=1)
               ->  Bitmap Heap Scan on value_transfer v  (cost=21229.61..1229731.99 rows=1134056 width=32) (actual time=255.165..4312.769 rows=1102867 loops=1)
                     Recheck Cond: (to_id = 22479)
                     Rows Removed by Index Recheck: 9412580
                     Filter: (block_num <= 2435013)
                     Heap Blocks: exact=32392 lossy=132879
                     ->  Bitmap Index Scan on vt_to_id_idx  (cost=0.00..20946.10 rows=1134071 width=0) (actual time=241.632..241.632 rows=1102867 loops=1)
                           Index Cond: (to_id = 22479)
               ->  Index Scan using vt_from_id_idx on value_transfer v_1  (cost=0.57..416878.36 rows=297860 width=32) (actual time=0.056..952.883 rows=310640 loops=1)
                     Index Cond: (from_id = 22479)
                     Filter: (block_num <= 2435013)
 Planning time: 0.319 ms
 Execution time: 6115.539 ms
(17 rows)

The OR condition query (my original query):

 Limit  (cost=1276124.75..1276124.75 rows=1 width=32) (actual time=7860.439..7860.440 rows=1 loops=1)
   ->  Sort  (cost=1276124.75..1279694.24 rows=1427797 width=32) (actual time=7860.437..7860.437 rows=1 loops=1)
         Sort Key: block_num DESC, valtr_id DESC
         Sort Method: top-N heapsort  Memory: 25kB
         ->  Bitmap Heap Scan on value_transfer v  (cost=27162.56..1268985.76 rows=1427797 width=32) (actual time=304.197..7194.825 rows=1387411 loops=1)
               Recheck Cond: ((to_id = 22479) OR (from_id = 22479))
               Rows Removed by Index Recheck: 13260750
               Filter: (block_num <= 2435013)
               Heap Blocks: exact=37782 lossy=186738
               ->  BitmapOr  (cost=27162.56..27162.56 rows=1431937 width=0) (actual time=288.359..288.359 rows=0 loops=1)
                     ->  Bitmap Index Scan on vt_to_id_idx  (cost=0.00..20946.11 rows=1134072 width=0) (actual time=216.708..216.708 rows=1102867 loops=1)
                           Index Cond: (to_id = 22479)
                     ->  Bitmap Index Scan on vt_from_id_idx  (cost=0.00..5502.55 rows=297865 width=0) (actual time=71.649..71.649 rows=310640 loops=1)
                           Index Cond: (from_id = 22479)
 Planning time: 0.257 ms
 Execution time: 7860.481 ms
(16 rows)

With the UNION query, execution is about 1.7 seconds faster (6115 ms versus 7860 ms).

EDIT 2

This simple query is extremely fast:

EXPLAIN ANALYZE
SELECT
        valtr_id,
        from_id,
        to_id,
        from_balance,
        to_balance,
        block_num
FROM value_transfer v
WHERE v.block_num<=2435013 AND v.from_id = 22479
LIMIT 1

 Limit  (cost=0.57..1.97 rows=1 width=32) (actual time=0.047..0.047 rows=1 loops=1)
   ->  Index Scan using vt_from_id_idx on value_transfer v  (cost=0.57..416878.36 rows=297860 width=32) (actual time=0.045..0.045 rows=1 loops=1)
         Index Cond: (from_id = 22479)
         Filter: (block_num <= 2435013)
 Planning time: 0.392 ms
 Execution time: 0.089 ms
(6 rows)

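But as soon as the condition is OR-ed, it takes far longer. There is definitely something wrong with how Postgres handles a UNION between the two queries. Maybe writing a PL/pgSQL function would be better.

【Discussion】:

Composite indexes on (block_num, to_id) and (block_num, from_id) could help (maybe with the columns in swapped order, but I don't know your data model).
@wildplasser I have added the composite indexes as Lukasz suggested, but I don't see EXPLAIN ANALYZE using them. I will now try adding them in the reverse order.

Tags: postgresql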

【Solution 1】:

I would try rewriting it as:

--or-expansion
SELECT
        valtr_id,
        from_id,
        to_id,
        from_balance,
        to_balance,
        block_num
FROM value_transfer v
WHERE v.block_num<=2435013 AND v.to_id = 22479
UNION ALL
SELECT
        valtr_id,
        from_id,
        to_id,
        from_balance,
        to_balance,
        block_num
FROM value_transfer v
WHERE v.block_num<=2435013 AND v.from_id = 22479              
ORDER BY block_num DESC,valtr_id DESC LIMIT 1

and add two indexes:

CREATE INDEX idx_1 ON value_transfer(from_id, block_num DESC);
CREATE INDEX idx_2 ON value_transfer(to_id, block_num DESC);
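
Because the query orders by block_num and then by valtr_id, a variant of these indexes that also includes valtr_id may let each branch be read in its final sort order. The following is only a sketch: the index names are hypothetical, and whether the planner actually chooses them depends on the data and its statistics.

```sql
-- Hypothetical index names. The column order mirrors
-- WHERE <id> = ? ... ORDER BY block_num DESC, valtr_id DESC.
CREATE INDEX vt_to_block_valtr_idx
    ON value_transfer (to_id, block_num DESC, valtr_id DESC);
CREATE INDEX vt_from_block_valtr_idx
    ON value_transfer (from_id, block_num DESC, valtr_id DESC);
```

Running EXPLAIN ANALYZE again after creating them is the way to check whether they are used.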

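【Discussion】:

Thanks! But I get an error with this query: ERROR: missing FROM-clause entry for table "v" LINE 18: ORDER BY v.block_num DESC,v.valtr_id DESC LIMIT 1
@Nulik Please check it now.
Thanks again! The modification you proposed improved the speed by 1.7 seconds. But I still think an associative-map index would be faster. Do you know whether that can be done in Postgres? If not, maybe I should create a table and store the IDs of the value_transfer records in a bytea variable?
In other words, build my own index inside a Postgres table.
Could the problem be that the indexes idx_1 and idx_2 are not being used? They do not appear in the EXPLAIN ANALYZE output I posted in the edit section.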
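
The "own index in a table" idea from the comments above can be sketched with an ordinary side table instead of a bytea blob. This is an illustration under assumptions, not something proposed in the thread: the table name account_transfer is hypothetical, and new rows in value_transfer would have to be mirrored into it by the application or by a trigger.

```sql
-- Hypothetical side table: one row per (account, transfer), so a transfer
-- appears once under its sender and once under its receiver.
CREATE TABLE account_transfer (
    account_id  INT     NOT NULL,
    block_num   INT     NOT NULL,
    valtr_id    BIGINT  NOT NULL
                REFERENCES value_transfer(valtr_id) ON DELETE CASCADE,
    PRIMARY KEY (account_id, block_num, valtr_id)
);

-- Initial load; the second branch skips self-transfers to avoid duplicate keys.
INSERT INTO account_transfer (account_id, block_num, valtr_id)
SELECT from_id, block_num, valtr_id FROM value_transfer
UNION ALL
SELECT to_id, block_num, valtr_id FROM value_transfer
WHERE to_id <> from_id;

-- Latest transfer for one account: a single equality on account_id,
-- which the primary-key index can serve by scanning backward.
SELECT v.valtr_id, v.from_id, v.to_id, v.from_balance, v.to_balance
FROM account_transfer a
JOIN value_transfer v ON v.valtr_id = a.valtr_id
WHERE a.account_id = 22479
  AND a.block_num <= 2435013
ORDER BY a.block_num DESC, a.valtr_id DESC
LIMIT 1;
```

The trade-off is extra storage and write work in exchange for removing the OR from the lookup.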

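【Solution 2】:

Please try this; it should be much faster. Each branch applies its own ORDER BY and LIMIT 1, so at most two rows reach the outer sort.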

(
SELECT valtr_id,
       from_id,
       to_id,
       from_balance,
       to_balance,
       block_num
  FROM value_transfer v
 WHERE v.block_num<=2435013 AND
       v.to_id = 22479
 ORDER BY block_num DESC,valtr_id DESC
 LIMIT 1
)
 UNION ALL
(
SELECT valtr_id,
       from_id,
       to_id,
       from_balance,
       to_balance,
       block_num
  FROM value_transfer v
 WHERE v.block_num<=2435013 AND 
       v.from_id = 22479              
 ORDER BY block_num DESC,valtr_id DESC
 LIMIT 1
 )
 ORDER BY block_num DESC,valtr_id DESC
 LIMIT 1
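
Since the question mentions possibly moving this into a function, the same query could be wrapped in a plain SQL function so that the account and block number become parameters. This is a minimal sketch: the function name latest_transfer and its parameter names are hypothetical, and it returns whole value_transfer rows rather than the six columns listed above.

```sql
-- Hypothetical wrapper around the query above.
CREATE FUNCTION latest_transfer(p_account_id INT, p_block_num INT)
RETURNS SETOF value_transfer
LANGUAGE sql STABLE
AS $$
    (SELECT * FROM value_transfer
      WHERE to_id = p_account_id AND block_num <= p_block_num
      ORDER BY block_num DESC, valtr_id DESC
      LIMIT 1)
    UNION ALL
    (SELECT * FROM value_transfer
      WHERE from_id = p_account_id AND block_num <= p_block_num
      ORDER BY block_num DESC, valtr_id DESC
      LIMIT 1)
    ORDER BY block_num DESC, valtr_id DESC
    LIMIT 1;
$$;

-- Usage with the values from the question:
SELECT valtr_id, from_id, to_id, from_balance, to_balance
FROM latest_transfer(22479, 2435013);
```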

【Discussion】: