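【Posted】: 2018-07-08 16:45:43
【Question】:
I want to create a map-style index, like a map in Golang or an associative array in JavaScript. The key of the map should be the account_id, and the value should be an ordered list of records. Is that possible? I found that Postgres has expression indexes, but I don't know how to build such a map from an expression with an OR condition.
My real-world example:
I have a table of value transfers between accounts, and I currently use this query to get the latest balance of an account: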
SELECT
valtr_id,
from_id,
to_id,
from_balance,
to_balance
FROM value_transfer v
WHERE
(v.block_num<=2435013) AND
(
(v.to_id = 22479) OR
(v.from_id = 22479)
)
ORDER BY v.block_num DESC,v.valtr_id DESC LIMIT 1
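The OR is required because an account may have outgoing transfers (from_id is set) or incoming transfers (to_id is set). If I had an associative-array index, it would be keyed by account_id (derived from the condition from_id = account_id OR to_id = account_id), and Postgres could look up that index by account_id to get a list of records that is already sorted. Because the index would already account for the OR condition, I would not need to build the list of records with from_id = 22479 and to_id = 22479 and then compare them to find the record with the latest timestamp, which is what my current query does. (block_num is the blockchain block in which the transfer occurred.)
This query currently takes a long time because the table is huge, with 100 million records. Here is its EXPLAIN ANALYZE: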
postgres-> \g
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=1592973.24..1592973.24 rows=1 width=31) (actual time=86448.709..86448.710 rows=1 loops=1)
-> Sort (cost=1592973.24..1595439.02 rows=986312 width=31) (actual time=86448.707..86448.707 rows=1 loops=1)
Sort Key: block_num DESC, valtr_id DESC
Sort Method: top-N heapsort Memory: 25kB
-> Bitmap Heap Scan on value_transfer v (cost=35340.86..1588041.68 rows=986312 width=31) (actual time=851.598..85082.223 rows=1387411 loops=1)
Recheck Cond: ((to_id = 22479) OR (from_id = 22479))
Filter: (block_num <= 2435013)
Rows Removed by Filter: 298923
Heap Blocks: exact=274549
-> BitmapOr (cost=35340.86..35340.86 rows=1291543 width=0) (actual time=729.917..729.917 rows=0 loops=1)
-> Bitmap Index Scan on vt_to_id_idx (cost=0.00..27233.03 rows=1004862 width=0) (actual time=575.558..575.558 rows=1364039 loops=1)
Index Cond: (to_id = 22479)
-> Bitmap Index Scan on vt_from_id_idx (cost=0.00..7614.68 rows=286681 width=0) (actual time=154.356..154.356 rows=352366 loops=1)
Index Cond: (from_id = 22479)
Planning time: 0.367 ms
Execution time: 86448.817 ms
(16 rows)
postgres=>
The table is defined like this:
CREATE TABLE value_transfer (
valtr_id BIGSERIAL PRIMARY KEY,
tx_id BIGINT REFERENCES transaction(tx_id) ON DELETE CASCADE ON UPDATE CASCADE,
block_id INT REFERENCES block(block_id) ON DELETE CASCADE ON UPDATE CASCADE,
block_num INT NOT NULL,
from_id INT NOT NULL,
to_id INT NOT NULL,
value NUMERIC DEFAULT 0,
from_balance NUMERIC DEFAULT 0,
to_balance NUMERIC DEFAULT 0,
kind CHAR NOT NULL,
depth INT DEFAULT 0,
error TEXT NOT NULL
);
CREATE INDEX vt_tx_idx ON value_transfer USING btree ("tx_id");
CREATE INDEX vt_block_num_idx ON value_transfer USING btree ("block_num");
CREATE INDEX vt_block_id_idx ON value_transfer USING btree ("block_id");
CREATE INDEX vt_from_id_idx ON value_transfer USING btree ("from_id");
CREATE INDEX vt_to_id_idx ON value_transfer USING btree ("to_id");
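The indexes above are all single-column, so none of them stores an account's rows in block_num order. PostgreSQL has no index type keyed on "from_id OR to_id", but a rough approximation of the map described in the question is a pair of composite btree indexes, one per account column, with the sort columns following the account column. The following is a minimal sketch under that assumption; the index names are hypothetical, and whether the planner actually picks these indexes depends on how the query is written and on the data, which has not been measured here.

```sql
-- Hypothetical index names; the column order is the important part.
-- Account column first (the "key" of the map), then the columns of
-- ORDER BY block_num DESC, valtr_id DESC (the "ordered list").
CREATE INDEX vt_to_id_block_valtr_idx
    ON value_transfer USING btree (to_id, block_num DESC, valtr_id DESC);

CREATE INDEX vt_from_id_block_valtr_idx
    ON value_transfer USING btree (from_id, block_num DESC, valtr_id DESC);
```

With this column order, all index entries for one account are adjacent and already sorted by block, so a query that filters on a single account column and orders by block_num DESC, valtr_id DESC can in principle stop after the first matching entry.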
from_id and to_id are foreign keys to the account table:
CREATE TABLE account (
account_id SERIAL PRIMARY KEY,
owner_id INT NOT NULL DEFAULT 0,
last_balance NUMERIC DEFAULT 0,
num_tx BIGINT DEFAULT 0,
ts_created INT DEFAULT 0,
block_created INT DEFAULT 0,
deleted SMALLINT DEFAULT 0,
block_sd INT DEFAULT 0,
address TEXT NOT NULL UNIQUE
);
Edit:
Comparison of the execution plans of the UNION query proposed by Lukasz and the old query
UNION query:
Limit (cost=1668089.09..1668089.10 rows=1 width=32) (actual time=6115.484..6115.485 rows=1 loops=1)
-> Sort (cost=1668089.09..1671668.88 rows=1431916 width=32) (actual time=6115.483..6115.483 rows=1 loops=1)
Sort Key: v.block_num DESC, v.valtr_id DESC
Sort Method: top-N heapsort Memory: 25kB
-> Append (cost=21229.61..1660929.51 rows=1431916 width=32) (actual time=255.166..5446.818 rows=1413507 loops=1)
-> Bitmap Heap Scan on value_transfer v (cost=21229.61..1229731.99 rows=1134056 width=32) (actual time=255.165..4312.769 rows=1102867 loops=1)
Recheck Cond: (to_id = 22479)
Rows Removed by Index Recheck: 9412580
Filter: (block_num <= 2435013)
Heap Blocks: exact=32392 lossy=132879
-> Bitmap Index Scan on vt_to_id_idx (cost=0.00..20946.10 rows=1134071 width=0) (actual time=241.632..241.632 rows=1102867 loops=1)
Index Cond: (to_id = 22479)
-> Index Scan using vt_from_id_idx on value_transfer v_1 (cost=0.57..416878.36 rows=297860 width=32) (actual time=0.056..952.883 rows=310640 loops=1)
Index Cond: (from_id = 22479)
Filter: (block_num <= 2435013)
Planning time: 0.319 ms
Execution time: 6115.539 ms
(17 rows)
OR-condition query (my original query):
Limit (cost=1276124.75..1276124.75 rows=1 width=32) (actual time=7860.439..7860.440 rows=1 loops=1)
-> Sort (cost=1276124.75..1279694.24 rows=1427797 width=32) (actual time=7860.437..7860.437 rows=1 loops=1)
Sort Key: block_num DESC, valtr_id DESC
Sort Method: top-N heapsort Memory: 25kB
-> Bitmap Heap Scan on value_transfer v (cost=27162.56..1268985.76 rows=1427797 width=32) (actual time=304.197..7194.825 rows=1387411 loops=1)
Recheck Cond: ((to_id = 22479) OR (from_id = 22479))
Rows Removed by Index Recheck: 13260750
Filter: (block_num <= 2435013)
Heap Blocks: exact=37782 lossy=186738
-> BitmapOr (cost=27162.56..27162.56 rows=1431937 width=0) (actual time=288.359..288.359 rows=0 loops=1)
-> Bitmap Index Scan on vt_to_id_idx (cost=0.00..20946.11 rows=1134072 width=0) (actual time=216.708..216.708 rows=1102867 loops=1)
Index Cond: (to_id = 22479)
-> Bitmap Index Scan on vt_from_id_idx (cost=0.00..5502.55 rows=297865 width=0) (actual time=71.649..71.649 rows=310640 loops=1)
Index Cond: (from_id = 22479)
Planning time: 0.257 ms
Execution time: 7860.481 ms
(16 rows)
With the UNION query, execution is about 1.7 seconds faster (6115 ms versus 7860 ms).
Edit 2
This simple query is very fast.
EXPLAIN ANALYZE
SELECT
valtr_id,
from_id,
to_id,
from_balance,
to_balance,
block_num
FROM value_transfer v
WHERE v.block_num<=2435013 AND v.from_id = 22479
LIMIT 1
Limit (cost=0.57..1.97 rows=1 width=32) (actual time=0.047..0.047 rows=1 loops=1)
-> Index Scan using vt_from_id_idx on value_transfer v (cost=0.57..416878.36 rows=297860 width=32) (actual time=0.045..0.045 rows=1 loops=1)
Index Cond: (from_id = 22479)
Filter: (block_num <= 2435013)
Planning time: 0.392 ms
Execution time: 0.089 ms
(6 rows)
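But once the two conditions are OR-ed, it takes far longer. Something must be going wrong when Postgres is told to UNION the two queries. Maybe writing a PL/pgSQL function would be better.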
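One detail worth noting about the fast query in Edit 2: it has no ORDER BY, so it can return the first row the index scan finds, whereas the OR and UNION plans read over a million rows before sorting. A possible way to keep that early exit while still getting the latest row is to give each branch its own ORDER BY and LIMIT 1, and then pick the newer of the two results. The sketch below assumes composite indexes with the account column first, for example (to_id, block_num, valtr_id) and (from_id, block_num, valtr_id); those indexes are not part of the schema shown in this post, and the plan has not been verified against this data.

```sql
-- Each branch fetches only the newest row for one side of the OR.
-- block_num is added to the select list so the outer ORDER BY can use it.
SELECT valtr_id, from_id, to_id, from_balance, to_balance
FROM (
    (SELECT valtr_id, from_id, to_id, from_balance, to_balance, block_num
     FROM value_transfer
     WHERE to_id = 22479 AND block_num <= 2435013
     ORDER BY block_num DESC, valtr_id DESC
     LIMIT 1)
    UNION ALL
    (SELECT valtr_id, from_id, to_id, from_balance, to_balance, block_num
     FROM value_transfer
     WHERE from_id = 22479 AND block_num <= 2435013
     ORDER BY block_num DESC, valtr_id DESC
     LIMIT 1)
) AS latest
-- At most two candidate rows reach this point; keep the newer one.
ORDER BY block_num DESC, valtr_id DESC
LIMIT 1;
```

UNION ALL is used instead of UNION because duplicates do not need to be removed: if a transfer has the same account as both from_id and to_id, both branches may return the same row, and the outer LIMIT 1 still yields a single result.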
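【Comments】:
-
Composite indexes on (block_num, to_id) and (block_num, from_id) might help (maybe with the columns in swapped order, but I don't know your data model)
-
@wildplasser I have already added the composite keys as Lukasz suggested, but I don't see EXPLAIN ANALYZE using them. I will now try adding them in the reverse order
Tags: postgresql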