Improve performance on multicolumn index and sort: answers

【Question title】: Improve performance on multicolumn index and sort
【Posted】: 2014-03-17 19:33:22
【Question】:

SELECT * FROM table1
WHERE (col1, col2) IN (($1, $2), ($3, $4))
ORDER  BY col3
LIMIT  10;

Output of EXPLAIN ANALYZE:

 Limit  (cost=59174.75..59174.77 rows=10 width=113) (actual time=3632.627..3632.661 rows=10 loops=1)
   ->  Sort  (cost=59174.75..59180.22 rows=2188 width=113) (actual time=3632.623..3632.634 rows=10 loops=1)
         Sort Key: col3
         Sort Method: top-N heapsort  Memory: 27kB
         ->  Nested Loop  (cost=2.62..59127.46 rows=2188 width=113) (actual time=0.234..3561.309 rows=38347 loops=1)
   ...........
   Total runtime: 3632.818 ms

But when I remove the ORDER BY:

SELECT * FROM table1 WHERE (col1, col2) IN (($1, $2), ($3, $4)) LIMIT 10;
 Limit  (cost=2.62..272.85 rows=10 width=105) (actual time=0.258..1.143 rows=10 loops=1)
   ->  Nested Loop  (cost=2.62..59127.46 rows=2188 width=105) (actual time=0.255..1.115 rows=10 loops=1)
........
Total runtime: 1.306 ms

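There is a composite btree index on (col1, col2) and a btree index on col3.
Write performance and storage are not priorities. Read performance is critical and needs to be as fast as possible.
This must be able to support querying with an IN clause: WHERE (col1, col2) IN (($1, $2), ($3, $4)) ORDER BY col3 LIMIT 10;. (Lookups are always with an IN clause and then an ORDER BY.)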

Note: Would it be possible to create an index on (col1, col2, col3)? That would use (col1, col2) for the lookup and already be ordered by col3 ...

【Question comments】:

Tags: postgresql indexing query-performance postgresql-performance

【Solution 1】:

Yes. You already have the answer in your question.
A multicolumn index on (col1, col2, col3) should be perfect for the query as given. Try it.
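
A minimal sketch of such an index, assuming the table and column names from the question. The index name table1_col1_col2_col3_idx is made up for illustration and is not part of the original answer:

```sql
-- Hypothetical index name; table and columns as in the question.
-- The equality columns (col1, col2) come first and the sort column (col3) last,
-- so index entries for one (col1, col2) pair are stored in col3 order.
CREATE INDEX table1_col1_col2_col3_idx ON table1 (col1, col2, col3);

-- Refresh planner statistics for the table afterwards.
ANALYZE table1;
```

With this index in place, the existing index on (col1, col2) is probably redundant because the new index has the same leading columns, but verify that with your own workload before dropping it.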

For more on column order in multicolumn B-tree indexes, see this related question on dba.SE:
Is a composite index also good for queries on the first field?

Also, if you don't actually need all columns from table1, put only the columns you need in the SELECT list instead of * to gain performance.


As for your additional requirement:

WHERE (col1, col2) IN (($1, $2), ($3, $4))

is equivalent to:

WHERE (col1 = $1 AND col2 = $2 OR
       col1 = $3 AND col2 = $4)

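This reduces the effectiveness of the index on (col1, col2, col3), because Postgres cannot just take a pre-sorted list from the index: rows are ordered by col3 only within each (col1, col2) pair, not across pairs. How much the index still helps depends on your data. The fewer items in your IN list and the more rows with the same col3 per (col1, col2), the more you gain from that index.

You have to test. Create the index in addition to the existing ones, make sure your server is configured reasonably, the statistics are up to date (ANALYZE) and your cost settings are sensible; then EXPLAIN will show what Postgres chooses. Be sure to run a set of queries that is representative of your use case. Finally, drop the indexes that are not used.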
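
The testing steps above might look like the following sketch. The literal values (1, 2) and (3, 4) merely stand in for the parameters $1 to $4 and assume integer columns; substitute values that are realistic for your data:

```sql
-- Make sure planner statistics are current before comparing plans.
ANALYZE table1;

-- Show the plan Postgres actually chooses, with run times and buffer usage.
-- The literals are placeholders for $1..$4 (assumed integer columns).
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM   table1
WHERE  (col1, col2) IN ((1, 2), (3, 4))
ORDER  BY col3
LIMIT  10;

-- After running a representative workload: how often has each index
-- on table1 been scanned? Rarely used indexes sort to the top.
SELECT indexrelname, idx_scan
FROM   pg_stat_user_indexes
WHERE  relname = 'table1'
ORDER  BY idx_scan;
```

An index whose idx_scan stays at 0 over a representative period is a candidate for removal. The counters are cumulative since the last statistics reset, so judge them over a known time window.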

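Trick Postgres into using the special index efficiently

The sort step seems to be the expensive part. Try this alternative query: one UNION ALL leg per item in the IN list. This makes Postgres an offer it can't refuse: the special index is a perfect fit for each leg, which returns at most 10 rows already sorted by col3. The final sort step is cheap for a small number of IN items.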

(
SELECT *
FROM   table1
WHERE  col1 = $1 AND col2 = $2
ORDER  BY col3
LIMIT  10
)
UNION  ALL
(
SELECT *
FROM   table1
WHERE  col1 = $3 AND col2 = $4
ORDER  BY col3
LIMIT  10
)
... UNION  ALL ...
ORDER  BY col3
LIMIT  10

Note that all parentheses are required to allow an ORDER BY and LIMIT per leg in addition to the final ORDER BY and LIMIT.
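
When the number of pairs varies, writing one leg per pair by hand gets tedious. The sketch below is a different formulation of the same idea and is not part of the original answer: a LATERAL subquery runs the per-pair ORDER BY and LIMIT once for each row of a VALUES list. It assumes PostgreSQL 9.3 or later (for LATERAL) and integer columns, and the literals again stand in for $1 to $4. Whether each per-pair subquery really uses the (col1, col2, col3) index has to be checked with EXPLAIN.

```sql
-- One per-pair "top 10 by col3" subquery for every row of the VALUES list,
-- followed by a cheap final sort over at most 10 rows per pair.
-- The aliases v, c1, c2 and t are arbitrary names chosen for this sketch.
SELECT t.*
FROM  (VALUES (1, 2), (3, 4)) AS v(c1, c2)   -- placeholders for ($1, $2), ($3, $4)
CROSS  JOIN LATERAL (
   SELECT *
   FROM   table1
   WHERE  col1 = v.c1
   AND    col2 = v.c2
   ORDER  BY col3
   LIMIT  10
   ) t
ORDER  BY t.col3
LIMIT  10;
```

As with the UNION ALL form, a pair that is listed twice contributes its rows twice, unlike the IN form, so keep the list free of duplicates.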

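【Comments】:

@erwinbranstetter, I modified the question slightly: This must be able to support querying with IN clause: WHERE (col1, col2) IN (($1, $2), ($3, $4)) ORDER BY col3 LIMIT 10;. (Look ups are always with an IN clause and then order.) Does the multicolumn index approach still hold in that case?
@alumns: I added some content to address your addition.
I added the index, and lookups without the IN clause are very fast. But as soon as I use the IN clause, performance drops sharply. What else can I do to improve performance?
The goal seems to be to avoid the expensive sort step. A single lookup manages to avoid it, while the IN clause does not.
@alumns: I might have something for you. Check the added paragraph.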