Why does Postgres not use a better index for my query?

【Question title】: Why does Postgres not use better index for my query?
【Posted】: 2017-05-07 15:56:54
【Question】:

I have a table that records who follows whom in a Twitter-like application:

\d follow
                               Table "public.follow"
  Column   |           Type           |                      Modifiers
-----------+--------------------------+-----------------------------------------------------
 xid       | text                     |
 followee  | integer                  |
 follower  | integer                  |
 id        | integer                  | not null default nextval('follow_id_seq'::regclass)
 createdAt | timestamp with time zone |
 updatedAt | timestamp with time zone |
 source    | text                     |
Indexes:
  "follow_pkey" PRIMARY KEY, btree (id)
  "follow_uniq_users" UNIQUE CONSTRAINT, btree (follower, followee)
  "follow_createdat_idx" btree ("createdAt")
  "follow_followee_idx" btree (followee)
  "follow_follower_idx" btree (follower)

The table has more than a million rows, and when I run EXPLAIN ANALYZE on the query I get:

explain analyze SELECT "follow"."follower"
FROM "public"."follow" AS "follow"
WHERE "follow"."followee" = 6
ORDER BY "follow"."createdAt" DESC
LIMIT 15 OFFSET 0;
                                                                  QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
Limit  (cost=0.43..353.69 rows=15 width=12) (actual time=5.456..21.497 rows=15 loops=1)
->  Index Scan Backward using follow_createdat_idx on follow  (cost=0.43..61585.45 rows=2615 width=12) (actual time=5.455..21.488 rows=15 loops=1)
     Filter: (followee = 6)
     Rows Removed by Filter: 62368
Planning time: 0.068 ms
Execution time: 21.516 ms

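Why does it do a backward index scan on follow_createdat_idx, when it could probably execute faster using follow_followee_idx?

This query takes about 33 ms on the first run and about 22 ms on subsequent calls, which I feel is on the high side.

I am using Postgres 9.5 as provided by Amazon RDS. Any idea what could be going wrong here?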

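【Comments】:

Because if it did the lookup on the followee index, it would then have to sort the matching rows. If that is the main use of the follow_followee index, you might want to try adding createdAt as the second field of that index.
@user1937198 After I did that, the execution time dropped from 20 ms to 2 ms, so it worked. Are there any performance implications if I keep both the "follow_createdat_idx" btree ("createdAt") index and the newly created "follow_follower_createdat_idx" btree (follower, "createdAt") index? In some use cases I only need to fetch all the users a person follows, where the first index might be more optimal.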

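Tags: database postgresql indexing sql-order-by postgresql-performance

【Solution 1】:

The multicolumn index on (follower, "createdAt") that user1937198 suggested is a very good fit for the query, as you already found in your test.

Since "createdAt" can be NULL (it is not defined NOT NULL), you may want to add NULLS LAST to both the query and the index: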
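
A minimal sketch of that index, for reference. The query as posted filters on followee, so this sketch assumes followee as the leading column; swap in follower if that is the column your query actually filters on. The index name follow_followee_createdat_idx is made up for illustration.

```sql
-- Hypothetical index name. The leading column matches the WHERE clause of the
-- posted query; the second column matches its ORDER BY.
-- On a busy table, CREATE INDEX CONCURRENTLY avoids blocking writes during the build.
CREATE INDEX follow_followee_createdat_idx
    ON public.follow (followee, "createdAt");

-- Re-check the plan after creating the index.
EXPLAIN ANALYZE
SELECT follower
FROM   public.follow
WHERE  followee = 6
ORDER  BY "createdAt" DESC
LIMIT  15;
```

With such an index the planner can read the rows for one followee already ordered by "createdAt", so it no longer has to choose between filtering tens of thousands of rows out of follow_createdat_idx and sorting the result of follow_followee_idx.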

...
ORDER BY "follow"."createdAt" DESC NULLS LAST

And:

"follow_follower_createdat_idx" btree (follower, "createdAt" DESC NULLS LAST)

Related:
- PostgreSQL sort by datetime asc, null first?
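
Put together, the NULLS LAST variant could look roughly like this. It is a sketch under the same assumption as the posted query (filtering on followee), and the index name is hypothetical.

```sql
-- Hypothetical index name. The sort order is declared to match the ORDER BY below.
CREATE INDEX follow_followee_createdat_desc_idx
    ON public.follow (followee, "createdAt" DESC NULLS LAST);

-- In Postgres a plain DESC sort puts NULLs first, so rows without a "createdAt"
-- would lead the result. NULLS LAST moves them to the end.
SELECT follower
FROM   public.follow
WHERE  followee = 6
ORDER  BY "createdAt" DESC NULLS LAST
LIMIT  15;
```

The point of declaring the order in the index is that a default ascending index read backwards yields DESC NULLS FIRST, which matches a plain DESC query but not DESC NULLS LAST.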

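There are some other, minor performance implications:

The multicolumn index on (follower, "createdAt") is 8 bytes per row bigger than the simple index on just (follower): 44 bytes vs. 36. More (btree indexes have mostly the same page layout as tables):
- Making sense of Postgres row sizes
Columns involved in an index cannot be changed with a HOT update. Adding more columns to an index might block this optimization, though given the column name, updates to "createdAt" seem particularly unlikely. And since you have another index on just ("createdAt") anyway, this is not an issue. More:
- PostgreSQL Initial Database Size
There is no downside to having another index on just ("createdAt"), other than the maintenance cost of each index (which affects write performance, not read performance). The two indexes support different queries. You may or may not need the index on just ("createdAt") in addition. Detailed explanation:
- Is a composite index also good for queries on the first field?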
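
If you want to check these points on your own data rather than take the numbers on faith, the statistics views give a rough picture. This is only a sketch: it assumes the table lives in schema public under the name follow, as in the question, and the counters only mean something after the workload has run for a while.

```sql
-- Size on disk and number of scans per index on the table.
-- An index whose idx_scan stays at 0 is a candidate for dropping.
SELECT indexrelname                                 AS index_name,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
       idx_scan
FROM   pg_stat_user_indexes
WHERE  schemaname = 'public'
AND    relname    = 'follow'
ORDER  BY pg_relation_size(indexrelid) DESC;

-- Total updates vs. HOT updates on the table.
SELECT n_tup_upd, n_tup_hot_upd
FROM   pg_stat_user_tables
WHERE  schemaname = 'public'
AND    relname    = 'follow';
```

The first query helps decide whether the index on just ("createdAt") is still earning its maintenance cost; the second shows whether updates to the table are HOT at all.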

【Comments】: