Postgres not using index for range query in partitioned table

【Question title】: Postgres not using index for range query in partitioned table
【Posted】: 2016-04-06 09:02:38
【Question】:

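I found that Postgres does not use the index for a range query on a partitioned table.

The date column of the parent table and of each of its partitions is indexed with a btree.

A query like this: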

select * from parent_table where date >= '2015-07-01';

does not use the index.

EXPLAIN output:

Append  (cost=0.00..106557.52 rows=3263963 width=128)
  ->  Seq Scan on parent_table  (cost=0.00..0.00 rows=1 width=640)
        Filter: (date >= '2015-07-01'::date)
  ->  Seq Scan on z_partition_2015_07  (cost=0.00..106546.02 rows=3263922 width=128)
        Filter: (date >= '2015-07-01'::date)
  ->  Seq Scan on z_partition_2015_08  (cost=0.00..11.50 rows=40 width=640)
        Filter: (date >= '2015-07-01'::date)

But a query like this:

select * from parent_table where date = '2015-07-01';

does use the index.

EXPLAIN output:

Append  (cost=0.00..30400.95 rows=107602 width=128)
  ->  Seq Scan on parent_table  (cost=0.00..0.00 rows=1 width=640)
        Filter: (date = '2015-07-01'::date)
  ->  Index Scan using z_partition_2015_07_date on z_partition_2015_07  (cost=0.43..30400.95 rows=107601 width=128)
        Index Cond: (date = '2015-07-01'::date)

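When I run the same queries against an ordinary, non-partitioned table that has an index on date, both queries use the index.

Is there something we should do with the indexes on the partitioned table?

【Question comments】: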

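Try a BETWEEN condition on the date.
Show the EXPLAIN ANALYZE plans for both of them.
It is hard to say without knowing the size and structure of the data. First try VACUUM ANALYZE parent_table to collect statistics for the partitioned table. If that does not help, run SET enable_seqscan = off in psql and repeat your query. The planner should then use an index scan, so you can compare the cost of the seq scan with the cost of the index scan. For this type of query the seq scan is most likely cheaper. An index scan is not good at fetching large amounts of data.
@AdamSilenko Same result with BETWEEN.
@Musin Yes, the partition contains about 3 million rows, and the query pulls about 10% of them.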
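
As a rough sketch of the statistics refresh suggested in the comments above: with inheritance-based partitioning each child is a separate table with its own statistics, so it may be worth analyzing the partitions explicitly and then checking when they were last analyzed. The table names are taken from the EXPLAIN output in the question. Whether stale statistics play any role here is an assumption, not something the question confirms.

```sql
-- Refresh statistics on the parent and on each partition named in the question.
-- (VACUUM cannot run inside a transaction block, so run these as plain statements.)
VACUUM ANALYZE parent_table;
VACUUM ANALYZE z_partition_2015_07;
VACUUM ANALYZE z_partition_2015_08;

-- Check when each table was last analyzed and its estimated number of live rows.
SELECT relname, last_analyze, last_autoanalyze, n_live_tup
FROM pg_stat_user_tables
WHERE relname IN ('parent_table', 'z_partition_2015_07', 'z_partition_2015_08')
ORDER BY relname;
```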

Tags: postgresql indexing partitioning postgresql-9.3 postgresql-performance

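【Answer 1】:

I assume you are aware that "partitions" are separate tables in Postgres. Indexes are typically not used when retrieving a large portion of a table (more than ~ 5 %, depending on many details), because scanning the table sequentially is typically faster in that case.

Also, it seems your first query selects all rows from the partitions involved. An index is useless for that...

Generally, equality predicates with = are more selective than predicates with >=. Think about it:

Your first query with date >= '2015-07-01' retrieves all rows from the partition (I am guessing, I would need to see the exact definition). Using an index would only add overhead. But your second query with date = '2015-07-01' fetches only a small share of the rows. Postgres expects an index scan to be faster.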
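
One way to check this reasoning is to count how many rows of the partition each predicate matches. A minimal sketch, assuming the partition z_partition_2015_07 and its date column as they appear in the question's EXPLAIN output; the column aliases are made up for illustration.

```sql
-- count(expr) ignores NULLs, so each CASE counts only the rows matching its predicate.
-- The CASE form is used because the aggregate FILTER clause is not available in 9.3.
SELECT count(*)                                           AS total_rows,
       count(CASE WHEN date >= '2015-07-01' THEN 1 END)   AS range_rows,
       count(CASE WHEN date =  '2015-07-01' THEN 1 END)   AS equality_rows
FROM z_partition_2015_07;
```

If range_rows is close to total_rows while equality_rows is a small share, a sequential scan for the range query and an index scan for the equality query is the plan choice one would expect.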

【Comments】:

【Answer 2】:

Maybe it would be faster that way? Run your query, then execute the following:

SET enable_seqscan=false

Then run it again.
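
A slightly more careful version of this experiment, as a sketch: SET LOCAL confines the setting to a single transaction, and EXPLAIN ANALYZE executes the query and reports actual run times, so the two plans can be compared directly. The query is the range query from the question. enable_seqscan is meant as a diagnostic switch, and whether the index plan turns out faster here is not known in advance.

```sql
-- Baseline: the plan the planner picks on its own, with actual timings.
EXPLAIN ANALYZE
SELECT * FROM parent_table WHERE date >= '2015-07-01';

BEGIN;
-- Discourage sequential scans for this transaction only.
SET LOCAL enable_seqscan = off;
EXPLAIN ANALYZE
SELECT * FROM parent_table WHERE date >= '2015-07-01';
ROLLBACK;  -- the setting reverts when the transaction ends
```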

【Comments】: