【Question title】: Postgres partition pruning
【Posted】: 2012-04-17 09:05:30
【Question】:

I have a big table in Postgres.

The table is named bigtable and its columns are:

integer    |timestamp   |xxx |xxx |...|xxx
category_id|capture_time|col1|col2|...|colN

I have partitioned the table by category_id modulo 10 and by the date part of the capture_time column.

The partition tables look like this:

CREATE TABLE myschema.bigtable_d000h0(
    CHECK ( category_id%10=0 AND capture_time >= DATE '2012-01-01' AND capture_time < DATE '2012-01-02')
) INHERITS (myschema.bigtable);

CREATE TABLE myschema.bigtable_d000h1(
    CHECK ( category_id%10=1 AND capture_time >= DATE '2012-01-01' AND capture_time < DATE '2012-01-02')
) INHERITS (myschema.bigtable);

When I run a query with category_id and capture_time in the WHERE clause, the partitions are not pruned as expected:

explain select * from bigtable where capture_time >= '2012-01-01' and  capture_time < '2012-01-02' and category_id=100;

"Result  (cost=0.00..9476.87 rows=1933 width=216)"
"  ->  Append  (cost=0.00..9476.87 rows=1933 width=216)"
"        ->  Seq Scan on bigtable  (cost=0.00..0.00 rows=1 width=210)"
"              Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))"
"        ->  Seq Scan on bigtable_d000h0 bigtable  (cost=0.00..1921.63 rows=1923 width=216)"
"              Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))"
"        ->  Seq Scan on bigtable_d000h1 bigtable  (cost=0.00..776.93 rows=1 width=218)"
"              Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))"
"        ->  Seq Scan on bigtable_d000h2 bigtable  (cost=0.00..974.47 rows=1 width=216)"
"              Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))"
"        ->  Seq Scan on bigtable_d000h3 bigtable  (cost=0.00..1351.92 rows=1 width=214)"
"              Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))"
"        ->  Seq Scan on bigtable_d000h4 bigtable  (cost=0.00..577.04 rows=1 width=217)"
"              Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))"
"        ->  Seq Scan on bigtable_d000h5 bigtable  (cost=0.00..360.67 rows=1 width=219)"
"              Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))"
"        ->  Seq Scan on bigtable_d000h6 bigtable  (cost=0.00..1778.18 rows=1 width=214)"
"              Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))"
"        ->  Seq Scan on bigtable_d000h7 bigtable  (cost=0.00..315.82 rows=1 width=216)"
"              Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))"
"        ->  Seq Scan on bigtable_d000h8 bigtable  (cost=0.00..372.06 rows=1 width=219)"
"              Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))"
"        ->  Seq Scan on bigtable_d000h9 bigtable  (cost=0.00..1048.16 rows=1 width=215)"
"              Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))"

However, if I add the exact modulo condition (category_id%10=0) to the WHERE clause, pruning works perfectly:

explain select * from bigtable where capture_time >= '2012-01-01' and  capture_time < '2012-01-02' and category_id=100 and category_id%10=0;

"Result  (cost=0.00..2154.09 rows=11 width=215)"
"  ->  Append  (cost=0.00..2154.09 rows=11 width=215)"
"        ->  Seq Scan on bigtable  (cost=0.00..0.00 rows=1 width=210)"
"              Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100) AND ((category_id % 10) = 0))"
"        ->  Seq Scan on bigtable_d000h0 bigtable  (cost=0.00..2154.09 rows=10 width=216)"
"              Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100) AND ((category_id % 10) = 0))"

Is there any way to make partition pruning work without having to add the modulo condition to every query?

【Discussion】:

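  • Which version are you using? I think the planner got some improvements regarding partitioning in 9.x.
  • You could make the constraint less verbose: CHECK (category_id%10=1 AND date_trunc('month', capture_time) = '2012-01-01'::date)
  • @a_horse_with_no_name I am using 9.1.
  • @Clodoaldo Pruning based on time works fine; only the modulo part does not work as expected. By the way, the constraint is more verbose because data can also be partitioned by week or by month for date ranges that are not queried often.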

Tags: postgresql partitioning database-partitioning partition-problem


【Solution 1】:

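The thing is: constraint exclusion decides whether a partition can be skipped using the same simple predicate-proving logic that PostgreSQL uses to decide whether a partial index is usable. Your CHECK constraint uses an expression on the column (category_id%10) rather than the column's plain value, and the planner cannot derive category_id%10=1 being false from category_id=100. The documentation on partial indexes describes this limitation (see Example 11-2):

PostgreSQL does not have a sophisticated theorem prover that can recognize mathematically equivalent expressions that are written in different forms. (Not only is such a general theorem prover extremely difficult to create, it would probably be too slow to be of any real use.) The system can recognize simple inequality implications, for example "x < 1" implies "x < 2"; otherwise the predicate condition must exactly match part of the query's WHERE condition or the index will not be recognized as usable. Matching takes place at query planning time, not at run time.

So your query has to contain exactly the same expression that was used when creating the CHECK constraint.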
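A small generalization of that workaround, sketched under the assumption (which I believe holds) that the planner folds constant arithmetic before constraint exclusion runs: instead of hard-coding the bucket number, repeat the CHECK expression verbatim and derive the right-hand side from the same literal. `100 % 10` should be folded to `0`, so the clause then matches `category_id%10=0` exactly and the plan should list only bigtable and bigtable_d000h0.

```sql
-- The partition-key expression is written exactly as in the CHECK constraint
-- (category_id % 10); the right-hand side is derived from the same literal,
-- so the caller only has to supply the category value once.
EXPLAIN SELECT * FROM myschema.bigtable
WHERE capture_time >= '2012-01-01' AND capture_time < '2012-01-02'
  AND category_id = 100
  AND category_id % 10 = 100 % 10;  -- constant-folded to (category_id % 10) = 0
```

This only helps when the category is a literal at plan time; with a bind parameter in a generic prepared-statement plan there is no constant to fold, so the bucket partitions would not be excluded.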

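For HASH-based partitioning, I prefer one of two approaches:

  • Add a column that can take only a limited set of values (10 in your case), ideally one where such values already exist by design;
  • Specify hash ranges in the same way you specify the timestamp ranges: MINVALUE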
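One reading of the range-based approach, as a minimal sketch: each child holds a contiguous block of category_id values instead of a modulo bucket. The bounds (0-999, 1000-1999) and the table names bigtable_d000r0/bigtable_d000r1 are hypothetical. Because the constraints are plain inequalities on the column, the planner's "simple inequality implications" should be enough to exclude children using only `category_id = 100`.

```sql
-- Hypothetical range-based children: contiguous category_id blocks
-- instead of category_id % 10 buckets (bounds are made up).
CREATE TABLE myschema.bigtable_d000r0 (
    CHECK ( category_id >= 0    AND category_id < 1000
        AND capture_time >= DATE '2012-01-01' AND capture_time < DATE '2012-01-02')
) INHERITS (myschema.bigtable);

CREATE TABLE myschema.bigtable_d000r1 (
    CHECK ( category_id >= 1000 AND category_id < 2000
        AND capture_time >= DATE '2012-01-01' AND capture_time < DATE '2012-01-02')
) INHERITS (myschema.bigtable);

-- category_id = 100 contradicts category_id >= 1000, so bigtable_d000r1
-- should be excluded without any extra modulo predicate in the query.
EXPLAIN SELECT * FROM myschema.bigtable
WHERE capture_time >= '2012-01-01' AND capture_time < '2012-01-02'
  AND category_id = 100;
```

The trade-off is data distribution: modulo spreads categories evenly across buckets, while ranges only do so if category_id values are evenly spread over the chosen bounds.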

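It is also possible to create two-level partitioning:

  • At the first level, you create 10 partitions based on the category_id hash;
  • At the second level, you create as many partitions as you need based on date ranges.

Still, I always try to partition on a single column only; it is easier to manage.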
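A minimal sketch of the two-level layout with table inheritance, using hypothetical table names (bigtable_h0, bigtable_h0_d20120101). CHECK constraints on a parent are inherited by its children, so each date child also carries the bucket constraint. Pruning the first level still requires the modulo predicate in the query; querying a first-level table directly limits the planner to that bucket's date children.

```sql
-- Level 1: one child per category_id bucket (only bucket 0 shown).
CREATE TABLE myschema.bigtable_h0 (
    CHECK ( category_id % 10 = 0 )
) INHERITS (myschema.bigtable);

-- Level 2: date ranges underneath the bucket. The inherited CHECK means
-- this table also carries category_id % 10 = 0.
CREATE TABLE myschema.bigtable_h0_d20120101 (
    CHECK ( capture_time >= DATE '2012-01-01' AND capture_time < DATE '2012-01-02' )
) INHERITS (myschema.bigtable_h0);

-- Querying the level-1 table directly: only its date children are considered.
EXPLAIN SELECT * FROM myschema.bigtable_h0
WHERE capture_time >= '2012-01-01' AND capture_time < '2012-01-02'
  AND category_id = 100;
```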

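【Discussion】:

  • Thanks for your input. The code I posted does two-level partitioning with a single level of inheritance. Performance-wise, it runs faster than actual two-level inheritance. I know the other way (the one you suggest) should be faster, because fewer tables are checked at the first level, and at the next level only the tables inheriting from the qualifying first-level tables have to be scanned. In practice, though, it is slower.
  • What slows things down is the partition pruning logic, not the actual table scans. In both cases the optimizer prunes the tables correctly, but with two-level inheritance it takes longer to decide which partitions to prune.
  • One interesting thing about two-level partitioning is that you can query a first-level table directly, and pruning the second-level partitions then takes less time. I can use this to archive data in a way that does not hurt performance.

【Solution 2】: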

For anyone running into the same problem: I concluded that the simplest approach is to change the queries to include the modulo condition category_id%10=0.

【Discussion】:
