【Question title】: PostgreSQL efficient query with filter over boolean
【Posted】: 2017-01-11 23:56:31
【Question】:

I have a table with 15 million rows that holds users' inbox data:

 user_id         | integer                  | not null
 subject         | character varying(255)   | not null 
...
 last_message_id | integer                  | 
 last_message_at | timestamp with time zone |
 deleted_at      | timestamp with time zone | 

In short, here is the slow query:

SELECT * 
FROM dialogs 
WHERE user_id = 1234 
AND deleted_at IS NULL 
LIMIT 21 

The full query (irrelevant fields removed):

SELECT "dialogs"."id", "dialogs"."subject", "dialogs"."product_id", "dialogs"."user_id", "dialogs"."participant_id", "dialogs"."thread_id", "dialogs"."last_message_id", "dialogs"."last_message_at", "dialogs"."read_at", "dialogs"."deleted_at", "products"."id", ... , T4."id", ... , "messages"."id", ...
FROM "dialogs" 
LEFT OUTER JOIN "products" ON ("dialogs"."product_id" = "products"."id") 
INNER JOIN "auth_user" T4 ON ("dialogs"."participant_id" = T4."id")
LEFT OUTER JOIN "messages" ON ("dialogs"."last_message_id" = "messages"."id") 
WHERE ("dialogs"."deleted_at" IS NULL AND "dialogs"."user_id" = 9069) 
ORDER BY "dialogs"."last_message_id" DESC
LIMIT 21;

EXPLAIN ANALYZE output:

Limit  (cost=1.85..28061.24 rows=21 width=1693) (actual time=4.700..93087.871 rows=17 loops=1)
  ->  Nested Loop Left Join  (cost=1.85..9707215.30 rows=7265 width=1693) (actual time=4.699..93087.861 rows=17 loops=1)
        ->  Nested Loop  (cost=1.41..9647421.07 rows=7265 width=1457) (actual time=4.689..93062.481 rows=17 loops=1)
              ->  Nested Loop Left Join  (cost=0.99..9611285.66 rows=7265 width=1115) (actual time=4.676..93062.292 rows=17 loops=1)
                    ->  Index Scan Backward using dialogs_last_message_id on dialogs  (cost=0.56..9554417.92 rows=7265 width=102) (actual time=4.629..93062.050 rows=17 loops=1)
                          Filter: ((deleted_at IS NULL) AND (user_id = 9069))
                          Rows Removed by Filter: 6852907
                    ->  Index Scan using products_pkey on products  (cost=0.43..7.82 rows=1 width=1013) (actual time=0.012..0.012 rows=1 loops=17)
                          Index Cond: (dialogs.product_id = id)
              ->  Index Scan using auth_user_pkey on auth_user t4  (cost=0.42..4.96 rows=1 width=342) (actual time=0.009..0.010 rows=1 loops=17)
                    Index Cond: (id = dialogs.participant_id)
        ->  Index Scan using messages_pkey on messages  (cost=0.44..8.22 rows=1 width=236) (actual time=1.491..1.492 rows=1 loops=17)
              Index Cond: (dialogs.last_message_id = id)
Total runtime: 93091.494 ms
(14 rows)
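
  • OFFSET is not used.
  • There is an index on the user_id field.
  • The index on deleted_at is not used because it is poorly selective (90% of the values are actually NULL). A partial index (... WHERE deleted_at IS NULL) does not help either.
  • The query gets especially slow when it has to reach results that were created a long time ago. It then has to filter and discard the millions of rows in between.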
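
The claims in the list above can be checked with a few diagnostic queries. This is only a sketch: the table and column names come from the question, `pg_stats` is the standard planner statistics view, and user 9069 is the one from the plan above (where the planner estimated 7265 matching rows but only 17 came back).

```sql
-- Fraction of NULLs in deleted_at as the planner sees it (as of the last ANALYZE).
SELECT null_frac
FROM pg_stats
WHERE tablename = 'dialogs'
  AND attname = 'deleted_at';

-- Exact counts. count(deleted_at) skips NULLs, so the difference is the
-- number of non-deleted rows. Note: this scans the whole table.
SELECT count(*) - count(deleted_at) AS live_rows,
       count(*)                     AS total_rows
FROM dialogs;

-- How many rows the slow query can actually return for one user.
SELECT count(*)
FROM dialogs
WHERE user_id = 9069
  AND deleted_at IS NULL;
```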

List of indexes:

Indexes:
    "dialogs_pkey" PRIMARY KEY, btree (id)
    "dialogs_deleted_at_d57b320e_uniq" btree (deleted_at) WHERE deleted_at IS NULL
    "dialogs_last_message_id" btree (last_message_id)
    "dialogs_participant_id" btree (participant_id)
    "dialogs_product_id" btree (product_id)
    "dialogs_thread_id" btree (thread_id)
    "dialogs_user_id" btree (user_id)

Currently I am thinking about querying only recent data, i.e. ... WHERE last_message_at > <date 3-6 month ago>, with an appropriate index (BRIN?).
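
A minimal sketch of that idea, under stated assumptions: the six-month cutoff and the index name are hypothetical, and an ordinary btree partial index is used instead of BRIN, because BRIN only exists from PostgreSQL 9.5 on.

```sql
-- Hypothetical index: non-deleted dialogs per user, ordered by recency.
CREATE INDEX dialogs_user_id_last_message_at_live
    ON dialogs (user_id, last_message_at)
    WHERE deleted_at IS NULL;

-- Only consider dialogs with activity in the last six months
-- (the cutoff is an assumption, not a value from the question).
SELECT *
FROM dialogs
WHERE user_id = 1234
  AND deleted_at IS NULL
  AND last_message_at > now() - interval '6 months'
ORDER BY last_message_at DESC
LIMIT 21;
```

Note that this changes what the query returns: dialogs older than the cutoff are no longer listed at all.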

What is the best practice for speeding up such queries?

【Comments】:

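  • If you run the explain query with only WHERE deleted_at IS NULL, do you see the speed you expect? If so, I would suggest a single index covering both the user_id and deleted_at columns. This is usually needed because two separate indexes cannot be combined the way you might imagine, whereas one index over multiple columns gives the faster query times you expect.
  • You say the index on deleted_at is not being used. But your explain shows no seq scan. It is a backward index scan on dialogs_last_message_id. What is going on? Paste the full query plan.
  • Please post your index definitions as well. What do you mean by "a partial index does not help either"? An index on user_id WHERE deleted_at IS NULL should help.
  • @EvanCarroll - I guess it is using the user_id index. Filtering on deleted_at just loops over the result set and naively compares deleted_at with NULL until there are 11 items in the result.
  • Start by creating a partial index on (user_id, last_message_id) with the condition WHERE deleted_at IS NULL.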

Tags: postgresql


【Answer 1】:

As posted in the comments:

Start by creating a partial index on (user_id, last_message_id) with the condition WHERE deleted_at IS NULL.

Judging by your answer, this seems to have been very effective :-)
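
The suggested index could be created roughly as follows. The index name is hypothetical, and CONCURRENTLY is an optional addition that the comment does not mention; it is used here so that writes to the 15-million-row table are not blocked while the index is built.

```sql
-- Hypothetical index name; columns and predicate are taken from the suggestion.
-- CONCURRENTLY cannot be run inside a transaction block.
CREATE INDEX CONCURRENTLY dialogs_user_id_last_message_id_live
    ON dialogs (user_id, last_message_id)
    WHERE deleted_at IS NULL;
```

With user_id fixed by an equality condition, the matching index entries are already ordered by last_message_id, so ORDER BY last_message_id DESC LIMIT 21 can be served by reading the index backward and stopping early instead of filtering millions of rows.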


    【Answer 2】:

    So, here are the results of the solutions I tried:

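    1) The index on (user_id) WHERE deleted_at IS NULL is used only in rare cases, depending on the particular user_id value in the WHERE user_id = ? condition. Most of the time the query still has to filter rows out as before.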

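    2) The biggest speedup came from the (user_id, last_message_id) WHERE deleted_at IS NULL index. Although it is 2.5 times larger than the other indexes I tested, it is used every time and is very fast. Here is the resulting query plan: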

    Limit  (cost=1.72..270.45 rows=11 width=1308) (actual time=0.105..0.468 rows=8 loops=1)
       ->  Nested Loop Left Join  (cost=1.72..247038.21 rows=10112 width=1308) (actual time=0.104..0.465 rows=8 loops=1)
             ->  Nested Loop  (cost=1.29..164532.13 rows=10112 width=1072) (actual time=0.071..0.293 rows=8 loops=1)
                   ->  Nested Loop Left Join  (cost=0.86..116292.45 rows=10112 width=736) (actual time=0.057..0.198 rows=8 loops=1)
                         ->  Index Scan Backward using dialogs_user_id_last_message_id_d57b320e on dialogs  (cost=0.43..38842.21 rows=10112 width=102) (actual time=0.038..0.084 rows=8 loops=1)
                               Index Cond: (user_id = 9069)
                         ->  Index Scan using products_pkey on products  (cost=0.43..7.65 rows=1 width=634) (actual time=0.012..0.012 rows=1 loops=8)
                               Index Cond: (dialogs.product_id = id)
                   ->  Index Scan using auth_user_pkey on auth_user t4  (cost=0.42..4.76 rows=1 width=336) (actual time=0.010..0.010 rows=1 loops=8)
                         Index Cond: (id = dialogs.participant_id)
             ->  Index Scan using messages_pkey on messages  (cost=0.44..8.15 rows=1 width=236) (actual time=0.019..0.020 rows=1 loops=8)
                   Index Cond: (dialogs.last_message_id = id)
     Total runtime: 0.678 ms
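
    The size and usage of the tested indexes can be compared with a query along these lines. It is a sketch that relies only on the standard pg_stat_user_indexes view and the built-in size functions; the table name comes from the question.

    ```sql
    -- On-disk size and number of scans for every index on dialogs, largest first.
    SELECT indexrelname,
           pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
           idx_scan
    FROM pg_stat_user_indexes
    WHERE relname = 'dialogs'
    ORDER BY pg_relation_size(indexrelid) DESC;
    ```

    An index whose idx_scan stays at zero under normal load is a candidate for removal, which can offset the extra space taken by the new composite index.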
    

    Thanks to @jcaron. Your suggestion should be the accepted answer.
