【问题标题】:Slow query when adding an extra WHERE添加额外 WHERE 时查询速度慢
【发布时间】:2019-12-18 19:43:35
【问题描述】:

我有一个查询在使用一个或两个过滤器运行时按预期执行。 但是当我添加第三个时,执行计划似乎发生了变化并影响了性能。

使用 2 个过滤器进行查询

EXPLAIN ANALYZE SELECT
   "ed_system"."name",
   "ed_system"."geom"::bytea,
   (
      "ed_system"."geom" <<->> ST_GeomFromEWKB('\x01010000a0e6100000000000000000000000000000000000000000000000000000'::bytea)
   )
   AS "distance"
FROM
   "ed_body" 
   INNER JOIN
      "ed_system" 
      ON ("ed_body"."system_id" = "ed_system"."id") 
WHERE
   (
      ST_3DDistance("ed_system"."geom", ST_GeomFromEWKB('\x01010000a0e6100000000000000000000000000000000000000000000000000000'::bytea)) < 20.0 
      AND NOT "ed_body"."ring_count" = 0
   )
ORDER BY
   "distance" ASC LIMIT 10

输出

Limit  (cost=0.85..24526.33 rows=10 width=60) (actual time=17.999..34.674 rows=10 loops=1)
  ->  Nested Loop  (cost=0.85..252614946.65 rows=103001 width=60) (actual time=17.997..34.670 rows=10 loops=1)
        ->  Index Scan using ed_system_geom_id on ed_system  (cost=0.42..41629457.42 rows=6361500 width=64) (actual time=17.973..34.448 rows=12 loops=1)
              Order By: (geom <<->> '01010000A0E6100000000000000000000000000000000000000000000000000000'::geometry)
              Filter: (st_3ddistance(geom, '01010000A0E6100000000000000000000000000000000000000000000000000000'::geometry) < '20'::double precision)
        ->  Index Scan using ed_body_system_id_0b3a29a2 on ed_body  (cost=0.43..33.15 rows=2 width=4) (actual time=0.010..0.015 rows=1 loops=12)
              Index Cond: (system_id = ed_system.id)
              Filter: (ring_count <> 0)
              Rows Removed by Filter: 9
Planning time: 0.557 ms
Execution time: 34.729 ms

带有额外过滤器的查询

EXPLAIN ANALYZE SELECT
   "ed_system"."name",
   "ed_system"."geom"::bytea,
   (
      "ed_system"."geom" <<->> ST_GeomFromEWKB('\x01010000a0e6100000000000000000000000000000000000000000000000000000'::bytea)
   )
   AS "distance"
FROM
   "ed_body" 
   INNER JOIN
      "ed_system" 
      ON ("ed_body"."system_id" = "ed_system"."id") 
WHERE
   (
      ST_3DDistance("ed_system"."geom", ST_GeomFromEWKB('\x01010000a0e6100000000000000000000000000000000000000000000000000000'::bytea)) < 20.0 
      AND NOT "ed_body"."ring_count" = 0
      AND "ed_body"."scoopable" = True
   )
ORDER BY
   "distance" ASC LIMIT 10

输出:

Limit  (cost=0.85..84471.16 rows=10 width=60) (actual time=238721.835..238721.835 rows=0 loops=1)
  ->  Nested Loop  (cost=0.85..252549321.31 rows=29898 width=60) (actual time=238721.832..238721.832 rows=0 loops=1)
        ->  Index Scan using ed_system_geom_id on ed_system  (cost=0.42..41629457.42 rows=6361500 width=64) (actual time=20.292..238664.727 rows=107 loops=1)
              Order By: (geom <<->> '01010000A0E6100000000000000000000000000000000000000000000000000000'::geometry)
              Filter: (st_3ddistance(geom, '01010000A0E6100000000000000000000000000000000000000000000000000000'::geometry) < '20'::double precision)
              Rows Removed by Filter: 19118611
        ->  Index Scan using ed_body_system_id_0b3a29a2 on ed_body  (cost=0.43..33.15 rows=1 width=4) (actual time=0.532..0.532 rows=0 loops=107)
              Index Cond: (system_id = ed_system.id)
              Filter: (scoopable AND (ring_count <> 0))
              Rows Removed by Filter: 12
Planning time: 0.952 ms
Execution time: 238733.393 ms

几乎相同的计划,除了第二个计划在 ed_system 索引扫描上有大量过滤行并且是 sssllooowww。

这让我认为,出于某种原因,在第二个计划中,规划者认为最好先循环遍历 ed_body 表并在过滤距离连接之前对其进行过滤。 我运行了 VACUUM,添加了额外的索引并尝试了子查询,但无济于事。

有什么建议吗?

编辑:

索引:

CREATE UNIQUE INDEX ed_body_pkey ON ed_body(id int4_ops); 
CREATE INDEX ed_body_system_id_0b3a29a2 ON ed_body(system_id int4_ops); 
CREATE INDEX ed_body_test4 ON ed_body(ring_count int4_ops,scoopable bool_ops);

CREATE UNIQUE INDEX ed_system_pkey ON ed_system(id int4_ops); 
CREATE INDEX ed_system_geom_id ON ed_system USING GIST (geom gist_geometry_ops_nd); 
CREATE INDEX ed_system_geom_idx ON ed_system(geom btree_geometry_ops);

编辑:

强制对 ed_system 表进行索引扫描。但是很慢。

Index Only Scan using ed_system_geom_idx on ed_system  (cost=0.56..5581365.22 rows=358 width=4) (actual time=7.542..53948.185 rows=12 loops=1)
  Filter: ((geom && '01030000A0E6100000010000000500000000000000000024C000000000000024C000000000000024C000000000000024C0000000000000244000000000000024C0000000000000244000000000000024400000000000002440000000000000244000000000000024C0000000000000244000000000000024C000000000000024C000000000000024C0'::geometry) AND ('01010000A0E6100000000000000000000000000000000000000000000000000000'::geometry && st_expand(geom, '10'::double precision)) AND _st_3ddwithin(geom, '01010000A0E6100000000000000000000000000000000000000000000000000000'::geometry, '10'::double precision))
  Rows Removed by Filter: 19118706
  Heap Fetches: 0
Planning time: 0.709 ms
Execution time: 53948.245 ms

【问题讨论】:

  • 您在“ed_body”和“ed_system”中有哪些索引?
  • 你上一次在牌桌上跑VACUUM ANALYZE是什么时候?
  • 这些计划对我来说看起来基本相同(以附加条件为模),只是执行不同......
  • @MilesElam 在测试前跑了。
  • @SamMason 该查询不会告诉您它是否可以使用索引。由于 LIMIT 1,它可能能够使用索引,但仍然认为 seq 扫描更快并且仍然选择它。移除 LIMIT 以获得可靠的答案。

标签: sql postgresql postgis


【解决方案1】:

索引扫描中的“Order By:”在这里看起来很有吸引力,但可能不是。您可以通过将 order by 子句更改为 ORDER BY "distance" + 0.0 ASC LIMIT 10 来强制它不使用它。如果这不能解决问题,请为它发布 EXPLAIN (ANALYZE, BUFFERS),因为它至少会为我们提供更多有用的信息。

您说您猜到了大约 20% 的行遇到了 "ed_body"."scoopable" = True(为什么要猜?只需运行 count(*) 查询即可确定)。但是实际行数从某个未知值减少(因为它一旦达到 LIMIT 就停止了),一直下降到零。因此,要么它们比这更罕见,要么存在一种相关性,使它们在满足其他标准的行中特别罕见。

【讨论】:

    猜你喜欢
    • 2023-04-05
    • 2015-03-22
    • 2014-02-17
    • 1970-01-01
    • 2022-10-24
    • 1970-01-01
    • 2012-08-01
    • 1970-01-01
    • 1970-01-01
    相关资源
    最近更新 更多