Why is postgis not applying an index when there is a "WHERE IN" in the query?

【Question title】: Why is postgis not applying an index when there is a "WHERE IN" in the query?
【Posted】: 2020-06-03 12:36:28
【Question description】:

We have a table shops with a column location of type GEOMETRY(POINT,4326) and a column version of type tinyint, and a table version whose single row contains an integer.
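For reference, the setup described here could look roughly like the sketch below. Only the table names, the column names, the geometry type, and the index name shop_location_idx (taken from the query plan further down) come from the thread; the id column is a hypothetical addition, and smallint stands in for "tinyint", which core PostgreSQL does not provide.

```sql
-- Hypothetical DDL reconstructed from the description; not the poster's actual schema.
CREATE EXTENSION IF NOT EXISTS postgis;

CREATE TABLE shops (
    id        bigserial PRIMARY KEY,      -- assumed; not mentioned in the question
    location  geometry(Point, 4326),      -- as described in the question
    "version" smallint                    -- "tinyint" in the question; smallint assumed
);

-- Single-row table holding the currently active version number.
CREATE TABLE "version" (
    "version" integer
);

-- GiST index that the KNN distance operator <-> can use for ORDER BY ... LIMIT.
CREATE INDEX shop_location_idx ON shops USING gist (location);
```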

Why does the following query not use the GIST(location) index?

SELECT * FROM shops 
WHERE ("version" IN (SELECT "version" FROM "version")) 
ORDER BY (location <-> '0020000001000010e64029d460d2a8aee0404bc8bb0955ea17'::geometry) LIMIT 10;

By contrast, the same query without IN does use the index:

SELECT * FROM shops 
WHERE ("version" = (SELECT "version" FROM "version" LIMIT 1)) 
ORDER BY (location <-> '0020000001000010e64029d460d2a8aee0404bc8bb0955ea17'::geometry) LIMIT 10;

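This has been affecting us since we upgraded from postgres 9 to 11. I was able to trace the problem back to the SELECT above.

Edit: added the query analysis

First query (no index applied):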

"Limit  (cost=25260.30..25260.32 rows=10 width=1275) (actual time=254.809..254.814 rows=10 loops=1)"
"  ->  Sort  (cost=25260.30..25260.39 rows=36 width=1275) (actual time=254.807..254.809 rows=10 loops=1)"
"        Sort Key: ((shops.location <-> '0101000020E6100000E0AEA8D260D4294017EA5509BBC84B40'::geometry))"
"        Sort Method: top-N heapsort  Memory: 54kB"
"        ->  Nested Loop  (cost=41.88..25259.52 rows=36 width=1275) (actual time=0.099..215.201 rows=58179 loops=1)"
"              Join Filter: (shops.version = version.version)"
"              ->  HashAggregate  (cost=41.88..43.88 rows=200 width=4) (actual time=0.014..0.016 rows=1 loops=1)"
"                    Group Key: version.version"
"                    ->  Seq Scan on version  (cost=0.00..35.50 rows=2550 width=4) (actual time=0.009..0.010 rows=1 loops=1)"

Second query:

"Limit  (cost=0.28..440.04 rows=10 width=1275) (actual time=0.194..0.233 rows=10 loops=1)"
"  ->  Nested Loop Semi Join  (cost=0.28..1574995.44 rows=35815 width=1275) (actual time=0.193..0.230 rows=10 loops=1)"
"        Join Filter: (shop.version = version.version)"
"        ->  Index Scan using shop_location_idx on shops  (cost=0.28..101549.81 rows=71630 width=1267) (actual time=0.182..0.213 rows=10 loops=1)"
"              Order By: (location <-> '0101000020E6100000E0AEA8D260D4294017EA5509BBC84B40'::geometry)"
"        ->  Materialize  (cost=0.00..48.25 rows=2550 width=4) (actual time=0.001..0.001 rows=1 loops=10)"
"              ->  Seq Scan on version  (cost=0.00..35.50 rows=2550 width=4) (actual time=0.006..0.006 rows=1 loops=1)"
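Plans like the two above are the kind of output EXPLAIN ANALYZE produces. A minimal sketch for reproducing them against the first query from the question is shown here; the BUFFERS option is an optional addition that is not used in the thread.

```sql
-- Runs the query for real and prints the chosen plan with actual timings.
-- BUFFERS additionally reports how many pages each plan node touched.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM shops
WHERE ("version" IN (SELECT "version" FROM "version"))
ORDER BY (location <-> '0020000001000010e64029d460d2a8aee0404bc8bb0955ea17'::geometry)
LIMIT 10;
```

An "Index Scan using shop_location_idx" node with an "Order By" line indicates the GiST index is serving the nearest-neighbour ordering; a "Sort" node above a join indicates that every matching row is read and sorted first.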

Solved

Thanks to @JimJones and @JimMacaulay, see the answer below.

【Question discussion】:

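Hi, could you run EXPLAIN ANALYZE on both queries and add the output to your question? It might give us a clue ;-)
Other thoughts (possibly unrelated): are the version columns in both tables indexed as well? Have you tried a join? How do you know the gist index is not being used?
You are using LIMIT 1 in the second query, which gives you only one record. For a single record, having an index or not makes no difference.
@JimJones Looking at the EXPLAIN output, it does not do an index scan; I tried adding an index to the version table, but it did not help.
I found it, thanks to you guys... I may have mixed up some scenarios, but putting an index on the version table fixed the first query so that it uses the indexes on both columns.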
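The join suggested in the comments could be written along the following lines. This is only a sketch: the aliases s and v are illustrative, and the rewrite matches the IN form only while the version table holds no duplicate values (it has a single row in the question). The thread does not report whether this form changes the plan.

```sql
-- Join-based variant of the first query; equivalent to the IN form
-- only when "version"."version" contains no duplicates.
SELECT s.*
FROM shops AS s
JOIN "version" AS v ON v."version" = s."version"
ORDER BY s.location <-> '0020000001000010e64029d460d2a8aee0404bc8bb0955ea17'::geometry
LIMIT 10;
```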

Tags: sql postgresql postgis

【Solution 1】:

Adding an index on the version table helps postgres use the index on the location column.

So adding this index is the fix for the first query:

CREATE INDEX version_idx
    ON public.version USING btree
    (version)
    TABLESPACE pg_default;

Having proper indexes on all columns involved in the lookup helps postgres build an efficient query plan.
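One way to check the effect of the fix is sketched below. It rests on an assumption the thread does not confirm: both plans in the question estimate rows=2550 for the version table although it holds one row, which may mean the planner had no statistics for it. CREATE INDEX can refresh the table's row-count estimate as a side effect, and running ANALYZE does so explicitly.

```sql
-- Refresh planner statistics for the one-row table (assumed to be stale or missing).
ANALYZE "version";

-- Inspect what the planner now believes about the table's size.
SELECT relname, reltuples, relpages
FROM pg_class
WHERE relname = 'version' AND relkind = 'r';

-- Re-check the plan of the first query from the question.
EXPLAIN
SELECT * FROM shops
WHERE ("version" IN (SELECT "version" FROM "version"))
ORDER BY (location <-> '0020000001000010e64029d460d2a8aee0404bc8bb0955ea17'::geometry)
LIMIT 10;
```

If the row estimate for version drops to 1 and the plan switches to an index scan on shop_location_idx, stale statistics were at least part of the cause.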
