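【Posted】: 2015-11-16 08:30:06
【Question】:
I have two equivalent queries that compute the average distance between buildings (table a) and the nearest highway (the highways in table v) within a given district (ace) and city (pro_com).
This is the CTE version: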
WITH subq AS (
    SELECT a.n,
           a.geom AS g1,
           unnest(ARRAY(
               SELECT v.geom AS g2
               FROM atlas_sezioni2 AS v
               WHERE v.code = '12230'
                 AND a.pro_com = v.pro_com
                 AND a.code <> v.code
               ORDER BY a.geom <-> v.geom
               LIMIT 15
           )) AS g2
    FROM atlas_sezioni2 a
    WHERE a.pro_com = 15146
      AND a.ace = 1
      AND a.code IN ('11100', '11210', '11220', '11230', '11240', '11300', '12100', '14200')
)
SELECT avg(dist)
FROM (
    SELECT DISTINCT ON (n) n, dist
    FROM (
        SELECT n, ST_Distance_Sphere(g1, g2) AS dist
        FROM subq
    ) disttable
    ORDER BY n, dist ASC
) final;
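In the CTE I extract the 15 nearest highways and compute the distance afterwards, so that the GiST index can be used (http://workshops.boundlessgeo.com/postgis-intro/knn.html). The EXPLAIN output for the CTE version is: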
Aggregate  (cost=37342.10..37342.11 rows=1 width=8)
  CTE subq
    ->  Index Scan using atlas_sezioni2_code_ace_pro_com_n_idx on atlas_sezioni2 a  (cost=0.29..29987.90 rows=20900 width=236211)
          Index Cond: (((code)::text = ANY ('{11100,11210,11220,11230,11240,11300,12100,14200}'::text[])) AND (ace = 1) AND (pro_com = 15146::numeric))
          SubPlan 1
            ->  Limit  (cost=141.04..141.08 rows=15 width=236190)
                  ->  Sort  (cost=141.04..141.21 rows=69 width=236190)
                        Sort Key: ((a.geom <-> v.geom))
                        ->  Index Scan using atlas_sezioni2_code_ace_pro_com_n_idx on atlas_sezioni2 v  (cost=0.28..139.35 rows=69 width=236190)
                              Index Cond: (((code)::text = '12230'::text) AND (a.pro_com = pro_com))
                              Filter: ((a.code)::text <> (code)::text)
  ->  Unique  (cost=7247.20..7351.70 rows=200 width=72)
        ->  Sort  (cost=7247.20..7299.45 rows=20900 width=72)
              Sort Key: subq.n, (_st_distance(geography(subq.g1), geography(subq.g2), 0::double precision, false))
              ->  CTE Scan on subq  (cost=0.00..5747.50 rows=20900 width=72)
(15 rows)
This is the equivalent subquery version:
SELECT avg(dist)
FROM (
    SELECT DISTINCT ON (n) n, dist
    FROM (
        SELECT n, ST_Distance_Sphere(g1, g2) AS dist
        FROM (
            SELECT a.n,
                   a.geom AS g1,
                   unnest(ARRAY(
                       SELECT v.geom AS g2
                       FROM atlas_sezioni2 AS v
                       WHERE v.code = '12230'
                         AND a.pro_com = v.pro_com
                         AND a.code <> v.code
                       ORDER BY a.geom <-> v.geom
                       LIMIT 15
                   )) AS g2
            FROM atlas_sezioni2 a
            WHERE a.pro_com = 15146
              AND a.ace = 1
              AND a.code IN ('11100', '11210', '11220', '11230', '11240', '11300', '12100', '14200')
        ) subq
    ) disttable
    ORDER BY n, dist ASC
) final;
and its EXPLAIN output:
Aggregate  (cost=6366298.35..6366298.36 rows=1 width=8)
  ->  Unique  (cost=6365932.60..6366037.10 rows=20900 width=236230)
        ->  Sort  (cost=6365932.60..6365984.85 rows=20900 width=236230)
              Sort Key: subq.n, (_st_distance(geography(subq.g1), geography(subq.g2), 0::double precision, false))
              ->  Subquery Scan on subq  (cost=0.29..35526.40 rows=20900 width=236230)
                    ->  Index Scan using atlas_sezioni2_code_ace_pro_com_n_idx on atlas_sezioni2 a  (cost=0.29..29987.90 rows=20900 width=236211)
                          Index Cond: (((code)::text = ANY ('{11100,11210,11220,11230,11240,11300,12100,14200}'::text[])) AND (ace = 1) AND (pro_com = 15146::numeric))
                          SubPlan 1
                            ->  Limit  (cost=141.04..141.08 rows=15 width=236190)
                                  ->  Sort  (cost=141.04..141.21 rows=69 width=236190)
                                        Sort Key: ((a.geom <-> v.geom))
                                        ->  Index Scan using atlas_sezioni2_code_ace_pro_com_n_idx on atlas_sezioni2 v  (cost=0.28..139.35 rows=69 width=236190)
                                              Index Cond: (((code)::text = '12230'::text) AND (a.pro_com = pro_com))
                                              Filter: ((a.code)::text <> (code)::text)
(14 rows)
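I know that a CTE is an optimization fence (Postgres does not optimize across the boundary between a CTE and the query outside it), but this is strange. Why does the performance blow up like this?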
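【Comments】:
-
I think you just answered that in your last paragraph.
-
@CraigRinger mhh no... the CTE performs better even without the optimization :D
-
I have noticed the same thing. Most of the complex queries I write perform much better when written as CTEs than as subqueries. CTEs are usually also easier to read, which is a happy coincidence. But it has always puzzled me a bit, for exactly the reason you give: it seems they should be slower.
-
So... are you saying that the CTE-based form performs better in this case, and that is what you are wondering about? You did not include ANALYZE output, timing information, etc. If the CTE form is faster than the subquery form, assuming an exact translation between the CTE and the subquery, then the planner is making poor optimization decisions in this case. The details from EXPLAIN (buffers, analyze) might help.
-
I still do not understand what the question is. You posted two plans (without timings) and asked... what?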
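The timing details requested in the comments could be gathered roughly as follows. This is a minimal sketch and not part of the original post: it simply prefixes the CTE query from the question with EXPLAIN (ANALYZE, BUFFERS), which actually executes the query and reports real run times and buffer usage next to the planner's estimated costs. The same prefix would apply unchanged to the subquery version, so the two actual run times can be compared.

```sql
-- Sketch only: same CTE query as in the question, with measurement enabled.
-- ANALYZE runs the query for real; BUFFERS adds shared/local buffer counts.
EXPLAIN (ANALYZE, BUFFERS)
WITH subq AS (
    SELECT a.n,
           a.geom AS g1,
           unnest(ARRAY(
               SELECT v.geom AS g2
               FROM atlas_sezioni2 AS v
               WHERE v.code = '12230'
                 AND a.pro_com = v.pro_com
                 AND a.code <> v.code
               ORDER BY a.geom <-> v.geom
               LIMIT 15
           )) AS g2
    FROM atlas_sezioni2 a
    WHERE a.pro_com = 15146
      AND a.ace = 1
      AND a.code IN ('11100', '11210', '11220', '11230', '11240', '11300', '12100', '14200')
)
SELECT avg(dist)
FROM (
    SELECT DISTINCT ON (n) n, dist
    FROM (
        SELECT n, ST_Distance_Sphere(g1, g2) AS dist
        FROM subq
    ) disttable
    ORDER BY n, dist ASC
) final;
```

Both plans shown in the question contain only estimated costs, so output of this kind is what would show whether the subquery form is really slower or merely costed higher.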
Tags: postgresql postgis common-table-expression