Postgres 的 CTE 与子查询性能差异。为什么？答案

【问题标题】：Postgres' CTE vs Subquery Performance difference. Why?Postgres 的 CTE 与子查询性能差异。为什么？
【发布时间】：2015-11-16 08:30:06
【问题描述】：

我有两个等效查询，用于提取特定地区 (ace) 和城市 (pro_com) 中建筑物（表 a）与最近的高速公路（表 v 中的高速公路）之间的平均距离。

这是 CTE 版本

WITH subq AS (
SELECT a.n, a.geom as g1, unnest(ARRAY(SELECT v.geom as g2 
  FROM atlas_sezioni2 as v
  where v.code = '12230' and a.pro_com = v.pro_com and a.code <> v.code  
  ORDER BY a.geom <-> v.geom LIMIT 15)) as g2
FROM atlas_sezioni2 a
where a.pro_com = 15146 and a.ace = 1 and a.code IN('11100', '11210', '11220', '11230', '11240', '11300', '12100', '14200')  
)

select avg(dist) from (
select distinct on(n) n, dist
from (
SELECT n, ST_Distance_Sphere(g1, g2) as dist FROM subq
) disttable
order by n, dist asc
) final;

我在 CTE 中提取 15 条最近的高速公路并计算距离，以便使用 GIST 索引 (http://workshops.boundlessgeo.com/postgis-intro/knn.html)。 CTE 的解释是：

Aggregate  (cost=37342.10..37342.11 rows=1 width=8)
   CTE subq
     ->  Index Scan using atlas_sezioni2_code_ace_pro_com_n_idx on atlas_sezioni2 a  (cost=0.29..29987.90 rows=20900 width=236211)
       Index Cond: (((code)::text = ANY ('{11100,11210,11220,11230,11240,11300,12100,14200}'::text[])) AND (ace = 1) AND (pro_com = 15146::numeric))
       SubPlan 1
         ->  Limit  (cost=141.04..141.08 rows=15 width=236190)
               ->  Sort  (cost=141.04..141.21 rows=69 width=236190)
                     Sort Key: ((a.geom <-> v.geom))
                     ->  Index Scan using atlas_sezioni2_code_ace_pro_com_n_idx on atlas_sezioni2 v  (cost=0.28..139.35 rows=69 width=236190)
                           Index Cond: (((code)::text = '12230'::text) AND (a.pro_com = pro_com))
                           Filter: ((a.code)::text <> (code)::text)
   ->  Unique  (cost=7247.20..7351.70 rows=200 width=72)
     ->  Sort  (cost=7247.20..7299.45 rows=20900 width=72)
           Sort Key: subq.n, (_st_distance(geography(subq.g1), geography(subq.g2), 0::double precision, false))
           ->  CTE Scan on subq  (cost=0.00..5747.50 rows=20900 width=72)
(15 rows)

这与子查询等效：

select avg(dist) from (
select distinct on(n) n, dist
from (
SELECT n, ST_Distance_Sphere(g1, g2) as dist FROM (
SELECT a.n, a.geom as g1, unnest(ARRAY(SELECT v.geom as g2 
  FROM atlas_sezioni2 as v
  where v.code = '12230' and a.pro_com = v.pro_com and a.code <> v.code  
  ORDER BY a.geom <-> v.geom LIMIT 15)) as g2
FROM atlas_sezioni2 a
where a.pro_com = 15146 and a.ace = 1 and a.code IN('11100', '11210', '11220', '11230', '11240', '11300', '12100', '14200')  
) subq
) disttable
order by n, dist asc
) final

及其解释

Aggregate  (cost=6366298.35..6366298.36 rows=1 width=8)
   ->  Unique  (cost=6365932.60..6366037.10 rows=20900 width=236230)
     ->  Sort  (cost=6365932.60..6365984.85 rows=20900 width=236230)
           Sort Key: subq.n, (_st_distance(geography(subq.g1), geography(subq.g2), 0::double precision, false))
           ->  Subquery Scan on subq  (cost=0.29..35526.40 rows=20900 width=236230)
                 ->  Index Scan using atlas_sezioni2_code_ace_pro_com_n_idx on atlas_sezioni2 a  (cost=0.29..29987.90 rows=20900 width=236211)
                       Index Cond: (((code)::text = ANY ('{11100,11210,11220,11230,11240,11300,12100,14200}'::text[])) AND (ace = 1) AND (pro_com = 15146::numeric))
                       SubPlan 1
                         ->  Limit  (cost=141.04..141.08 rows=15 width=236190)
                               ->  Sort  (cost=141.04..141.21 rows=69 width=236190)
                                     Sort Key: ((a.geom <-> v.geom))
                                     ->  Index Scan using atlas_sezioni2_code_ace_pro_com_n_idx on atlas_sezioni2 v  (cost=0.28..139.35 rows=69 width=236190)
                                           Index Cond: (((code)::text = '12230'::text) AND (a.pro_com = pro_com))
                                           Filter: ((a.code)::text <> (code)::text)
(14 rows)

我知道 CTE 是优化的边界围栏（Postgres 不会在 CTE 和它们之外的查询之间进行优化），但这很奇怪。为什么性能会这样爆？

【问题讨论】：

我想你刚刚在最后一段中回答了这个问题。
@CraigRinger mhh no... CTE 即使没有优化也有更好的性能：D
我注意到了同样的事情。我编写的大多数复杂查询在编写为 CTE 时比作为子查询执行得好多。 CTE 通常也更容易阅读，这是一个令人高兴的巧合。但这总是让我有点困惑，因为你说的确切原因：似乎他们应该更慢。
那么......您是说在这种情况下基于 CTE 的表单性能更好，而这正是您想知道的？您没有包含ANALYZE、时间信息等。如果 CTE 表单比子查询表单更快，假设 CTE 和子查询之间进行精确转换，那么规划器在这种情况下做出的优化决策很差。 EXPLAIN (buffers, analyze) 细节可能会有所帮助。
我还是不明白问题是什么。您发布了两个计划（没有时间安排）并问...什么？

标签： postgresql postgis common-table-expression

【解决方案1】：

正如@CraigRinger 所说，我也应该检查分析。事实上，从“解释分析”我们看到第一个是：

Aggregate  (cost=58406.66..58406.67 rows=1 width=8) (actual time=138191.294..138191.295 rows=1 loops=1)
 CTE subq
   ->  Bitmap Heap Scan on atlas_sezioni2 a  (cost=9.93..51052.46 rows=20900 width=236211) (actual time=2.814..308.667 rows=3705 loops=1)
         Recheck Cond: (ace = 1)
         Filter: ((pro_com = 15146::numeric) AND ((code)::text = ANY ('{11100,11210,11220,11230,11240,11300,12100,14200}'::text[])))
         Rows Removed by Filter: 4
         Heap Blocks: exact=42
         ->  Bitmap Index Scan on atlas_sezioni2_ace_idx  (cost=0.00..9.88 rows=251 width=0) (actual time=0.110..0.110 rows=251 loops=1)
               Index Cond: (ace = 1)
         SubPlan 1
           ->  Limit  (cost=240.70..240.74 rows=15 width=236190) (actual time=0.630..0.636 rows=15 loops=247)
                 ->  Sort  (cost=240.70..240.87 rows=69 width=236190) (actual time=0.627..0.630 rows=15 loops=247)
                       Sort Key: ((a.geom <-> v.geom))
                       Sort Method: top-N heapsort  Memory: 26kB
                       ->  Bitmap Heap Scan on atlas_sezioni2 v  (cost=4.56..239.01 rows=69 width=236190) (actual time=0.045..0.518 rows=73 loops=247)
                             Recheck Cond: ((code)::text = '12230'::text)
                             Filter: (((a.code)::text <> (code)::text) AND (a.pro_com = pro_com))
                             Heap Blocks: exact=6916
                             ->  Bitmap Index Scan on atlas_sezioni2_code_idx  (cost=0.00..4.55 rows=73 width=0) (actual time=0.030..0.030 rows=73 loops=247)
                                   Index Cond: ((code)::text = '12230'::text)
 ->  Unique  (cost=7247.20..7351.70 rows=200 width=72) (actual time=138190.527..138191.243 rows=247 loops=1)
       ->  Sort  (cost=7247.20..7299.45 rows=20900 width=72) (actual time=138190.526..138190.800 rows=3705 loops=1)
             Sort Key: subq.n, (_st_distance(geography(subq.g1), geography(subq.g2), 0::double precision, false))
             Sort Method: quicksort  Memory: 270kB
             ->  CTE Scan on subq  (cost=0.00..5747.50 rows=20900 width=72) (actual time=159.739..138182.891 rows=3705 loops=1)
 Planning time: 2.623 ms
 Execution time: 138217.574 ms
(27 rows)

而子查询是：

Aggregate  (cost=6387362.91..6387362.92 rows=1 width=8) (actual time=140208.005..140208.005 rows=1 loops=1)
 ->  Unique  (cost=6386997.16..6387101.66 rows=20900 width=236230) (actual time=140207.243..140207.947 rows=247 loops=1)
       ->  Sort  (cost=6386997.16..6387049.41 rows=20900 width=236230) (actual time=140207.241..140207.514 rows=3705 loops=1)
             Sort Key: subq.n, (_st_distance(geography(subq.g1), geography(subq.g2), 0::double precision, false))
             Sort Method: quicksort  Memory: 270kB
             ->  Subquery Scan on subq  (cost=9.93..56590.96 rows=20900 width=236230) (actual time=160.784..140199.364 rows=3705 loops=1)
                   ->  Bitmap Heap Scan on atlas_sezioni2 a  (cost=9.93..51052.46 rows=20900 width=236211) (actual time=2.384..308.517 rows=3705 loops=1)
                         Recheck Cond: (ace = 1)
                         Filter: ((pro_com = 15146::numeric) AND ((code)::text = ANY ('{11100,11210,11220,11230,11240,11300,12100,14200}'::text[])))
                         Rows Removed by Filter: 4
                         Heap Blocks: exact=42
                         ->  Bitmap Index Scan on atlas_sezioni2_ace_idx  (cost=0.00..9.88 rows=251 width=0) (actual time=0.150..0.150 rows=251 loops=1)
                               Index Cond: (ace = 1)
                         SubPlan 1
                           ->  Limit  (cost=240.70..240.74 rows=15 width=236190) (actual time=0.640..0.646 rows=15 loops=247)
                                 ->  Sort  (cost=240.70..240.87 rows=69 width=236190) (actual time=0.637..0.640 rows=15 loops=247)
                                       Sort Key: ((a.geom <-> v.geom))
                                       Sort Method: top-N heapsort  Memory: 26kB
                                       ->  Bitmap Heap Scan on atlas_sezioni2 v  (cost=4.56..239.01 rows=69 width=236190) (actual time=0.045..0.527 rows=73 loops=247)
                                             Recheck Cond: ((code)::text = '12230'::text)
                                             Filter: (((a.code)::text <> (code)::text) AND (a.pro_com = pro_com))
                                             Heap Blocks: exact=6916
                                             ->  Bitmap Index Scan on atlas_sezioni2_code_idx  (cost=0.00..4.55 rows=73 width=0) (actual time=0.031..0.031 rows=73 loops=247)
                                                   Index Cond: ((code)::text = '12230'::text)
 Planning time: 1.117 ms
 Execution time: 140208.187 ms

所以它只在解释中表现得更好:)。实际性能不会改变。

【讨论】：