PostgreSQL 将数据从递归 CTE 传递到函数答案

【问题标题】：PostgreSQL pass data from recursive CTE onto functionPostgreSQL 将数据从递归 CTE 传递到函数
【发布时间】：2012-05-10 16:15:00
【问题描述】：

我有以下问题：我正在尝试发现从源节点 (node_s) 到目标节点 (node_t) 的所有可能路径。

带有图边的原始表格的格式很简单：|节点_x |节点_y | strength | ，其中 "node_x" -> "node_y" 是直接边，边的强度为 "weight"。

我们的想法是，如果在探索路径期间的任何时候，我们发现其子节点中的一个节点具有目标 node_t，我们记录该路径并停止从该节点探索路径，否则继续探索.

简单的解决方案是使用 PostgreSQL 的递归 CTE 构造图的传递闭包：

  WITH RECURSIVE transitive_closure (source, target, weight, path_string) AS
  ( 
   --non-recurive term
   SELECT b.node_x, b.node_y, b.strength, b.node_x || '.' || b.node_y || '.' AS path_string
   FROM best_path b
   WHERE b.node_x = node_s --source_node

   UNION

   --recurive term
   SELECT tc.source, b.node_y, least(tc.weight,b.strength), tc.path_string || b.node_y || '.' AS path_string
   FROM best_path AS b JOIN transitive_closure AS tc ON b.node_x = tc.target
   WHERE tc.path_string NOT LIKE '%' || b.node_y || '.%'
  )
  SELECT * 
  FROM transitive_closure tc
  WHERE tc.target = node_t --target_node

上面的代码将从源节点node_s发现所有可能的路径。只有在传递闭包构造之后，我才能选择从源节点到目标节点所需的路径行（参见最后一个 SELECT 语句）。

例子：

best_path 表有以下数据：

 node_x | node_y | strength 
--------+--------+----------
      1 |      2 |        1
      1 |      3 |        1
      2 |      4 |        1
      2 |      5 |        1
      3 |      6 |        1
      3 |      7 |        1
      4 |      8 |        1
      4 |      9 |        1
      5 |     10 |        1
      5 |     11 |        1

查询：

找到从源节点 = 1 到目标节点 = 4 的路径

结果：

 source | target | strength | path_string
--------+--------+----------+------------
      1 |      2 |        1 | 1.2.
      1 |      3 |        1 | 1.3.
      1 |      4 |        1 | 1.2.4.
      1 |      5 |        1 | 1.2.5.
      1 |      6 |        1 | 1.3.6.
      1 |      7 |        1 | 1.3.7.
      1 |      8 |        1 | 1.2.4.8.
      1 |      9 |        1 | 1.2.4.9.
      1 |     10 |        1 | 1.2.5.10.
      1 |     11 |        1 | 1.2.5.11.

这不是我需要的。由于已经有从节点 2 到节点 4（目标）的直接边，我不需要路径 1.2.5.、1.2.4.8.、1.2.4.9.、1.2.5.10.、1.2.5.11.，路径探索对于节点 2，应该在发现从 2 到 4 的路径时停止。

总而言之，如果节点已经有到目标节点的直接边，我不想发现它的路径。这意味着在 CTE 的递归术语中，我希望有一些条件会说如下，伪代码如下：

  WITH RECURSIVE transitive_closure (source, target, weight, path_string) AS
  ( 
   --non-recurive term (as before)
   SELECT b.node_x, b.node_y, b.strength, b.node_x || '.' || b.node_y || '.' AS path_string
   FROM best_path b
   WHERE b.node_x = node_s --source_node

   UNION

   --recurive term
   SELECT tc.source, b.node_y, least(tc.weight,b.strength), tc.path_string || b.node_y || '.' AS path_string
   FROM best_path AS b JOIN transitive_closure AS tc ON b.node_x = tc.target
   WHERE tc.path_string NOT LIKE '%' || b.node_y || '.%'
     AND b.node_y = node_t 
     --will select only rows with direct edge to target

   UNION (second union is not allowed in CTE)

   SELECT those rows which do not have direct edge to target 
   AND which parents did not contribute to constructing the query above. 
   i.e. b.node_x = tc.target where there is no direct edge between b.node_x to node_t
  )

作为查找从源节点 = 1 到目标节点 = 4 的路径的查询结果，我想要以下内容：

 source | target | strength | path_string
--------+--------+----------+------------
      1 |      2 |        1 | 1.2.
      1 |      3 |        1 | 1.3.
      1 |      4 |        1 | 1.2.4.
      1 |      6 |        1 | 1.3.6.
      1 |      7 |        1 | 1.3.7.

提前感谢您的帮助！

我已经尝试了很多方法：例如FROM/WHERE 子句中的条件，试图将 CTE 传递给函数，但没有成功。

任何建议将不胜感激。

我有自己的递归函数，可以实现我想要的，但是，它在大量数据上非常慢；并且 PostgreSQL 的 CTE 显然优化得很好，所以我想深入研究一下。

【问题讨论】：

如果 Source = 2 和 Target = 5，结果是什么？请将其附加到您的问题中。我正在尝试完善这个查询：stackoverflow.com/a/10395130
第一个CTE会输出不想要的结果：2->4 (2.4.), 2->5 (2.5.), 2->8 (2.4.8.), 2->9 (2.4.9.)、2->10 (2.5.10.)、2->11 (2.5.11.)。期望的输出是：2->5 (2.5.)
我已经修改了我的答案stackoverflow.com/a/10395130 现在只产生 2->5 个
@MichaelBuen 谢谢您，目前正在测试，将返回测试结果！
另一种查询优化方法，虽然不能移植到其他数据库，但仅适用于 Postgresql，它使用 DISTINCT ON：stackoverflow.com/a/10401572

标签： postgresql graph transitive-closure-table recursive-cte

【解决方案1】：

优化的解决方案，最终结果不再有 WHERE 子句；尽管是 Postgresql 特定的解决方案，即我们使用 DISTINCT ON 来选择特定的行：

鉴于此数据：

CREATE TABLE best_path
    (node_x int, node_y int, strength int);

INSERT INTO best_path
    (node_x, node_y, strength)
VALUES
    (1, 2, 1),
    (1, 3, 1),
    (2, 4, 1),
    (2, 5, 1),
    (3, 6, 1),
    (3, 7, 1),
    (4, 8, 1),
    (4, 9, 1),
    (5, 10, 1),
    (5, 11, 1),
    (5, 12, 1);

查询，第一阶段，幕后展示（源 = 1，目标 = 11）：

with recursive -- this denotes that at least one CTE is using recursion

inputs as ( select 1 as d_source, 11 as d_target )
,traverse_path(filter, source, target, path, bingo, distincter) as
(
  select                      
      bp.node_x, bp.node_x,  bp.node_y, bp.node_x || '.' || bp.node_y      
      ,max((bp.node_y = i.d_target)::int) over(partition by bp.node_x) ::bool            
      ,coalesce(
               nullif(
                   max((bp.node_y = i.d_target)::int) 
                   over(partition by bp.node_x order by bp.node_x, bp.node_y = d_target desc, bp.node_y)
                   , 0)          
               ,bp.node_y)                           
  from best_path bp cross join inputs i
  where bp.node_x = i.d_source -- filter      
  union      
  select 
      tp.filter, bp.node_x, bp.node_y, path || '.' || node_y       
      ,max((bp.node_y = i.d_target)::int) over(partition by bp.node_x) ::bool        
      ,coalesce(
               nullif(
                   max((bp.node_y = i.d_target)::int) 
                   over(partition by bp.node_x order by bp.node_x, bp.node_y = d_target desc, bp.node_y)
                   , 0)          
               ,bp.node_y)                       
  from best_path bp cross join inputs i
  join traverse_path tp on bp.node_x = tp.target
      and not tp.bingo
)    
select tp.*
from traverse_path tp

源 = 1，目标 = 11 的输出：http://www.sqlfiddle.com/#!1/db290/56

FILTER   SOURCE   TARGET   PATH     BINGO    DISTINCTER
1        1        2        1.2      0        2
1        1        3        1.3      0        3
1        2        4        1.2.4    0        4
1        2        5        1.2.5    0        5
1        3        6        1.3.6    0        6
1        3        7        1.3.7    0        7
1        4        8        1.2.4.8  0        8
1        4        9        1.2.4.9  0        9
1        5        11       1.2.5.11 1        1
1        5        10       1.2.5.10 1        1
1        5        12       1.2.5.12 1        1

正如我们所见，目标 11 是源 5 中的第一行。这是怎么发生的？在我们的 DISTINCTER 列中，我们在 MAX 上使用 ORDER BY，这是对 MAX 窗口进行 ORDER 有意义的罕见情况之一。我们只是使用它来对结果进行排序。我尝试将 ORDER BY 放在查询的末尾，但数据库抱怨它不能在 CTE 上使用 ORDER。 CTE 禁止放置 ORDER BY 子句。但是众所周知，我们可以影响OVER() 子句的排序，所以这就是我们的结果如何排序。在上面的结果中，在 5 的来源中，数字 11 排在 10 和 12 之前，因为 11 是我们的目标。因此，当我们执行DISTINCT ON（Postgresql 特有的功能）时，Postgres 将拾取第一行，因此它将拾取目标 11。

这是我们的最终查询，经过优化，虽然是针对 Postgresql 的：

with recursive -- this denotes that at least one CTE is using recursion    
inputs as ( select 1 as d_source, 11 as d_target )
,traverse_path(filter, source, target, path, bingo) as (
  select 
      distinct on (
        bp.node_x,            
        coalesce(
               nullif(
                   max((bp.node_y = i.d_target)::int) 
                   over(partition by bp.node_x order by bp.node_x, bp.node_y = d_target desc, bp.node_y)
                   , 0)          
               ,bp.node_y)
      )          
      bp.node_x, bp.node_x,  bp.node_y, bp.node_x || '.' || bp.node_y      
      ,max((bp.node_y = i.d_target)::int) over(partition by bp.node_x) ::bool  
  from best_path bp cross join inputs i
  where bp.node_x = i.d_source -- filter
  union
  select       
      distinct on (
        bp.node_x,            
        coalesce(
               nullif(
                   max((bp.node_y = i.d_target)::int) 
                   over(partition by bp.node_x order by bp.node_x, bp.node_y = d_target desc, bp.node_y)
                   , 0)          
               ,bp.node_y)
      )    
      tp.filter, bp.node_x, bp.node_y, path || '.' || node_y       
      ,max((bp.node_y = i.d_target)::int) over(partition by bp.node_x) ::bool  
  from best_path bp cross join inputs i
  join traverse_path tp on bp.node_x = tp.target
      and not tp.bingo
) select tp.* from traverse_path tp

源的输出 = 1，目标 = 11 http://www.sqlfiddle.com/#!1/db290/55

FILTER   SOURCE   TARGET   PATH     BINGO
1        1        2        1.2      0
1        1        3        1.3      0
1        2        4        1.2.4    0
1        2        5        1.2.5    0
1        3        6        1.3.6    0
1        3        7        1.3.7    0
1        4        8        1.2.4.8  0
1        4        9        1.2.4.9  0
1        5        11       1.2.5.11 1

源 = 1，目标 = 4 的输出：http://www.sqlfiddle.com/#!1/db290/53

FILTER  SOURCE  TARGET  PATH    BINGO
1       1       2       1.2     0
1       1       3       1.3     0
1       2       4       1.2.4   1
1       3       6       1.3.6   0
1       3       7       1.3.7   0

源 = 2，目标 = 5 的输出：http://www.sqlfiddle.com/#!1/db290/54

FILTER  SOURCE  TARGET  PATH    BINGO
2       2       5       2.5     1

另一种方法，而不是 BINGO 方法。使用 continue_search 作为继续遍历的逻辑。并使用 EVERY 聚合函数来确定我们是否需要继续遍历图。

它的工作原理如下：http://www.sqlfiddle.com/#!1/db290/84

最终查询：http://www.sqlfiddle.com/#!1/db290/85

有趣的是，每一个都非常像英语：

对比一下：

,max((bp.node_y = i.d_target)::int) over(partition by bp.node_x) ::bool

使用 EVERY ：

,every(bp.node_y <> i.d_target) over(partition by bp.node_x)

哪个更容易阅读？

这里的输出说明了使用 EVERY 来促进 DISTINCT ON 的原理：

Source = 1，Target = 5。请注意，5 排在属于同一来源的其他数字之前，稍后将由 DISTINCT ON 选择。

FILTER    SOURCE    TARGET    PATH      CONTINUE_SEARCH     DISTINCTER
1         1         2         1.2       1                   2
1         1         3         1.3       1                   3
1         2         5         1.2.5     0                   0
1         2         4         1.2.4     0                   0
1         3         6         1.3.6     1                   6
1         3         7         1.3.7     1                   7

这是执行该原则的查询：

with recursive -- this denotes that at least one CTE is using recursion

inputs as ( select 1 as d_source, 5 as d_target )
,traverse_path(filter, source, target, path, continue_search, distincter) as
(
  select                      
      bp.node_x, bp.node_x,  bp.node_y, concat(bp.node_x , '.' , bp.node_y )
      ,every(bp.node_y <> i.d_target) over(partition by bp.node_x)          
      ,coalesce(        
              cast(nullif(
                   every(bp.node_y <> i.d_target) 
                   over(partition by bp.node_x order by bp.node_x, bp.node_y = d_target desc, bp.node_y)
                   , true) as int)
               ,bp.node_y)                           
  from best_path bp cross join inputs i
  where bp.node_x = i.d_source -- filter      
  union      
  select 
      tp.filter, bp.node_x, bp.node_y, concat(path , '.' , node_y)
      ,every(bp.node_y <> i.d_target) over(partition by bp.node_x)    
      ,coalesce(
              cast(nullif(
                   every(bp.node_y <> i.d_target) 
                   over(partition by bp.node_x order by bp.node_x, bp.node_y = d_target desc, bp.node_y)
                   , true) as int)
               ,bp.node_y)                       
  from best_path bp cross join inputs i
  join traverse_path tp on bp.node_x = tp.target
     and tp.continue_search
)    
select tp.*
from traverse_path tp

最终查询：

with recursive -- this denotes that at least one CTE is using recursion

inputs as ( select 1 as d_source, 5 as d_target )
,traverse_path(filter, source, target, path, continue_search) as
(
  select                      
      distinct on
      (
        bp.node_x
        ,coalesce(
              cast(nullif(
                   every(bp.node_y <> i.d_target) 
                   over(partition by bp.node_x order by bp.node_x, bp.node_y = d_target desc, bp.node_y)
                   , true) as int)
               ,bp.node_y)                       
      )    
      bp.node_x, bp.node_x,  bp.node_y, concat(bp.node_x , '.' , bp.node_y )
      ,every(bp.node_y <> i.d_target) over(partition by bp.node_x)          
  from best_path bp cross join inputs i
  where bp.node_x = i.d_source -- filter      
  union      
  select   
      distinct on
      (
        bp.node_x
        ,coalesce(
              cast(nullif(
                   every(bp.node_y <> i.d_target) 
                   over(partition by bp.node_x order by bp.node_x, bp.node_y = d_target desc, bp.node_y)
                   , true) as int)
               ,bp.node_y)                       
      )  
      tp.filter, bp.node_x, bp.node_y, concat(path , '.' , node_y)
      ,every(bp.node_y <> i.d_target) over(partition by bp.node_x)          
  from best_path bp cross join inputs i
  join traverse_path tp on bp.node_x = tp.target
     and tp.continue_search
)    
select tp.*
from traverse_path tp

输出：

FILTER    SOURCE    TARGET    PATH      CONTINUE_SEARCH
1         1         2         1.2       1
1         1         3         1.3       1
1         2         5         1.2.5     0
1         3         6         1.3.6     1
1         3         7         1.3.7     1

【讨论】：

【解决方案2】：

测试用临时表：

CREATE TEMP TABLE best_path (node_x int, node_y int, strength int);
INSERT INTO best_path  VALUES
 (1, 2,  1)
,(1, 3,  1)
,(2, 4,  1)
,(2, 5,  1)
,(3, 6,  1)
,(3, 7,  1)
,(4, 8,  1)
,(4, 9,  1)
,(5, 10, 1)
,(5, 11, 1);

查询：
_{已修改以容纳关于 2 - 5 的评论}

WITH RECURSIVE t AS (    -- for readability and convenience:
    SELECT 1 AS node_s   -- source_node
         , 4 AS node_t   -- target_node
    )
    , x AS (
    SELECT node_x
    FROM   t, best_path
    WHERE  node_y = node_t
    )
    , a AS  ( 
    SELECT b.node_x
         , b.node_y
         , b.strength AS weight
         , b.node_x || '.' || b.node_y || '.' AS path
    FROM   t, best_path b
    LEFT   JOIN x ON x.node_x = b.node_x
    WHERE  b.node_x = node_s
    AND    (x.node_x IS NULL                    -- no point with edge to target
            OR b.node_y = node_t)               -- except with target_node

    UNION ALL
    SELECT a.node_x
         , b.node_y
         , least(a.weight, b.strength)
         , a.path || b.node_y || '.' AS path
    FROM   t, a
    JOIN   best_path AS b ON b.node_x = a.node_y
    LEFT   JOIN x ON x.node_x = b.node_x
    WHERE  a.node_y <> node_t                   -- arrived at target
    AND    a.path !~~ ('%' || b.node_y || '.%') -- not visited yet
    AND    (x.node_x IS NULL                    -- no point with edge to target
            OR b.node_y = node_t)               -- except with target_node
    )
TABLE a;

准确地产生请求的结果。

我也在初始查询中添加了中断条件，因为我们只需一步即可到达目标。

这恰好很像my answer to a previous, similar question。所有解释和链接都适用。这里主要的附加技巧是包含一个额外的 CTE x 来收集节点...

有（a）直接边缘到目标。

用于在递归 CTE 的中断条件中重复使用。众所周知，无论关键字 RECURSIVE 是什么，您都可以在递归 CTE 之上添加其他（非递归）CTE。

【讨论】：

谢谢你，Erwin，现在做测试！稍后将对结果发表评论！

【解决方案3】：

在重新阅读 OP 的问题后，我想出了这个解决方案：

来源

with recursive -- this denotes that at least one CTE is using recursion

inputs as ( select 1 as d_source, 4 as d_target )
,traverse_path(filter, source, target, path, bingo) as
(
  select bp.node_x, bp.node_x,  bp.node_y, bp.node_x || '.' || bp.node_y,

    max((bp.node_y = i.d_target)::int) over(partition by bp.node_x) ::bool  

  from best_path bp cross join inputs i
  where bp.node_x = i.d_source -- filter

  union

  select tp.filter, bp.node_x, bp.node_y, path || '.' || node_y,   

      max((bp.node_y = i.d_target)::int) over(partition by bp.node_x) ::bool   

  from best_path bp cross join inputs i
  join traverse_path tp on bp.node_x = tp.target
      and not tp.bingo
)    
select tp.*
from traverse_path tp cross join inputs i    
-- if Postgresql has Oracle KEEP windowing, 
-- we don't need to use WHERE clause here     
where 
  not tp.bingo   
  or
  (
    tp.bingo and tp.target = d_target
  )

上面的 WHERE 子句可以缩短为：

  WHERE not bingo or target = 4

输出：

FILTER  SOURCE  TARGET  PATH    BINGO
1       1       2       1.2     0
1       1       3       1.3     0
1       2       4       1.2.4   1
1       3       6       1.3.6   0
1       3       7       1.3.7   0

现场测试：http://www.sqlfiddle.com/#!1/cdde6/55

这是 Source = 2, Target = 5 的结果：

FILTER  SOURCE  TARGET  PATH    BINGO
2       2       5       2.5     1

数据：

CREATE TABLE best_path
    (node_x int, node_y int, strength int);

INSERT INTO best_path
    (node_x, node_y, strength)
VALUES
    (1, 2, 1),
    (1, 3, 1),
    (2, 4, 1),
    (2, 5, 1),
    (3, 6, 1),
    (3, 7, 1),
    (4, 8, 1),
    (4, 9, 1),
    (5, 10, 1),
    (5, 11, 1);

【讨论】：

【解决方案4】：

如果从底部开始，您可以更有效地搜索路径。从孩子开始。如果从父节点开始，则需要遍历所有子节点；而如果你从孩子那里搜索，它只有一个父母，因此不会浪费时间寻找源和目标之间的路径。

with recursive find_parent(source, target, recentness) as
(
    select source, target, 0 
    from tbl
    where target = 9

    union all

    select i.source, i.target, fp.recentness + 1
    from tbl i
    join find_parent fp on i.target = fp.source
),
construct_path(source, target, recentness, path) as
(
  select source, target, recentness, source || '.' || target
  from find_parent 
  where recentness = (select max(recentness) from find_parent)

  union

  select dd.source, dd.target, dd.recentness, cp.path || '.' || dd.target
  from find_parent dd
  join construct_path cp on dd.recentness = cp.recentness - 1  
)
select source, target, path 
from construct_path
order by recentness desc

输出：

SOURCE   TARGET   PATH
1        2        1.2
2        4        1.2.4
4        9        1.2.4.9

现场测试：http://www.sqlfiddle.com/#!1/13e6b/1

与此类似：How to get the parent given a child in SQL SERVER 2005

这是优化的，如果它已经找到特定的（源），则将递归削减到父级。

来源 = 2

目标 = 9

with recursive find_parent(source, target, recentness) as
(
    select source, target, 0 
    from tbl
    where target = 9

    union all

    select i.source, i.target, fp.recentness + 1
    from tbl i
    join find_parent fp on i.target = fp.source 
         -- despite the name, this target is another one's source
         and i.target <> 2
)
,construct_path(source, target, recentness, path) as
(
    select source, target, recentness, source || '.' || target
    from find_parent 
    where recentness = (select max(recentness) from find_parent)

    union

    select dd.source, dd.target, dd.recentness, cp.path || '.' || dd.target
    from find_parent dd
    join construct_path cp on dd.recentness = cp.recentness - 1  

)
select source, target, path
from construct_path
order by recentness desc

输出：

SOURCE   TARGET  PATH
2        4       2.4
4        9       2.4.9

现场测试：http://www.sqlfiddle.com/#!1/13e6b/16

【讨论】：

您不认为在最终查询的WHERE 子句中使用限制条件而不是将它们推送到WITH 源中会更好吗？
最好把限制条件放在根源上，效率高。如果将限制条件放在最终查询中，最终查询将不得不处理几乎 seq scan'd 的结果，它会很慢
是什么让你假设一个层次结构？ OP 写了关于source 和target。
不是分层的，我直接用了OP的数据