Recursive CTE in PostgreSQL for the knapsack problem: answers

【Question title】: Recursive CTE in PostgreSQL for knapsack problem
【Posted】: 2021-07-29 10:04:25
【Question】:

I have a dataset with 3 columns:

Item_id	Sourced_from	Cost
1	Local	15
2	Local	10
3	Local	20
4	International	60

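I am trying to write a query in PostgreSQL that returns the total number of local and international items a customer can buy within a cash limit. For a cash limit of 50, this is the output I expect: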

Local	International
3	0

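I have a fairly basic understanding of PostgreSQL. After some googling it seems this can be solved with a recursive CTE, but I can't figure out how to choose my seed/anchor in this case.

Any ideas on how I should approach this?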

Tags: postgresql recursive-cte

【Solution 1】:

This does not use a recursive CTE, but it still works:

DDL/DML：

create table T
(
    id   integer primary key generated by default AS IDENTITY,
    kind text    not null,
    cost integer not null
);

insert into T(kind, cost)
values ('local', 15),
       ('local', 10),
       ('local', 20),
       ('international', 60);

-- 4. This outer CTE and the following self-join is only necessary in order to display the rows that have a count() of 0
with sub as
         (
             -- 3. find the total cost of buying this row + all previous rows, grouped by its kind
             select X.kind, sum(X.cost) as cost, X.rn
             from (
                      with cte as (
                          -- 1. assign an increasing row number on each row from the table ordered by its cost
                          select *, row_number() over (order by T.cost asc, T.kind) as rn
                          from T
                      )
                      -- 2. self-join the CTE on each row with the same kind, but join it only with the rows that have a row number less than or equal to the current row number 
                      select A.id, A.kind, A.cost, B.rn
                      from cte as A
                               join cte as B on A.kind = B.kind and A.rn <= B.rn
                  ) as X
             group by X.kind, X.rn
         )

select M.kind, count(N.*)
from sub as M -- 5. count only the goods that fit in our budget (i.e. 50)
         left outer join sub as N on M.rn = N.rn and N.cost <= 50
group by M.kind
;

Output (db-fiddle):

+-------------+-----+
|kind         |count|
+-------------+-----+
|local        |3    |
|international|0    |
+-------------+-----+
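The same cheapest-first idea can also be written with a windowed running total instead of a self-join. The following is only a sketch and not part of the original answer: `sample_items` is a hypothetical inline copy of the rows in T (so the statement runs on its own), `affordable` is an illustrative column name, and the budget of 50 is applied to each kind separately, as in the query above.

```sql
-- Sketch: cheapest-first running total per kind, budget assumed to be 50.
-- sample_items is a hypothetical stand-in for table T; swap in T as needed.
with sample_items (id, kind, cost) as (
    values (1, 'local', 15),
           (2, 'local', 10),
           (3, 'local', 20),
           (4, 'international', 60)
),
running as (
    select kind,
           -- cumulative cost of this row plus all cheaper rows of the same kind
           sum(cost) over (partition by kind
                           order by cost, id
                           rows between unbounded preceding and current row) as running_cost
    from sample_items
)
select kind,
       -- kinds with nothing affordable still appear, with a count of 0
       count(*) filter (where running_cost <= 50) as affordable
from running
group by kind;
```

Because `count(*) filter (...)` is evaluated per group, a kind whose every running total exceeds the budget is still reported with 0, so no outer self-join is needed for that purpose.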

【Solution 2】:

I put together a CTE example that solves the problem.

Your case can be recreated with:

create table kp (item_id int, sourced_from varchar, cost int);
insert into kp values (1,'local',15);
insert into kp values (2,'local',10);
insert into kp values (3,'local',20);
insert into kp values (4,'international',60);

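The following query will:

- select only the items from kp whose cost is less than 50
- put each item_id into list_of_items

The recursive part:

- joins with kp, checking that sourced_from is the same and that kp.item_id is not already in list_of_items (so the same item is never added twice)
- calculates the total cost (total_cost)
- appends the new item_id to list_of_items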

WITH RECURSIVE items (item_id, next_item_id, sourced_from, total_cost, nr_items, list_of_items) AS (
    SELECT 
        item_id, 
        item_id as next_item_id, 
        sourced_from, 
        cost as total_cost,
        1 as nr_items, 
        ARRAY[item_id] list_of_items
  from kp where cost < 50
  UNION ALL
    SELECT 
        kp.item_id, 
        items.item_id  as next_item_id, 
        items.sourced_from, 
        items.total_cost + kp.cost total_cost,
        items.nr_items + 1 as nr_items,
        items.list_of_items || kp.item_id as  list_of_items
    FROM kp join items 
        on items.sourced_from=kp.sourced_from
        and items.list_of_items::int[] @> ARRAY[kp.item_id] = false
    WHERE kp.cost + items.total_cost < 50
)
SELECT * FROM items;

If you run it against the dataset above, you end up with this detailed result:

 item_id | next_item_id | sourced_from | total_cost | nr_items | list_of_items 
---------+--------------+--------------+------------+----------+---------------
       1 |            1 | local        |         15 |        1 | {1}
       2 |            2 | local        |         10 |        1 | {2}
       3 |            3 | local        |         20 |        1 | {3}
       1 |            2 | local        |         25 |        2 | {2,1}
       1 |            3 | local        |         35 |        2 | {3,1}
       2 |            1 | local        |         25 |        2 | {1,2}
       2 |            3 | local        |         30 |        2 | {3,2}
       3 |            1 | local        |         35 |        2 | {1,3}
       3 |            2 | local        |         30 |        2 | {2,3}
       1 |            2 | local        |         45 |        3 | {3,2,1}
       1 |            3 | local        |         45 |        3 | {2,3,1}
       2 |            1 | local        |         45 |        3 | {3,1,2}
       2 |            3 | local        |         45 |        3 | {1,3,2}
       3 |            1 | local        |         45 |        3 | {2,1,3}
       3 |            2 | local        |         45 |        3 | {1,2,3}
(15 rows)

It shows every ordering (permutation) of the local items that stays under the budget. Now, if you replace the final SELECT with

SELECT * FROM items order by nr_items desc, total_cost desc, list_of_items asc limit 1;

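you get the combination with the most items and a cost closest to the budget (I also added an ascending sort on list_of_items so that the same row is always returned when several combinations tie). For the case above this gives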

 item_id | next_item_id | sourced_from | total_cost | nr_items | list_of_items 
---------+--------------+--------------+------------+----------+---------------
       3 |            2 | local        |         45 |        3 | {1,2,3}
(1 row)

If you are only interested in the maximum per sourced_from, the final SELECT becomes

select sourced_from, max(nr_items) nr_items from items group by sourced_from;

The expected result is

 sourced_from | nr_items 
--------------+----------
 local        |        3
(1 row)

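Edit: to speed up the query and avoid multiple permutations of the same items (e.g. {1,2,3} and {1,3,2}), we can force the next item_id to be greater than the current one. The full query: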

WITH RECURSIVE items (item_id, next_item_id, sourced_from, total_cost, nr_items, list_of_items) AS (
    SELECT 
        item_id, 
        item_id as next_item_id, 
        sourced_from, 
        cost as total_cost,
        1 as nr_items, 
        ARRAY[item_id] list_of_items
  from kp where cost < 50
  UNION ALL
    SELECT 
        kp.item_id, 
        items.item_id  as next_item_id, 
        items.sourced_from, 
        items.total_cost + kp.cost total_cost,
        items.nr_items + 1 as nr_items,
        items.list_of_items || kp.item_id as  list_of_items
    FROM kp join items 
        on items.sourced_from=kp.sourced_from
        and items.list_of_items::int[] @> ARRAY[kp.item_id] = false
        and items.item_id < kp.item_id
    WHERE kp.cost + items.total_cost < 50
)
select * from items;

Result:

 item_id | next_item_id | sourced_from | total_cost | nr_items | list_of_items 
---------+--------------+--------------+------------+----------+---------------
       1 |            1 | local        |         15 |        1 | {1}
       2 |            2 | local        |         10 |        1 | {2}
       3 |            3 | local        |         20 |        1 | {3}
       2 |            1 | local        |         25 |        2 | {1,2}
       3 |            1 | local        |         35 |        2 | {1,3}
       3 |            2 | local        |         30 |        2 | {2,3}
       3 |            2 | local        |         45 |        3 | {1,2,3}
(7 rows)
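The question's expected output also lists International with a count of 0, whereas grouping the items CTE directly only returns sources that have at least one affordable item. One possible way to include the empty sources is to left-join the distinct sources against the CTE. This is a sketch rather than part of the original answer: `kp_sample` is a hypothetical inline copy of the kp rows so that the statement runs on its own, and the CTE is trimmed to the columns the final SELECT needs.

```sql
-- Sketch only: kp_sample is a hypothetical inline copy of the kp rows above;
-- swap in the real kp table as needed.
WITH RECURSIVE kp_sample (item_id, sourced_from, cost) AS (
    VALUES (1, 'local', 15),
           (2, 'local', 10),
           (3, 'local', 20),
           (4, 'international', 60)
),
items (item_id, sourced_from, total_cost, nr_items) AS (
    -- anchor: every single item that fits the budget on its own
    SELECT item_id, sourced_from, cost, 1
    FROM kp_sample
    WHERE cost < 50
  UNION ALL
    -- recursive step: extend a combination with a higher item_id from the same source
    SELECT k.item_id, i.sourced_from, i.total_cost + k.cost, i.nr_items + 1
    FROM kp_sample AS k
    JOIN items AS i
      ON i.sourced_from = k.sourced_from
     AND i.item_id < k.item_id
    WHERE k.cost + i.total_cost < 50
)
SELECT s.sourced_from,
       -- sources with no matching combination fall back to 0
       coalesce(max(i.nr_items), 0) AS nr_items
FROM (SELECT DISTINCT sourced_from FROM kp_sample) AS s
LEFT JOIN items AS i ON i.sourced_from = s.sourced_from
GROUP BY s.sourced_from
ORDER BY s.sourced_from;
```

On the sample data this returns 0 for international and 3 for local. The `< 50` comparisons mirror the answer's query; if spending exactly the cash limit should count as affordable, `<= 50` would be the comparison to use.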
