【Question title】:Recursive CTE in PostgreSQL for knapsack problem
【Posted】:2021-07-29 10:04:25
【Question description】:

I have a dataset with 3 columns:

Item_id  Sourced_from   Cost
1        Local          15
2        Local          10
3        Local          20
4        International  60

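I'm trying to write a query in PostgreSQL that returns the total number of Local and International items a customer can buy within a cash limit. For a cash limit of 50, this is the output I expect: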

Local  International
3      0

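My knowledge of PostgreSQL is fairly basic. After some googling it looks like this can be solved with a recursive CTE, but I can't figure out how I should choose the seed/anchor in this case.

Any ideas on how I should approach this?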

【Question comments】:

    Tags: postgresql recursive-cte


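    【Solution 1】:

    It doesn't use a recursive CTE, but it still works. Since the goal is to maximize the number of items rather than their value, buying the cheapest items of each kind first is optimal, so a running total of cost per kind is enough: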

    DDL/DML:

    create table T
    (
        id   integer primary key generated by default AS IDENTITY,
        kind text    not null,
        cost integer not null
    );
    
    insert into T(kind, cost)
    values ('local', 15),
           ('local', 10),
           ('local', 20),
           ('international', 60);
    
    -- 4. This outer CTE and the following self-join are only necessary in order to display the rows that have a count() of 0
    with sub as
             (
                 -- 3. find the total cost of buying this row + all previous rows, grouped by its kind
                 select X.kind, sum(X.cost) as cost, X.rn
                 from (
                          with cte as (
                              -- 1. assign an increasing row number on each row from the table ordered by its cost
                              select *, row_number() over (order by T.cost asc, T.kind) as rn
                              from T
                          )
                          -- 2. self-join the CTE on each row with the same kind, but join it only with the rows that have a row number less than or equal to the current row number 
                          select A.id, A.kind, A.cost, B.rn
                          from cte as A
                                   join cte as B on A.kind = B.kind and A.rn <= B.rn
                      ) as X
                 group by X.kind, X.rn
             )
    
    select M.kind, count(N.*)
    from sub as M -- 5. count only the goods that fit in our budget (i.e. 50)
             left outer join sub as N on M.rn = N.rn and N.cost <= 50
    group by M.kind
    ;
    

    Output (db-fiddle):

    +-------------+-----+
    |kind         |count|
    +-------------+-----+
    |local        |3    |
    |international|0    |
    +-------------+-----+
    
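For comparison, the same cheapest-first idea can probably be written more compactly with a running-sum window function instead of the row_number() self-join. This is a sketch, not part of the original answer: it inlines the sample rows as a VALUES list named `t` (a hypothetical stand-in for table T), hard-codes the budget of 50, and needs PostgreSQL 9.4+ for the aggregate `FILTER` clause.

```sql
-- Sketch: cheapest-first running total per kind, budget hard-coded as 50.
-- t is a hypothetical inline copy of the sample data in table T.
WITH t (id, kind, cost) AS (
    VALUES (1, 'local', 15),
           (2, 'local', 10),
           (3, 'local', 20),
           (4, 'international', 60)
),
running AS (
    -- cumulative cost of buying this item plus all cheaper items of the same kind;
    -- id breaks ties between equal costs so the running sum is deterministic
    SELECT kind,
           sum(cost) OVER (PARTITION BY kind ORDER BY cost, id) AS running_cost
    FROM t
)
-- FILTER only restricts what is counted, so a kind with no affordable item still
-- produces a group, with a count of 0
SELECT kind,
       count(*) FILTER (WHERE running_cost <= 50) AS item_count
FROM running
GROUP BY kind
ORDER BY kind DESC;
-- expected: local | 3, then international | 0
```

Compared with the self-join above, the window function computes each running total in a single pass, and the FILTER clause removes the need for the extra outer join to show kinds with a count of 0.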


      【Solution 2】:

      I put together a recursive CTE example to solve this:

      Recreating your case:
      create table kp (item_id int, sourced_from varchar, cost int);
      insert into kp values (1,'local',15);
      insert into kp values (2,'local',10);
      insert into kp values (3,'local',20);
      insert into kp values (4,'international',60);
      
      

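      The following query works like this.

      Anchor part:
      • selects only the items from kp that fit within the budget of 50 on their own
      • puts each item_id into list_of_items

      Recursive part:
      • joins with kp, checking that sourced_from is the same and that kp.item_id is not already contained in list_of_items (so the same item is never added twice)
      • computes the total cost (total_cost)
      • appends the new item_id to list_of_items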
      WITH RECURSIVE items (item_id, next_item_id, sourced_from, total_cost, nr_items, list_of_items) AS (
          -- anchor: every single item that fits within the budget on its own
          SELECT 
              item_id, 
              item_id AS next_item_id, 
              sourced_from, 
              cost AS total_cost,
              1 AS nr_items, 
              ARRAY[item_id] AS list_of_items
          FROM kp
          WHERE cost <= 50
        UNION ALL
          -- recursive step: add a same-source item that is not in the list yet
          SELECT 
              kp.item_id, 
              -- despite its name, this holds the item added in the previous step
              items.item_id AS next_item_id, 
              items.sourced_from, 
              items.total_cost + kp.cost AS total_cost,
              items.nr_items + 1 AS nr_items,
              items.list_of_items || kp.item_id AS list_of_items
          FROM kp
          JOIN items
              ON items.sourced_from = kp.sourced_from
              AND NOT (items.list_of_items @> ARRAY[kp.item_id])
          WHERE kp.cost + items.total_cost <= 50
      )
      SELECT * FROM items;
      

      If you run it against the dataset above, you get this detailed result:

      item_id | next_item_id | sourced_from | total_cost | nr_items | list_of_items 
      ---------+--------------+--------------+------------+----------+---------------
             1 |            1 | local        |         15 |        1 | {1}
             2 |            2 | local        |         10 |        1 | {2}
             3 |            3 | local        |         20 |        1 | {3}
             1 |            2 | local        |         25 |        2 | {2,1}
             1 |            3 | local        |         35 |        2 | {3,1}
             2 |            1 | local        |         25 |        2 | {1,2}
             2 |            3 | local        |         30 |        2 | {3,2}
             3 |            1 | local        |         35 |        2 | {1,3}
             3 |            2 | local        |         30 |        2 | {2,3}
             1 |            2 | local        |         45 |        3 | {3,2,1}
             1 |            3 | local        |         45 |        3 | {2,3,1}
             2 |            1 | local        |         45 |        3 | {3,1,2}
             2 |            3 | local        |         45 |        3 | {1,3,2}
             3 |            1 | local        |         45 |        3 | {2,1,3}
             3 |            2 | local        |         45 |        3 | {1,2,3}
      (15 rows)
      

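      It lists every ordering of every affordable combination of the 3 local items (the international item never appears, since it costs more than 50 on its own). Now, if you replace the final SELECT with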

      SELECT * FROM items order by nr_items desc, total_cost desc, list_of_items asc limit 1;
      

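      you can pick the combination with the most items and the cost closest to the budget (I also added an ascending sort on list_of_items so that the same result is always returned when several combinations tie), which in the case above gives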

       item_id | next_item_id | sourced_from | total_cost | nr_items | list_of_items 
      ---------+--------------+--------------+------------+----------+---------------
             3 |            2 | local        |         45 |        3 | {1,2,3}
      (1 row)
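The `LIMIT 1` above returns a single row across all sources. If you want the best combination for each sourced_from instead, PostgreSQL's `DISTINCT ON` can keep the first row of each group under the same ordering. This is a hedged sketch rather than the answer's own query: it inlines the sample data as a VALUES list (named `kp` like the answer's table), drops the `next_item_id` column, and uses `<=` so that a total of exactly 50 still counts as within the budget.

```sql
-- Sketch: best affordable combination per sourced_from (sample data inlined).
WITH RECURSIVE kp (item_id, sourced_from, cost) AS (
    VALUES (1, 'local', 15),
           (2, 'local', 10),
           (3, 'local', 20),
           (4, 'international', 60)
),
items (item_id, sourced_from, total_cost, nr_items, list_of_items) AS (
    -- anchor: single items that fit the budget of 50
    SELECT item_id, sourced_from, cost, 1, ARRAY[item_id]
    FROM kp
    WHERE cost <= 50
  UNION ALL
    -- recursive step: add a same-source item that is not in the list yet
    SELECT kp.item_id, items.sourced_from, items.total_cost + kp.cost,
           items.nr_items + 1, items.list_of_items || kp.item_id
    FROM kp
    JOIN items
      ON items.sourced_from = kp.sourced_from
     AND NOT (items.list_of_items @> ARRAY[kp.item_id])
    WHERE items.total_cost + kp.cost <= 50
)
-- DISTINCT ON keeps the first row of each sourced_from group in ORDER BY order:
-- most items, then highest total, then the smallest array as a deterministic tie-break
SELECT DISTINCT ON (sourced_from)
       sourced_from, nr_items, total_cost, list_of_items
FROM items
ORDER BY sourced_from, nr_items DESC, total_cost DESC, list_of_items;
-- expected: local | 3 | 45 | {1,2,3}
```

The leading ORDER BY expression must match the DISTINCT ON expression; the remaining sort keys decide which row of each group survives.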
      

      If you are only interested in the maximum number of items per sourced_from, the final SELECT becomes

      select sourced_from, max(nr_items) nr_items from items group by sourced_from;
      

      The expected result is

       sourced_from | nr_items 
      --------------+----------
       local        |        3
      (1 row)
      
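This only returns sources that have at least one affordable item, so `international` is missing, whereas the question expects it with a count of 0. One way to get it back, sketched here under the same assumptions as before (sample data inlined as a VALUES list named `kp`, `<=` for the budget check), is to start from the distinct sources and LEFT JOIN the recursive results, turning a missing maximum into 0 with `coalesce`:

```sql
-- Sketch: max number of affordable items per source, including sources with none.
WITH RECURSIVE kp (item_id, sourced_from, cost) AS (
    VALUES (1, 'local', 15),
           (2, 'local', 10),
           (3, 'local', 20),
           (4, 'international', 60)
),
items (item_id, sourced_from, total_cost, nr_items, list_of_items) AS (
    -- anchor: single items that fit the budget of 50
    SELECT item_id, sourced_from, cost, 1, ARRAY[item_id]
    FROM kp
    WHERE cost <= 50
  UNION ALL
    -- recursive step: add a same-source item that is not in the list yet
    SELECT kp.item_id, items.sourced_from, items.total_cost + kp.cost,
           items.nr_items + 1, items.list_of_items || kp.item_id
    FROM kp
    JOIN items
      ON items.sourced_from = kp.sourced_from
     AND NOT (items.list_of_items @> ARRAY[kp.item_id])
    WHERE items.total_cost + kp.cost <= 50
)
-- every source appears once; sources with no affordable combination get NULL from
-- the LEFT JOIN, which coalesce turns into 0
SELECT s.sourced_from,
       coalesce(max(i.nr_items), 0) AS max_items
FROM (SELECT DISTINCT sourced_from FROM kp) AS s
LEFT JOIN items AS i ON i.sourced_from = s.sourced_from
GROUP BY s.sourced_from
ORDER BY max_items DESC;
-- expected: local | 3, then international | 0
```

The recursive reference stays inside an inner join, as PostgreSQL requires; the outer join is applied only after the recursion has finished.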

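      EDIT: to speed up the query and avoid multiple permutations of the same items (e.g. {1,2,3} and {3,2,1}), we can force the next item_id to be greater than the current one. Full query: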

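      WITH RECURSIVE items (item_id, next_item_id, sourced_from, total_cost, nr_items, list_of_items) AS (
          -- anchor: every single item that fits within the budget on its own
          SELECT 
              item_id, 
              item_id AS next_item_id, 
              sourced_from, 
              cost AS total_cost,
              1 AS nr_items, 
              ARRAY[item_id] AS list_of_items
          FROM kp
          WHERE cost <= 50
        UNION ALL
          -- recursive step: add a same-source item with a larger item_id
          SELECT 
              kp.item_id, 
              -- despite its name, this holds the item added in the previous step
              items.item_id AS next_item_id, 
              items.sourced_from, 
              items.total_cost + kp.cost AS total_cost,
              items.nr_items + 1 AS nr_items,
              items.list_of_items || kp.item_id AS list_of_items
          FROM kp
          JOIN items
              ON items.sourced_from = kp.sourced_from
              -- redundant once ids must increase (next condition), but harmless
              AND NOT (items.list_of_items @> ARRAY[kp.item_id])
              -- only append larger ids, so each combination is built once, in ascending order
              AND items.item_id < kp.item_id
          WHERE kp.cost + items.total_cost <= 50
      )
      SELECT * FROM items;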
      

      Result:

       item_id | next_item_id | sourced_from | total_cost | nr_items | list_of_items 
      ---------+--------------+--------------+------------+----------+---------------
             1 |            1 | local        |         15 |        1 | {1}
             2 |            2 | local        |         10 |        1 | {2}
             3 |            3 | local        |         20 |        1 | {3}
             2 |            1 | local        |         25 |        2 | {1,2}
             3 |            1 | local        |         35 |        2 | {1,3}
             3 |            2 | local        |         30 |        2 | {2,3}
             3 |            2 | local        |         45 |        3 | {1,2,3}
      (7 rows)
      
