How to find the closest subset sum in Oracle

【Question title】: How can I find the closest subset sum in Oracle?
【Posted】: 2020-06-19 16:55:58
【Question】:

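I have two tables: 1) One is INVOICES, which holds thousands of rows. It contains each customer's invoices and their amounts. 2) The other is DEBTS. It contains the total invoice debt for each customer.
My goal is to find, for each customer, the set of invoices whose total is closest to the debt. For example, given these tables: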

DEBTS table:

    CUSTOMER_ID         TOTAL_DEBTS
      3326660                444$      
      2789514                165$     
      4931541                121$

INVOICES table:

CUSTOMER_ID       INVOICE_ID        AMOUNT_OF_INVOICE
  3326660              1a                   157$ 
  3326660              1b                   112$ 
  3326660              1c                   10$ 
  3326660              1d                   94$ 
  3326660              1e                   47$ 
  3326660              1f                   35$ 
  3326660              1g                   14$ 
  3326660              1h                   132$ 
  3326660              1i                   8$ 
  3326660              1j                   60$ 
  3326660              1k                   42$ 
  2789514              2a                   86$ 
  2789514              2b                   81$
  2789514              2c                   99$
  2789514              2d                   61$
  2789514              2e                   16$
  2789514              2f                   83$
  4931541              3a                   11$
  4931541              3b                   14$
  4931541              3c                   17$
  4931541              3d                   121$
  4931541              3e                   35$
  4931541              3f                   29$

The result I want is:

CUSTOMER_ID        TOTAL_DEBTS     CALCULATED_AMOUNT        INVOICES_ID   
  3326660              444$              444$              1a,1b,1f,1h,1i    
  2789514              165$              164$                   2b,2f
  4931541              121$              121$                    3d

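Because my tables hold thousands of rows, performance is very important to me. I found this code on Stack Overflow: closest subset sum

However, its performance is poor. I need the summation loop to stop as soon as calculated_amount equals total_debts.

Thanks for your help.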

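【Comments】:

This is not a problem that is well suited to SQL. You need to enumerate all possible combinations and see which one is closest.
Agree with @GordonLinoff. Read this wiki article; it can help you understand what kind of problem you are dealing with.

Tags: sql oracle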

【Solution 1】:

Use a recursive query:


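with 
    -- t1: each invoice next to its customer's debt, numbered per customer
    t1 as ( 
        select customer_id cid, total_debts dbt, invoice_id iid, amount_of_invoice amt, 
               row_number() over (partition by customer_id order by invoice_id) rn
          from debts d join invoices i using (customer_id) ),
    -- t2: extends each combination with invoices that have a higher rn,
    --     as long as the running sum (sma) does not exceed the debt
    t2 (cid, iid, ams, dbt, amt, sma, rn) as ( 
        select cid, cast(iid as varchar2(4000)), cast(amt as varchar2(4000)), 
               dbt, amt, amt, rn
          from t1 
         where amt <= dbt  -- skip single invoices that already exceed the debt
        union all 
        select t2.cid, 
               t2.iid || ', ' || t1.iid,
               t2.ams || ', ' || t1.amt,
               t2.dbt, t2.amt, t1.amt + t2.sma, t1.rn
          from t2 
          join t1 on t1.cid = t2.cid and t1.rn > t2.rn and t2.sma + t1.amt <= t1.dbt),
    -- t3: ranks the combinations per customer by how far they fall short of the debt
    t3 as (
        select t2.*, rank() over (partition by cid order by dbt - sma ) rnk
          from t2)
select cid, iid, ams, dbt, sma from t3 where rnk = 1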

Output:

    CID  IID                           AMS                             DBT      SMA        
-------  ----------------------------  ------------------------------  -------- -------- 
2789514  2b, 2f                        81, 83                               165      164 
3326660  1a, 1d, 1e, 1g, 1h            157, 94, 47, 14, 132                 444      444 
3326660  1b, 1c, 1d, 1e, 1f, 1g, 1h    112, 10, 94, 47, 35, 14, 132         444      444 
3326660  1a, 1c, 1f, 1h, 1i, 1j, 1k    157, 10, 35, 132, 8, 60, 42          444      444 
3326660  1a, 1b, 1f, 1h, 1i            157, 112, 35, 132, 8                 444      444 
4931541  3d                            121                                  121      121 

6 rows selected

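Subquery T1 joins the two tables and adds the column rn, which is used to combine rows. T2 is hierarchical and does the main work: it keeps adding invoices to each combination as long as the sum does not exceed the debt. T3 picks the best solutions using the rank function. As you can see, CID 3326660 has four equally good combinations.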
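For readers who want to see the T2 step outside SQL, here is a minimal Python sketch of the same enumeration for a single customer. It is not part of the original answer. The function name `closest_subset` and the input shape (a list of `(invoice_id, amount)` pairs) are assumptions made for illustration. Unlike the query, it stops at the first exact match, which is what the question asks for, so it returns one best combination rather than all of them.

```python
from typing import List, Tuple


def closest_subset(invoices: List[Tuple[str, int]], debt: int) -> Tuple[int, List[str]]:
    """Return (best_sum, invoice_ids) where best_sum is the largest sum <= debt.

    Hypothetical helper: mirrors the recursive step of T2. A combination is
    only extended with invoices that come later in the list, and only while
    the running sum stays at or below the debt.
    """
    best_sum = 0
    best_ids: List[str] = []

    def extend(start: int, running: int, chosen: List[str]) -> bool:
        nonlocal best_sum, best_ids
        if running > best_sum:
            best_sum, best_ids = running, list(chosen)
        if running == debt:
            return True  # exact match: stop the whole search
        for pos in range(start, len(invoices)):
            inv_id, amount = invoices[pos]
            if running + amount <= debt:  # same pruning as the join condition
                chosen.append(inv_id)
                if extend(pos + 1, running + amount, chosen):
                    return True
                chosen.pop()
        return False

    extend(0, 0, [])
    return best_sum, best_ids


# Customer 2789514 from the question
invoices_2789514 = [("2a", 86), ("2b", 81), ("2c", 99), ("2d", 61), ("2e", 16), ("2f", 83)]
print(closest_subset(invoices_2789514, 165))  # (164, ['2b', '2f'])
```

The worst case is still exponential in the number of invoices per customer, exactly as in the recursive query. The early exit only helps when an exact match exists.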

Be aware that recursive subqueries are slow on large amounts of data, so this solution will not work well there.
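If the rows can be pulled out of the database and processed per customer, one common alternative is dynamic programming over the reachable sums. The sketch below is a different technique from the recursive query above, not a translation of it. It assumes amounts and debts are non-negative integers (whole dollars or cents), and the name `closest_subset_dp` is hypothetical. It tracks at most `debt + 1` distinct sums, so the work grows with the number of invoices times the debt rather than with the number of combinations.

```python
from typing import Dict, List, Tuple


def closest_subset_dp(invoices: List[Tuple[str, int]], debt: int) -> Tuple[int, List[str]]:
    """Largest reachable sum <= debt, plus one invoice combination producing it.

    Assumption: amounts and debt are non-negative integers.
    """
    # reachable[s] = (previous_sum, invoice_id) that first produced the sum s;
    # the entry for 0 represents the empty combination.
    reachable: Dict[int, Tuple[int, str]] = {0: (0, "")}
    for inv_id, amount in invoices:
        # Iterate over a snapshot so each invoice is used at most once.
        for s in list(reachable):
            new_sum = s + amount
            if new_sum <= debt and new_sum not in reachable:
                reachable[new_sum] = (s, inv_id)
        if debt in reachable:
            break  # exact match found, no need to look at more invoices

    best = max(reachable)
    # Walk the back-pointers to recover the invoices behind the best sum.
    ids: List[str] = []
    s = best
    while s != 0:
        s, inv_id = reachable[s]
        ids.append(inv_id)
    ids.reverse()
    return best, ids


# Customer 4931541 from the question
invoices_4931541 = [("3a", 11), ("3b", 14), ("3c", 17), ("3d", 121), ("3e", 35), ("3f", 29)]
print(closest_subset_dp(invoices_4931541, 121))  # (121, ['3d'])
```

Like the early-exit search, this returns a single best combination per customer, not every tie as the SQL output does.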

【Discussion】: