【Question title】: Removing duplicates only when criteria are met
【Posted】: 2021-08-28 17:52:06
【Question】:

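One last question about duplicates. I understand how to select duplicate records using COUNT(*) with a HAVING clause > 1, but I'm struggling to remove duplicates only when certain criteria are met.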
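
For reference, the COUNT(*) / HAVING baseline mentioned above can be sketched roughly as follows. It is only an illustration against the Billing table defined further down; grouping on SerialNo, Code, and DispenseDt is an assumption about what makes two records "the same".

```sql
-- Sketch: list (SerialNo, Code, DispenseDt) groups that occur more than once.
-- The grouping columns are an assumption, not something the question fixes.
SELECT SerialNo, Code, DispenseDt, COUNT(*) AS RecordCount
FROM Billing
GROUP BY SerialNo, Code, DispenseDt
HAVING COUNT(*) > 1;
```

A query like this only reports that a group repeats; it does not show which rows within the group cancel each other out.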

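Yesterday I asked about part of this: removing duplicates when the bill amounts cancel out. Now I have to add a criterion: records count as duplicates when their bill amounts have the same value with opposite signs, and both the date and the code are also the same.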

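For example, record 1 has a bill amount of $250, code 'JUN', and date 03/02/2020; record 2 has a bill amount of $250, code 'PII', and date 03/07/2020; and record 3 has a bill amount of -$250, code 'PII', and date 03/07/2020. The result I want from this example is record 1 only, because records 2 and 3 count as duplicates under the criteria above.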

Table creation:

CREATE TABLE Billing (
    BillId varchar(10),
    SerialNo varchar(10),
    BillAmt MONEY,
    Code varchar(5),
    DispenseDt DATE
);

Data input:

INSERT INTO Billing (BillId, SerialNo, BillAmt, Code, DispenseDt)
VALUES ('BL_001','aaa-111',250,'AAP','20200503')
      ,('BL_002','aab-112',250,'ADD','20200309')
      ,('BL_003','aab-112',-250,'ADD','20200309')
      ,('BL_004','aba-120',700,'YED','20200503')
      ,('BL_005','aba-120',370,'TPP','20200822')
      ,('BL_006','aba-120',370,'TPP','20201003')
      ,('BL_007','aba-120',400,'TPP','20200822')
      ,('BL_008','aba-120',-370,'TPP','20200822')
      ,('BL_009','aba-120',-700,'YED','20200503')
      ,('BL_010','baa-201',1000,'TOK','20200927')
      ,('BL_011','baa-201',-1000,'TOK','20200927')
      ,('BL_012','bab-210',1000,'TOK','20200927');

Sample data:

+----------+-----------+---------+------+------------+
| BillId   | SerialNo  | BillAmt | Code | DispenseDt |
+----------+-----------+---------+------+------------+
| BL_001   | aaa-111   | $250    | AAP  | 20200503   |
| BL_002   | aab-112   | $250    | ADD  | 20200309   |
| BL_003   | aab-112   |-$250    | ADD  | 20200309   |
| BL_004   | aba-120   | $700    | YED  | 20200503   |
| BL_005   | aba-120   | $370    | TPP  | 20200822   |
| BL_006   | aba-120   | $370    | TPP  | 20201003   |
| BL_007   | aba-120   | $400    | TPP  | 20200822   |
| BL_008   | aba-120   |-$370    | TPP  | 20200822   |
| BL_009   | aba-120   |-$700    | YED  | 20200503   |
| BL_010   | baa-201   | $1000   | TOK  | 20200927   |
| BL_011   | baa-201   |-$1000   | TOK  | 20200927   |
| BL_012   | bab-210   | $1000   | TOK  | 20200927   |
+----------+-----------+---------+------+------------+

Desired result:

+----------+-----------+---------+------+------------+
| BillId   | SerialNo  | BillAmt | Code | DispenseDt |
+----------+-----------+---------+------+------------+
| BL_001   | aaa-111   | $250    | AAP  | 20200503   |
| BL_006   | aba-120   | $370    | TPP  | 20201003   |
| BL_007   | aba-120   | $400    | TPP  | 20200822   |
| BL_012   | bab-210   | $1000   | TOK  | 20200927   |
+----------+-----------+---------+------+------------+

My code:

select a.SerialNo, a.BillAmt, a.Code, a.DispenseDt
from (
    select *,
      count(SerialNo) over(partition by SerialNo, DispenseDt) b
    from Billing ) a
where b = 1
AND
    InvoiceDt >= '20200601' And InvoiceDt <= '20200630'
    AND
    FacID IN ('IND600','IND605','IND610','IND620','IND630','IND640','IND650','IND660','IND670','IND680','IND690','IND695')
ORDER BY a.SerialNo;

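【Comments】:

  • In your example, would records 2 and 3 be considered duplicates because the balance equals 0?
  • @tj cappelletti - Yes, but also because the date and code values are the same.
  • BL_012 should not appear, because the amount, date, and code are the same. Can you confirm?

Tags: sql duplicates ssms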


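【Solution 1】:

I tried to solve this but got a bit stuck. The idea is to compute a rank and then filter on matching ranks. However, the ranks my code produces [two of them, one with rank() and one with row_number()] would remove some rows that you need in the output. Could someone else edit this code? That would be great.

Fiddle link: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=e0c990d3694ad99b628b3e05a5de624f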

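select
    BillId,        -- column name as in the question's CREATE TABLE
    Code,
    DispenseDt,
    new_bill_amt,
    -- no ORDER BY in the windows: valid in MySQL 8 (the fiddle's engine),
    -- but SQL Server requires ORDER BY for rank() and row_number()
    rank() over (partition by new_bill_amt, DispenseDt, Code) as rank_,
    row_number() over (partition by new_bill_amt, DispenseDt, Code) as rank_2
from (
    select
        *,
        abs(BillAmt) as new_bill_amt  -- absolute amount, so +250 and -250 share a partition
    from Billing
) as f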
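
One possible way to finish the ranking idea above, offered as a sketch rather than a definitive fix: number the rows inside each (SerialNo, Code, DispenseDt, BillAmt) group, then treat the n-th row with a given amount as cancelled by the n-th row with the opposite amount. Including SerialNo in the match is an assumption inferred from the desired result, where BL_012 survives although it shares amount, code, and date with BL_010 and BL_011. The CTE name numbered and the column rn are illustrative.

```sql
-- Sketch: pair each row with one opposite-signed row in the same
-- (SerialNo, Code, DispenseDt) group and keep only the unpaired rows.
WITH numbered AS (
    SELECT
        B.BillId, B.SerialNo, B.BillAmt, B.Code, B.DispenseDt,
        -- position of the row among rows with the identical signed amount
        ROW_NUMBER() OVER (
            PARTITION BY B.SerialNo, B.Code, B.DispenseDt, B.BillAmt
            ORDER BY B.BillId
        ) AS rn
    FROM Billing AS B
)
SELECT N.BillId, N.SerialNo, N.BillAmt, N.Code, N.DispenseDt
FROM numbered AS N
WHERE NOT EXISTS (
    SELECT 1
    FROM numbered AS M
    WHERE M.SerialNo   = N.SerialNo
      AND M.Code       = N.Code
      AND M.DispenseDt = N.DispenseDt
      AND M.BillAmt    = -N.BillAmt   -- same value, opposite sign
      AND M.rn         = N.rn         -- n-th positive cancels n-th negative
      AND M.BillId    <> N.BillId     -- a zero amount must not cancel itself
)
ORDER BY N.BillId;
```

Against the sample data in the question this should return BL_001, BL_006, BL_007, and BL_012. Pairing by rn means that a group such as +250, +250, -250 would keep one of the +250 rows instead of dropping all three.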

【Comments】:

    【Solution 2】:

    I think this might work.

    (I used a CTE, but you could convert it to a subquery.)

    WITH base_cte AS (
        SELECT 
            B1.SerialNo
        ,   SUM(B1.BillAmt) AS [TotAmt]
        ,   B1.Code
        ,   B1.DispenseDt
        FROM #Billing AS B1
        GROUP BY 
            B1.SerialNo
        ,   B1.Code
        ,   B1.DispenseDt
    )
    SELECT 
        B.BillId
    ,   B.SerialNo
    ,   B.BillAmt
    ,   B.code
    ,   B.DispenseDt
    FROM #Billing AS B
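    INNER JOIN base_cte AS X
        ON  X.SerialNo   = B.SerialNo
        AND X.Code       = B.Code        -- match on Code too, since the CTE groups by it
        AND X.DispenseDt = B.DispenseDt
    WHERE X.TotAmt = B.BillAmt           -- keep rows whose amount equals the group's net amount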
    

    Output:

    BillId  SerialNo  BillAmt   code    DispenseDt
    BL_001  aaa-111   250.00    AAP     2020-05-03
    BL_006  aba-120   370.00    TPP     2020-10-03
    BL_007  aba-120   400.00    TPP     2020-08-22
    BL_012  bab-210   1000.00   TOK     2020-09-27
    

    Edit: here is another approach using OVER().

    SELECT 
        Y.BillId
    ,   Y.SerialNo
    ,   Y.BillAmt
    ,   Y.Code
    ,   Y.DispenseDt
    FROM (
        SELECT X.*
        ,   [Ct] = COUNT(*) OVER(PARTITION BY X.code, X.TotAmt, X.DispenseDt ORDER BY X.SerialNo, X.code, X.DispenseDt)
        FROM (
            SELECT 
                B.BillId
            ,   B.SerialNo
            ,   B.BillAmt
            ,   B.code
            ,   B.DispenseDt
            ,   [TotAmt] = SUM(B.BillAmt) OVER(PARTITION BY B.SerialNo, B.code, B.DispenseDt ORDER BY B.SerialNo, B.code, B.DispenseDt)
            FROM #Billing AS B
        ) AS X
    ) AS Y
    WHERE Y.BillAmt = Y.TotAmt
    ORDER BY Y.BillId
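
The queries in this answer only select the surviving rows. If the cancelling rows should actually be removed from the table, a T-SQL sketch along these lines might serve as a starting point. It assumes the data lives in the question's Billing table and that BillId is unique. Unlike a pairwise match, it deletes every row that has at least one opposite-signed partner in the same (SerialNo, Code, DispenseDt) group.

```sql
-- Sketch (T-SQL): delete rows that have an opposite-signed partner
-- with the same SerialNo, Code, and DispenseDt.
-- Consider running the EXISTS condition in a SELECT first to review the rows.
DELETE B
FROM Billing AS B
WHERE B.BillAmt <> 0                  -- zero amounts never count as cancelled
  AND EXISTS (
        SELECT 1
        FROM Billing AS M
        WHERE M.SerialNo   = B.SerialNo
          AND M.Code       = B.Code
          AND M.DispenseDt = B.DispenseDt
          AND M.BillAmt    = -B.BillAmt
          AND M.BillId    <> B.BillId
  );
```

On the question's sample data this should leave BL_001, BL_006, BL_007, and BL_012 in the table.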
    

    【Comments】:
