【Question title】: How to handle duplicates created by LEFT JOIN
【Posted】: 2020-02-13 08:55:46
【Question】:

Left table:

+------+---------+--------+
| Name | Surname | Salary |
+------+---------+--------+
| Foo  | Bar     |    100 |
| Foo  | Kar     |    300 |
| Fo   | Ba      |     35 |
+------+---------+--------+

Right table:

+------+-------+
| Name | Bonus |
+------+-------+
| Foo  |    10 |
| Foo  |    20 |
| Foo  |    50 |
| Fo   |    10 |
| Fo   |   100 |
| F    |  1000 |
+------+-------+

Desired output:

+------+---------+--------+-------+
| Name | Surname | Salary | Bonus |
+------+---------+--------+-------+
| Foo  | Bar     |    100 |    80 |
| Foo  | Kar     |    300 |     0 |
| Fo   | Ba      |     35 |   110 |
+------+---------+--------+-------+

The closest I have got is:

SELECT 
    a.Name,
    Surname,
    sum(Salary),
    sum(Bonus)
FROM (SELECT 
        Name,
        Surname,
        sum(Salary) as Salary
      FROM input
      GROUP BY 1,2) a LEFT JOIN (SELECT Name,
                                        SUM(Bonus) as Bonus
                                 FROM input2
                                 GROUP BY 1) b 
ON a.Name = b.Name
GROUP BY 1,2;

which gives:

+------+---------+-------------+------------+
| Name | Surname | sum(Salary) | sum(Bonus) |
+------+---------+-------------+------------+
| Fo   | Ba      |          35 |        110 |
| Foo  | Bar     |         100 |         80 |
| Foo  | Kar     |         300 |         80 |
+------+---------+-------------+------------+

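I don't know how to get rid of the duplicated Bonus. The ideal solution for me is the one shown in "Desired output": add the Bonus to only one row per Name, and add 0 for every other row with the same Name.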
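To make the cause of the duplication concrete, here is a minimal reproduction. It is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, and the table names input and input2 are taken from the query above. The right-hand subquery returns exactly one total per Name, but the left side still has two 'Foo' rows, so the single total of 80 is attached to both of them.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE input  (Name TEXT, Surname TEXT, Salary INTEGER);
    CREATE TABLE input2 (Name TEXT, Bonus INTEGER);
    INSERT INTO input  VALUES ('Foo', 'Bar', 100), ('Foo', 'Kar', 300), ('Fo', 'Ba', 35);
    INSERT INTO input2 VALUES ('Foo', 10), ('Foo', 20), ('Foo', 50),
                              ('Fo', 10), ('Fo', 100), ('F', 1000);
""")

# Join every left row to the per-Name bonus total.
fanout_rows = conn.execute("""
    SELECT a.Name, a.Surname, a.Salary, b.Bonus
    FROM input a
    LEFT JOIN (SELECT Name, SUM(Bonus) AS Bonus
               FROM input2
               GROUP BY Name) b
      ON a.Name = b.Name
    ORDER BY a.Name, a.Surname
""").fetchall()

for row in fanout_rows:
    print(row)
# ('Fo', 'Ba', 35, 110)
# ('Foo', 'Bar', 100, 80)
# ('Foo', 'Kar', 300, 80)
```

Pre-aggregating the right table removes the fan-out caused by multiple bonus rows, but it cannot remove the repetition caused by multiple left rows sharing a Name. That part needs a rule for picking which left row receives the total.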

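【Comments】:

  • For starters, don't you need to join on both name and surname here?
  • The right table has no surname (though I agree it is a poor foreign-key design).
  • These tables are just examples; the real tables I need this for are far more complex. @JacobH Yes, the problem is that I need to join on the name only.

Tags: sql postgresql duplicates left-join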


【Solution 1】:

You can use row_number():

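select l.name, l.surname, l.salary,
       -- only the first row of each name keeps the summed bonus
       (case when l.seqnum = 1 then r.bonus else 0 end) as bonus
from (select l.*,
             -- number the rows of each name, lowest salary first
             row_number() over (partition by name order by salary) as seqnum
      from "left" l
     ) l left join
     (select r.name, sum(bonus) as bonus
      from "right" r
      group by r.name
     ) r
     on r.name = l.name;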
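As a quick check of this approach against the sample data, the following sketch runs the same idea through Python's built-in sqlite3 module, used here as a stand-in for PostgreSQL. Several things are assumptions for illustration: the tables are named input and input2 as in the question rather than "left" and "right", the helper columns are listed explicitly, and the COALESCE is an addition (not part of the answer) so that a Name with no matching bonus rows yields 0 instead of NULL. Window functions require SQLite 3.25 or later.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE input  (Name TEXT, Surname TEXT, Salary INTEGER);
    CREATE TABLE input2 (Name TEXT, Bonus INTEGER);
    INSERT INTO input  VALUES ('Foo', 'Bar', 100), ('Foo', 'Kar', 300), ('Fo', 'Ba', 35);
    INSERT INTO input2 VALUES ('Foo', 10), ('Foo', 20), ('Foo', 50),
                              ('Fo', 10), ('Fo', 100), ('F', 1000);
""")

row_number_rows = conn.execute("""
    SELECT l.Name, l.Surname, l.Salary,
           -- Only the row numbered 1 within each Name receives the total.
           CASE WHEN l.seqnum = 1 THEN COALESCE(r.Bonus, 0) ELSE 0 END AS Bonus
    FROM (SELECT i.*,
                 ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Salary) AS seqnum
          FROM input i) l
    LEFT JOIN (SELECT Name, SUM(Bonus) AS Bonus
               FROM input2
               GROUP BY Name) r
      ON r.Name = l.Name
    ORDER BY l.Salary
""").fetchall()

for row in row_number_rows:
    print(row)
# ('Fo', 'Ba', 35, 110)
# ('Foo', 'Bar', 100, 80)
# ('Foo', 'Kar', 300, 0)
```

The ORDER BY inside the window decides which row gets the bonus. Ordering by Salary gives it to the lowest-paid row of each Name, which happens to match the desired output here; if two rows of a Name tie on Salary, a second sort key would be needed to make the choice deterministic.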

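【Comments】:

    【Solution 2】:

    Try a row number partitioned by Name. This gives each of your duplicates a different number. You can then return the bonus for the row where this number is 1 and return 0 otherwise. The code might look like this.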

    SELECT
        a.Name,
        a.Surname,
        a.Salary,
        -- only the first row of each Name keeps the bonus; the rest get 0
        CASE WHEN a.Duplicate_Order = 1
             THEN b.Bonus
             ELSE 0
        END AS Bonus
    FROM (SELECT
            Name,
            Surname,
            SUM(Salary) AS Salary,
            -- ordering by Surname makes the numbering deterministic
            ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Surname) AS Duplicate_Order
          FROM input
          GROUP BY 1, 2) a
    LEFT JOIN (SELECT Name,
                      SUM(Bonus) AS Bonus
               FROM input2
               GROUP BY 1) b
      ON a.Name = b.Name;
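To see the numbering step on its own, the sketch below prints only the intermediate result of the inner subquery for the sample data. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, and it orders the window by Surname, which is an assumption chosen so that the numbering is deterministic (ordering by the partition column itself leaves the order within a Name unspecified).

```python
import sqlite3

# In-memory SQLite database used as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE input (Name TEXT, Surname TEXT, Salary INTEGER);
    INSERT INTO input VALUES ('Foo', 'Bar', 100), ('Foo', 'Kar', 300), ('Fo', 'Ba', 35);
""")

# The window function is evaluated after GROUP BY, so it numbers the
# aggregated (Name, Surname) rows rather than the raw input rows.
numbered_rows = conn.execute("""
    SELECT Name, Surname, SUM(Salary) AS Salary,
           ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Surname) AS Duplicate_Order
    FROM input
    GROUP BY Name, Surname
    ORDER BY Name, Duplicate_Order
""").fetchall()

for row in numbered_rows:
    print(row)
# ('Fo', 'Ba', 35, 1)
# ('Foo', 'Bar', 100, 1)
# ('Foo', 'Kar', 300, 2)
```

Every Name has exactly one row numbered 1, so a CASE on that column hands the joined bonus to one row per Name and 0 to the others.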
    

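    Hope that helps!

    【Comments】:

      【Solution 3】:

      You can compute the bonus column with a correlated subquery that uses the sum() aggregate, and then apply the lag() window function so that every row after the first with the same name gets zero: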

      select Name, Surname, Salary, 
             bonus - lag(bonus::int,1,0) over (partition by name order by salary) as bonus
      from
      (
      select i1.*, 
             ( select sum(Bonus) 
                 from input2 i2 
                where i1.Name = i2.Name 
                group by i2.Name  ) as bonus
        from input i1
      ) ii
      order by name desc, surname;
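The arithmetic behind the lag() trick can be checked with the sketch below, again using Python's built-in sqlite3 module as a stand-in for PostgreSQL. The PostgreSQL-specific ::int cast is dropped and the inner column is renamed total_bonus for readability; both changes are assumptions made for this illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE input  (Name TEXT, Surname TEXT, Salary INTEGER);
    CREATE TABLE input2 (Name TEXT, Bonus INTEGER);
    INSERT INTO input  VALUES ('Foo', 'Bar', 100), ('Foo', 'Kar', 300), ('Fo', 'Ba', 35);
    INSERT INTO input2 VALUES ('Foo', 10), ('Foo', 20), ('Foo', 50),
                              ('Fo', 10), ('Fo', 100), ('F', 1000);
""")

lag_rows = conn.execute("""
    SELECT Name, Surname, Salary,
           -- First row of a Name: total - 0. Later rows: total - total = 0.
           total_bonus - LAG(total_bonus, 1, 0)
                         OVER (PARTITION BY Name ORDER BY Salary) AS Bonus
    FROM (SELECT i1.*,
                 (SELECT SUM(i2.Bonus)
                  FROM input2 i2
                  WHERE i2.Name = i1.Name) AS total_bonus
          FROM input i1) ii
    ORDER BY Salary
""").fetchall()

for row in lag_rows:
    print(row)
# ('Fo', 'Ba', 35, 110)
# ('Foo', 'Bar', 100, 80)
# ('Foo', 'Kar', 300, 0)
```

This works because every row of a given Name carries the same total, so subtracting the previous row's value cancels it exactly. A Name with no bonus rows produces NULL rather than 0 unless the subquery result is wrapped in COALESCE.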
      


      【Comments】:
