【Question title】:SQL - Finding duplicate records based on certain criteria
【Posted】:2022-12-06 21:12:46
【Question】:

I have these records in a table named employee_projects:

id  employee_id  project_id  status
1   emp1         proj1       VERIFIED
2   emp2         proj2       REJECTED
3   emp1         proj1       VERIFIED
4   emp1         proj3       REJECTED
5   emp2         proj2       REQUIRED
6   emp3         proj4       SUBMITTED
7   emp4         proj5       VERIFIED
8   emp4         proj6       VERIFIED
9   emp3         proj4       REQUIRED

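These are the criteria for identifying duplicates:

  1. Same employee ID and same project ID with the same status (example: rows 1 and 3 are duplicates).
  2. Same employee ID and same project ID but with different statuses (example: rows 6 and 9 are duplicates). One exception to criterion #2: if a project is REQUIRED and the same project is also REJECTED for the same employee, that is not considered a duplicate. For example, rows 2 and 5 are not duplicates.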

    I have a query for the first criterion:

    select
        emp_id,
        proj_id,
        status,
        COUNT(*)
    from
        employee_projects
    group by
        emp_id,
        proj_id,
        status
    having
        COUNT(*) > 1
    

    What I am struggling to build is the SQL for the second criterion.

【Comments】:

  • If, for the same emp_id and proj_id, you have statuses of 'REJECTED', 'REQUIRED', 'REJECTED', 'REJECTED', and so on, is that considered a duplicate?

Tags: sql


【Solution 1】:

Maybe a self join can help you.

with t (employee_id ,project_id,status)
as
(
select 'emp1',  'proj1' ,   'VERIFIED'
Union all select 'emp2',    'proj2' ,   'REJECTED'
Union all select 'emp1',    'proj1' ,   'VERIFIED'
Union all select 'emp1',    'proj3' ,   'REJECTED'
Union all select 'emp2',    'proj2' ,   'REQUIRED'
Union all select 'emp3',    'proj4' ,   'SUBMITTED'
Union all select 'emp4',    'proj5' ,   'VERIFIED'
Union all select 'emp4',    'proj6' ,   'VERIFIED'
Union all select 'emp3',    'proj4' ,   'REQUIRED'
)
select
    t.employee_id,
    t.project_id,
    t.status,
    '' as other_status,
    'criteria#1' as criteria
from
    t
group by
    t.employee_id,
    t.project_id,
    t.status
having
    COUNT(*) > 1
union all
SELECT 
    t.employee_id,
    t.project_id,
    t.status,
    a.status as other_status,
    'criteria#2' as criteria
FROM
    t
    left join t as a on 
        t.employee_id = a.employee_id and
        t.project_id = a.project_id
where 
    t.status != a.status and
    concat(t.status,a.status) != 'REQUIREDREJECTED' and
    concat(t.status,a.status) != 'REJECTEDREQUIRED'
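
The self join above returns every mismatched pair twice (once in each direction). A minimal sketch of one way to avoid that, assuming the table has the unique id column shown in the question: join on t.id < a.id so each pair appears once, and express the REQUIRED/REJECTED exception as a single predicate. With that join condition, one query covers both criteria. SQLite (through Python's sqlite3 module) is used here only to make the example runnable; the function name duplicate_pairs is hypothetical.

```python
import sqlite3

# Sample rows from the question: (id, employee_id, project_id, status).
ROWS = [
    (1, "emp1", "proj1", "VERIFIED"),
    (2, "emp2", "proj2", "REJECTED"),
    (3, "emp1", "proj1", "VERIFIED"),
    (4, "emp1", "proj3", "REJECTED"),
    (5, "emp2", "proj2", "REQUIRED"),
    (6, "emp3", "proj4", "SUBMITTED"),
    (7, "emp4", "proj5", "VERIFIED"),
    (8, "emp4", "proj6", "VERIFIED"),
    (9, "emp3", "proj4", "REQUIRED"),
]

PAIR_QUERY = """
select t.id, a.id, t.status, a.status
from employee_projects t
join employee_projects a
  on  t.employee_id = a.employee_id
  and t.project_id  = a.project_id
  and t.id < a.id                      -- each pair once, no self-pairs
where not (t.status in ('REQUIRED', 'REJECTED')
           and a.status in ('REQUIRED', 'REJECTED')
           and t.status <> a.status)   -- the REQUIRED/REJECTED exception
order by t.id, a.id
"""


def duplicate_pairs(rows):
    """Load rows into an in-memory table and return the duplicate pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table employee_projects ("
        "id integer primary key, employee_id text, project_id text, status text)"
    )
    conn.executemany("insert into employee_projects values (?, ?, ?, ?)", rows)
    result = conn.execute(PAIR_QUERY).fetchall()
    conn.close()
    return result


print(duplicate_pairs(ROWS))
# [(1, 3, 'VERIFIED', 'VERIFIED'), (6, 9, 'SUBMITTED', 'REQUIRED')]
```

Rows 2 and 5 are left out because they form the excepted REQUIRED/REJECTED pair. This sketch assumes status is never NULL; a NULL status would make the where predicate evaluate to NULL and drop the pair.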

【Comments】:

    【Solution 2】:

    Try the following:

    select T.emp_id, T.proj_id, T.status, D.dup_cnt
    from employee_projects T join
    (
      select emp_id, proj_id, count(*) as dup_cnt
      from employee_projects
      group by emp_id, proj_id
      having count(*) > 1 and
        count(distinct case when status in ('REQUIRED', 'REJECTED') then status end) < 2
    ) D
    on T.emp_id = D.emp_id and T.proj_id = D.proj_id
    order by T.emp_id, T.proj_id
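
To see what the query above returns, here is a small sketch that runs it against the question's sample rows in an in-memory SQLite database. The choice of SQLite, the helper name run_solution_2, and the added T.id column (selected and used in the order by only to make the row order deterministic) are assumptions for illustration; the having clause is unchanged.

```python
import sqlite3

# Sample rows from the question: (id, emp_id, proj_id, status).
ROWS = [
    (1, "emp1", "proj1", "VERIFIED"),
    (2, "emp2", "proj2", "REJECTED"),
    (3, "emp1", "proj1", "VERIFIED"),
    (4, "emp1", "proj3", "REJECTED"),
    (5, "emp2", "proj2", "REQUIRED"),
    (6, "emp3", "proj4", "SUBMITTED"),
    (7, "emp4", "proj5", "VERIFIED"),
    (8, "emp4", "proj6", "VERIFIED"),
    (9, "emp3", "proj4", "REQUIRED"),
]

QUERY = """
select T.id, T.emp_id, T.proj_id, T.status, D.dup_cnt
from employee_projects T join
(
  select emp_id, proj_id, count(*) as dup_cnt
  from employee_projects
  group by emp_id, proj_id
  having count(*) > 1 and
    count(distinct case when status in ('REQUIRED', 'REJECTED') then status end) < 2
) D
on T.emp_id = D.emp_id and T.proj_id = D.proj_id
order by T.emp_id, T.proj_id, T.id
"""


def run_solution_2(rows):
    """Load rows into an in-memory table and run the grouped-join query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table employee_projects ("
        "id integer primary key, emp_id text, proj_id text, status text)"
    )
    conn.executemany("insert into employee_projects values (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in run_solution_2(ROWS):
    print(row)
# (1, 'emp1', 'proj1', 'VERIFIED', 2)
# (3, 'emp1', 'proj1', 'VERIFIED', 2)
# (6, 'emp3', 'proj4', 'SUBMITTED', 2)
# (9, 'emp3', 'proj4', 'REQUIRED', 2)
```

The emp2/proj2 group is skipped because it contains both REQUIRED and REJECTED, so the distinct count in the having clause is 2.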
    

    If you want to treat employees having the statuses ('REQUIRED', 'REJECTED', any other status) as duplicates, modify the having clause as follows:

    select T.emp_id, T.proj_id, T.status, D.dup_cnt
    from employee_projects T join
    (
      select emp_id, proj_id, count(*) as dup_cnt
      from employee_projects
      group by emp_id, proj_id
      having count(*) > 1 and
        (count(distinct case when status in ('REQUIRED', 'REJECTED') then status end) < 2 or count(distinct status) > 2)
    ) D
    on T.emp_id = D.emp_id and T.proj_id = D.proj_id
    order by T.emp_id, T.proj_id
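
The difference between the two having clauses only shows up on groups that the question's sample data does not contain. The sketch below compares them on two hypothetical groups (empA/projA and empB/projB are invented for illustration, as are the names HAVING_STRICT, HAVING_RELAXED and flagged_groups), again using an in-memory SQLite database as an assumed engine.

```python
import sqlite3

# The two conditions from the answer, applied after "count(*) > 1 and".
HAVING_STRICT = (
    "count(distinct case when status in ('REQUIRED', 'REJECTED') "
    "then status end) < 2"
)
HAVING_RELAXED = "(" + HAVING_STRICT + " or count(distinct status) > 2)"

# Hypothetical groups, not taken from the question's data.
ROWS = [
    ("empA", "projA", "REQUIRED"),
    ("empA", "projA", "REJECTED"),
    ("empA", "projA", "VERIFIED"),
    ("empB", "projB", "REJECTED"),
    ("empB", "projB", "REQUIRED"),
    ("empB", "projB", "REJECTED"),
]


def flagged_groups(having_condition, rows):
    """Return the (emp_id, proj_id, dup_cnt) groups kept by the condition."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table employee_projects (emp_id text, proj_id text, status text)"
    )
    conn.executemany("insert into employee_projects values (?, ?, ?)", rows)
    query = (
        "select emp_id, proj_id, count(*) as dup_cnt "
        "from employee_projects "
        "group by emp_id, proj_id "
        "having count(*) > 1 and " + having_condition + " "
        "order by emp_id, proj_id"
    )
    result = conn.execute(query).fetchall()
    conn.close()
    return result


print(flagged_groups(HAVING_STRICT, ROWS))   # []
print(flagged_groups(HAVING_RELAXED, ROWS))  # [('empA', 'projA', 3)]
```

Neither clause flags the empB/projB group, even though it holds two REJECTED rows that would match criterion 1 on their own. That is the case raised in the comment under the question, so it is worth checking against the intended rules before relying on either variant.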
    


    【Comments】:
