SQL query to return duplicate rows for certain column, but with unique values for another column: answers

【Question title】: SQL query to return duplicate rows for certain column, but with unique values for another column
【Posted】: 2022-02-16 14:45:19
【Question】:

I wrote the query shown here, which combines three tables and returns rows where the at_ticket_num from appeal_tickets is duplicated but against different at_system_ref values:

select top 100 
    t.t_reference, at.at_system_ref, at_ticket_num, a.a_case_ref
from 
    tickets t, appeal_tickets at, appeals_2 a
where 
    t.t_reference in ('AB123','AB234') -- filtering on these values so that I can see that its working
    and t.t_number = at.at_ticket_num
    and at.at_system_ref = a.a_system_ref
    and at.at_ticket_num IN (select at_ticket_num
                             from appeal_tickets
                             group by at_ticket_num
                             having count(distinct at_system_ref) > 1)
order by 
    t.t_reference desc

This is the output:

t_reference  at_system_ref  at_ticket_num   a_case_ref
-------------------------------------------------------
    AB123       30838974      23641583      1111979010
    AB123       30838976      23641583      1111979010
    AB234       30839149      23641520      1111977352
    AB234       30839209      23641520      1111988003

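I want to modify this so that it only returns rows where t_reference is duplicated but against a different a_case_ref. So in the case above, only the rows for AB234 would be returned.

Any help would be greatly appreciated.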

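【Comments on the question】:

Please note that tables have rows and columns, not records and fields.
Tip of today: always use modern, explicit JOIN syntax! It is easier to write (without errors), easier to read and maintain, and easier to convert to an outer join if needed!
Bad habits to kick : using old-style JOINs - the old-style comma-separated list of tables was replaced with the proper ANSI JOIN syntax in the ANSI-92 SQL standard (30 years!! ago), and its use is discouraged.
Which DBMS are you using? Please tag your request with it.
OK. I'm using SQL Server. Tagged.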
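
For reference, the explicit JOIN style recommended in the comments, applied to the query from the question, might look like the sketch below. It only moves the join conditions out of the WHERE clause into ON clauses. Table and column names are taken from the question, and the result should be the same as before.

```sql
-- Same query as in the question, rewritten with explicit JOIN ... ON.
select top 100
    t.t_reference, at.at_system_ref, at.at_ticket_num, a.a_case_ref
from tickets t
join appeal_tickets at on at.at_ticket_num = t.t_number      -- was: t.t_number = at.at_ticket_num in WHERE
join appeals_2 a       on a.a_system_ref   = at.at_system_ref -- was: at.at_system_ref = a.a_system_ref in WHERE
where
    t.t_reference in ('AB123','AB234') -- test filter from the question
    -- keep only tickets that appear with more than one system reference
    and at.at_ticket_num in (select at_ticket_num
                             from appeal_tickets
                             group by at_ticket_num
                             having count(distinct at_system_ref) > 1)
order by
    t.t_reference desc;
```

With the joins spelled out like this, turning one of them into a LEFT JOIN later is a one-word change.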

Tags: sql sql-server

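【Solution 1】:

It seems you want all ticket appeals that have more than one system reference and more than one case reference. You can join the tables, count the distinct values per ticket, and then keep only the tickets that match these criteria.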

select *
from
(
  select
    t.t_reference, at.at_system_ref, at.at_ticket_num, a.a_case_ref,
    count(distinct a.a_system_ref) over (partition by at.at_ticket_num) as sysrefs,
    count(distinct a.a_case_ref) over (partition by at.at_ticket_num) as caserefs
  from tickets t
  join appeal_tickets at on at.at_ticket_num = t.t_number
  join appeals_2 a on a.a_system_ref = at.at_system_ref
) counted
where sysrefs > 1 and caserefs > 1
order by t_reference, at_system_ref, at_ticket_num, a_case_ref; -- aliases t/at/a are not visible outside the derived table

Correction

It seems SQL Server still doesn't support COUNT(DISTINCT ...) OVER (...). You can, however, count the distinct values in a subquery. Replace

count(distinct a.a_system_ref) over (partition by at.at_ticket_num) as sysrefs,

with

(
  select count(distinct a2.a_system_ref)
  from appeal_tickets at2
  join appeals_2 a2 on a2.a_system_ref = at2.at_system_ref
  where at2.at_ticket_num = t.t_number
) as sysrefs,

Another workaround is to use DENSE_RANK in both directions (found here: https://stackoverflow.com/a/53518204/2270762):

dense_rank() over (partition by at.at_ticket_num order by a.a_system_ref) +
dense_rank() over (partition by at.at_ticket_num order by a.a_system_ref desc) - 
1 as sysrefs,
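
Put together, the DENSE_RANK workaround for both counters might look like the sketch below. It assumes the same tables and columns as the query above and simply repeats the two-direction trick for a_case_ref. One caveat: DENSE_RANK treats NULL as a value of its own, while COUNT(DISTINCT ...) ignores NULLs, so the two only agree when the ranked column has no NULLs.

```sql
select t_reference, at_system_ref, at_ticket_num, a_case_ref
from
(
  select
    t.t_reference, at.at_system_ref, at.at_ticket_num, a.a_case_ref,
    -- ascending rank + descending rank - 1 = number of distinct values in the partition
    dense_rank() over (partition by at.at_ticket_num order by a.a_system_ref)
      + dense_rank() over (partition by at.at_ticket_num order by a.a_system_ref desc)
      - 1 as sysrefs,
    dense_rank() over (partition by at.at_ticket_num order by a.a_case_ref)
      + dense_rank() over (partition by at.at_ticket_num order by a.a_case_ref desc)
      - 1 as caserefs
  from tickets t
  join appeal_tickets at on at.at_ticket_num = t.t_number
  join appeals_2 a on a.a_system_ref = at.at_system_ref
) counted
-- keep tickets with several system references AND several case references
where sysrefs > 1 and caserefs > 1
order by t_reference, at_system_ref, at_ticket_num, a_case_ref;
```

The sum works because a row's ascending rank counts the distinct values up to and including its own, the descending rank counts those from its own onwards, and its own value is counted twice.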

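【Comments】:

When I execute this query, I get the error "Use of DISTINCT is not allowed with the OVER clause." Searching for this error online suggests that I need to use the dense_rank function. My SQL knowledge is weak, and I've never seen dense_rank before. I'll need to do some reading to see if I can figure it out. Thanks.
That's poor on SQL Server's part. They don't seem to support DISTINCT in window functions. I've updated my answer with two workarounds for this.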
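Sorry mate. This hasn't worked. I had to modify your subquery correction above because it didn't include a join on the tickets table. I added one for the case ref, which looks like ...( select count(distinct a2.a_case_ref) from appeal_tickets at2 join appeals_2 a2 on a2.a_system_ref = at2.at_system_ref join tickets t on t.t_number = at2.at_ticket_num where at2.at_ticket_num = t.t_number ) as caserefs. However, when I run the query filtered on just 2 ticket numbers, I get...

t_reference  at_system_ref  at_ticket_num   a_case_ref   sysrefs  caserefs
---------------------------------------------------------------------------
    AB123       30839149      23641520      2222977352    169967    158847
    AB123       30839209      23641520      2222988003    169967    158847
    AB234       30838974      23641583      2222979010    169967    158847
    AB234       30838976      23641583      2222979010    169967    158847

i.e. the last two columns are a count of the unique systemrefs and caserefs, and it's not filtering out the ticket number where the caseref is the same.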
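
A likely explanation for those identical, very large counts: the subquery in the answer is correlated, meaning its t.t_number refers to the tickets row of the outer query. Joining tickets again inside the subquery under the same alias t hides the outer t, so the subquery no longer depends on the current row and counts over the whole table. A sketch of the subquery variant with the correlation kept intact, assuming the tables and columns from the answer, could be:

```sql
select t_reference, at_system_ref, at_ticket_num, a_case_ref
from
(
  select
    t.t_reference, at.at_system_ref, at.at_ticket_num, a.a_case_ref,
    (
      select count(distinct a2.a_system_ref)
      from appeal_tickets at2
      join appeals_2 a2 on a2.a_system_ref = at2.at_system_ref
      where at2.at_ticket_num = t.t_number   -- t is the OUTER tickets row; no tickets join in here
    ) as sysrefs,
    (
      select count(distinct a2.a_case_ref)
      from appeal_tickets at2
      join appeals_2 a2 on a2.a_system_ref = at2.at_system_ref
      where at2.at_ticket_num = t.t_number   -- same correlation for the case references
    ) as caserefs
  from tickets t
  join appeal_tickets at on at.at_ticket_num = t.t_number
  join appeals_2 a on a.a_system_ref = at.at_system_ref
) counted
where sysrefs > 1 and caserefs > 1
order by t_reference, at_system_ref, at_ticket_num, a_case_ref;
```

Using distinct aliases inside the subqueries (at2, a2) and reserving t for the outer query is what keeps the counts per ticket.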

【Solution 2】:

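with data as (
    <your query plus one column>,
    -- the extra column: add this expression to the select list of your query
    -- (a_case_ref is assumed as the argument; the answer left min()/max() empty)
    case when
        min(a.a_case_ref) over (partition by t.t_reference)
        <>
        max(a.a_case_ref) over (partition by t.t_reference)
        then 1 end as dup -- 1 when a t_reference has more than one a_case_ref
)
select * from data where dup = 1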

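【Comments】:

What do you mean by "plus one column"? When I try this query, I get a syntax error on the case statement.
I mean plug in your query, with the additional column as described.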
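
As an illustration of what "plugging in" could mean, here is a sketch with the question's query (rewritten with explicit joins) inside the CTE and the CASE expression added to its select list. The argument a.a_case_ref for MIN and MAX is an assumption based on the question's goal, since the answer leaves it out. The TOP 100 and the test filter on t_reference are dropped here.

```sql
with data as (
    select
        t.t_reference, at.at_system_ref, at.at_ticket_num, a.a_case_ref,
        -- assumption: compare a_case_ref within each t_reference;
        -- if min and max differ, the group holds at least two different case references
        case when
            min(a.a_case_ref) over (partition by t.t_reference)
            <>
            max(a.a_case_ref) over (partition by t.t_reference)
            then 1 end as dup
    from tickets t
    join appeal_tickets at on at.at_ticket_num = t.t_number
    join appeals_2 a on a.a_system_ref = at.at_system_ref
    -- original filter: tickets that appear with more than one system reference
    where at.at_ticket_num in (select at_ticket_num
                               from appeal_tickets
                               group by at_ticket_num
                               having count(distinct at_system_ref) > 1)
)
select t_reference, at_system_ref, at_ticket_num, a_case_ref
from data
where dup = 1
order by t_reference desc;
```

For the four sample rows in the question, both AB123 rows share a_case_ref 1111979010, so dup stays NULL and they are filtered out. The AB234 rows have two different values, so both are returned. If this statement follows another one in the same batch, SQL Server needs the previous statement to end with a semicolon before WITH.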