Best way to compare values in a single table: answers

【Question title】: Best way to compare values in single table
【Posted】: 2019-12-21 16:23:51
【Question】:

I have a SQL Server table that contains data in the following format.

ColumnA : ServerName
ColumnB : ObjectName
ColumnC : HashValue

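What I need to do is compare the values in ColumnC across servers, matching rows on ColumnB. The goal is to see which servers have the same object with a different hash.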

So far, we have split the table into multiple tables based on the server name. So table_one contains only the data for server one.

Then we do the same for server two with table_two.

After that, we do a left join from table_one to table_two, since table_one has more entries.

From there we can run the following.

select * from table_one 
left join table_two
on table_one.ColumnB =table_two.ColumnB 
where table_one.ColumnC !=table_two.ColumnC

The problem is that there are 10 servers, each with at least about 10,000 entries, so this is a slow process.

-- table_all: assumed name of the original combined table
select * into table_one from table_all where ColumnA = 'ServerOne'
select * into table_two from table_all where ColumnA = 'ServerTwo'

select * from table_one 
left join table_two
on table_one.ColumnB =table_two.ColumnB 
where table_one.ColumnC !=table_two.ColumnC

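For each object, I would like to see the names of the servers whose hash does not match the hash from server one. Server one is used as the baseline because it has the most objects, not necessarily because its objects are identical to those on the other servers.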

【Question comments】:

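Your LEFT JOIN returns a regular INNER JOIN result. Move the WHERE clause condition to the ON clause to get a true LEFT JOIN result.
@jarl, that would also return the irrelevant rows where table_one.ColumnC = table_two.ColumnC. Better to allow NULL in the WHERE clause instead.
@RomanoBrooks . . . Tag your question with the database you are using.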
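
The behavior discussed in these comments can be checked on a tiny example. The sketch below uses Python's built-in sqlite3 module with invented rows (an assumption for illustration: the question targets SQL Server, but the join semantics shown here are standard SQL). It compares the original WHERE filter, the "allow NULL in WHERE" variant, and the "move the condition to ON" variant.

```python
import sqlite3

# In-memory database with made-up sample rows (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_one (ColumnA TEXT, ColumnB TEXT, ColumnC TEXT);
    CREATE TABLE table_two (ColumnA TEXT, ColumnB TEXT, ColumnC TEXT);
    INSERT INTO table_one VALUES
        ('ServerOne', 'obj_same',    'h1'),
        ('ServerOne', 'obj_diff',    'h2'),
        ('ServerOne', 'obj_missing', 'h3');
    INSERT INTO table_two VALUES
        ('ServerTwo', 'obj_same', 'h1'),
        ('ServerTwo', 'obj_diff', 'hX');
""")

# 1) Filter in WHERE: NULL-extended rows fail the comparison,
#    so the LEFT JOIN behaves like an INNER JOIN.
where_only = conn.execute("""
    SELECT t1.ColumnB
    FROM table_one t1
    LEFT JOIN table_two t2 ON t1.ColumnB = t2.ColumnB
    WHERE t1.ColumnC != t2.ColumnC
    ORDER BY t1.ColumnB
""").fetchall()

# 2) Allow NULL in WHERE: also keeps objects missing from table_two.
with_null = conn.execute("""
    SELECT t1.ColumnB
    FROM table_one t1
    LEFT JOIN table_two t2 ON t1.ColumnB = t2.ColumnB
    WHERE t2.ColumnB IS NULL OR t1.ColumnC != t2.ColumnC
    ORDER BY t1.ColumnB
""").fetchall()

# 3) Condition moved to ON: every table_one row is returned,
#    including obj_same, whose hashes actually match.
on_clause = conn.execute("""
    SELECT t1.ColumnB, t2.ColumnC
    FROM table_one t1
    LEFT JOIN table_two t2
        ON t1.ColumnB = t2.ColumnB AND t1.ColumnC != t2.ColumnC
    ORDER BY t1.ColumnB
""").fetchall()

print(where_only)
print(with_null)
print(on_clause)
```

On this sample data, only the second variant returns both the mismatched object and the missing one without also returning the matching object.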

Tags: sql dynamic pivot

【Solution 1】:

You don't have to split them into different tables. You can join the table to itself:

Select a.ColumnA from your_table a
inner join
your_table b on a.ColumnB = b.ColumnB and a.ColumnC != b.ColumnC

Edit:

I don't know what dependencies exist on your table, but in case the objects are not unique, just adjust the query to:

Select a.ColumnA from your_table a
inner join
your_table b on a.ColumnA = b.ColumnA and a.ColumnB = b.ColumnB and a.ColumnC != b.ColumnC

Still, if you really need that statement, I would recommend cleaning up your table.
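
Since the question uses server one as the baseline, the self-join can be restricted so that alias `a` is always the baseline server. This is a minimal sketch with Python's sqlite3 module and invented rows; `your_table` is the placeholder name from this answer, and the `'ServerOne'` filter is my assumption about the intent, not part of the original query.

```python
import sqlite3

# Made-up sample data in an in-memory database (hypothetical rows).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE your_table (ColumnA TEXT, ColumnB TEXT, ColumnC TEXT);
    INSERT INTO your_table VALUES
        ('ServerOne',   'proc_a', 'h1'),
        ('ServerOne',   'proc_b', 'h2'),
        ('ServerTwo',   'proc_a', 'h1'),
        ('ServerTwo',   'proc_b', 'hX'),
        ('ServerThree', 'proc_a', 'hY'),
        ('ServerThree', 'proc_b', 'h2');
""")

# Self-join: a = baseline row, b = same object with a different hash.
# Restricting a to 'ServerOne' avoids reporting each pair in both directions.
mismatches = conn.execute("""
    SELECT b.ColumnA, b.ColumnB
    FROM your_table a
    INNER JOIN your_table b
        ON a.ColumnB = b.ColumnB
       AND a.ColumnC != b.ColumnC
    WHERE a.ColumnA = 'ServerOne'
    ORDER BY b.ColumnA, b.ColumnB
""").fetchall()

for server, obj in mismatches:
    print(server, obj)
```

Each returned row names a server and an object whose hash differs from the baseline server's hash for that object.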

【Comments】:

【Solution 2】:

Compare in a single pass, showing objects that are missing or that have a different ColumnC

select t1.*, t2.ColumnA, t2.ColumnC
from table_all t1
left join table_all t2
on t1.ColumnA < t2.ColumnA and t1.ColumnB = t2.ColumnB 
where t2.ColumnC is null or t1.ColumnC != t2.ColumnC
order by t1.ColumnA, t1.ColumnB
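
To see what this query returns, here is a small sketch that runs it unchanged through Python's sqlite3 module. The rows are invented for illustration and `table_all` is the name used in this answer. One thing the sample makes visible: the server that sorts last by name has no "later" server to join to, so all of its rows come back with NULLs on the right-hand side.

```python
import sqlite3

# Hypothetical sample rows; two servers, two objects.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_all (ColumnA TEXT, ColumnB TEXT, ColumnC TEXT);
    INSERT INTO table_all VALUES
        ('ServerOne', 'proc_a', 'h1'),
        ('ServerOne', 'proc_b', 'h2'),
        ('ServerTwo', 'proc_a', 'h1'),
        ('ServerTwo', 'proc_b', 'hX');
""")

# The query from this answer, unchanged.
rows = conn.execute("""
    SELECT t1.*, t2.ColumnA, t2.ColumnC
    FROM table_all t1
    LEFT JOIN table_all t2
        ON t1.ColumnA < t2.ColumnA AND t1.ColumnB = t2.ColumnB
    WHERE t2.ColumnC IS NULL OR t1.ColumnC != t2.ColumnC
    ORDER BY t1.ColumnA, t1.ColumnB
""").fetchall()

for row in rows:
    print(row)
```

The first row is the real hash mismatch for `proc_b`; the two `ServerTwo` rows appear only because no server name sorts after `'ServerTwo'`, so they may need to be filtered out when reading the result.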

【Comments】:

【Solution 3】:

The goal is to see which servers have the same object but with a different hash.

You can use aggregation to get the list of distinct servers and objects:

select ServerName, ObjectName
from t
group by ServerName, ObjectName
having min(HashValue) <> max(HashValue);

If you really want the details, then I would recommend window functions:

select ServerName, ObjectName
from (select t.*,
             min(HashValue) over (partition by ServerName, ObjectName) as min_hashValue,
             max(HashValue) over (partition by ServerName, ObjectName) as max_hashValue
      from t
     ) t
where min_hashValue <> max_hashValue
order by ServerName, ObjectName, hashValue;

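I prefer this approach because it produces a single list of values. If you do this with a JOIN, you get a large number of 2-way comparisons, which only multiplies the number of rows you need to look at.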
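
The aggregation and window-function ideas can be tried out with the sketch below, which uses Python's sqlite3 module and invented rows. It makes one deliberate change that is my assumption about the intent: it groups and partitions by `ObjectName` only, so that hashes are compared across servers rather than within a single server. Window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Hypothetical sample rows: proc_a matches everywhere, proc_b differs on ServerTwo.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (ServerName TEXT, ObjectName TEXT, HashValue TEXT);
    INSERT INTO t VALUES
        ('ServerOne',   'proc_a', 'h1'),
        ('ServerTwo',   'proc_a', 'h1'),
        ('ServerThree', 'proc_a', 'h1'),
        ('ServerOne',   'proc_b', 'h2'),
        ('ServerTwo',   'proc_b', 'hX'),
        ('ServerThree', 'proc_b', 'h2');
""")

# Aggregation: one row per object whose hash is not identical on all servers.
differing_objects = conn.execute("""
    SELECT ObjectName
    FROM t
    GROUP BY ObjectName
    HAVING MIN(HashValue) <> MAX(HashValue)
    ORDER BY ObjectName
""").fetchall()

# Window functions: keep the per-server detail rows for those objects.
details = conn.execute("""
    SELECT ServerName, ObjectName, HashValue
    FROM (SELECT t.*,
                 MIN(HashValue) OVER (PARTITION BY ObjectName) AS min_hash,
                 MAX(HashValue) OVER (PARTITION BY ObjectName) AS max_hash
          FROM t) AS x
    WHERE min_hash <> max_hash
    ORDER BY ObjectName, ServerName
""").fetchall()

print(differing_objects)
for row in details:
    print(row)
```

The detail query returns one row per server for each differing object, instead of one row per pair of servers as a self-join would.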

【Comments】: