Combine like rows based on substring of one column? Answers

【Question title】: Combine like rows based on substring of one column?
【Posted】: 2021-10-25 22:45:00
【Question description】:

I have a table of parts that looks like this:

Part	Part Num	Thing1	Thing2	Thing3	Thing4
Door	10105322	abc	abc
Door	10105323	abc	abc
Door	10105324	abc	abc
Door	84625111	abc	abc	abc
Door	84625118	abc	abc	abc
Door	84625185	abc	abc		abc
Door	56897101	abc	abc

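Part numbers are always 8 characters long. For many parts, the first 6 characters are identical and only the last 2 differ. Rows whose part numbers share the first 6 characters and whose Thing1/Thing2/Thing3/Thing4 values are all identical need to be combined into one row, with the part number shortened to those 6 characters. (Rows 1/2/3 in the table above.)

Rows that share the first 6 characters but whose Thing1/Thing2/Thing3/Thing4 values are not identical across all of those rows need to stay as they are, keeping the 8-character part number. (Rows 4/5/6 in the table above.)

Rows whose first 6 characters are unique need to stay as they are, keeping the 8-character part number. (Row 7 in the table above.)

The desired result looks like this: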

Part	Part Num	Thing1	Thing2	Thing3	Thing4
Door	101053	abc	abc
Door	84625111	abc	abc	abc
Door	84625118	abc	abc	abc
Door	84625185	abc	abc		abc
Door	56897101	abc	abc
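
To make the three rules concrete, here is a minimal Python sketch of the same logic, outside of SQL. The function name `collapse_parts` and the tuple layout are illustrative assumptions, not part of the question; empty cells are modelled as `None`, and rows are grouped by Part as well as by the 6-character prefix (the question only ever shows one Part value).

```python
from collections import defaultdict

# Sample rows from the question: (Part, PartNum, Thing1, Thing2, Thing3, Thing4).
rows = [
    ("Door", "10105322", "abc", "abc", None, None),
    ("Door", "10105323", "abc", "abc", None, None),
    ("Door", "10105324", "abc", "abc", None, None),
    ("Door", "84625111", "abc", "abc", "abc", None),
    ("Door", "84625118", "abc", "abc", "abc", None),
    ("Door", "84625185", "abc", "abc", None, "abc"),
    ("Door", "56897101", "abc", "abc", None, None),
]

def collapse_parts(rows):
    """Collapse a prefix group only when it has 2+ rows and every Thing column matches."""
    groups = defaultdict(list)
    for row in rows:
        part, part_num = row[0], row[1]
        groups[(part, part_num[:6])].append(row)  # hypothetical grouping key

    result = []
    for (part, prefix), members in groups.items():
        distinct_things = {m[2:] for m in members}
        if len(members) > 1 and len(distinct_things) == 1:
            # Every row agrees: emit one row with the 6-character prefix.
            result.append((part, prefix) + members[0][2:])
        else:
            # Mixed values or a lone prefix: keep the original rows untouched.
            result.extend(members)
    return result

collapsed = collapse_parts(rows)
```

With the sample data, `collapsed` holds five rows whose part numbers match the desired result above.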

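【Question comments】:

Per the question guidelines, please show what you have tried and tell us what you found (on this site or elsewhere) and why it didn't meet your needs.
I didn't save my SQL from earlier today, but I tried assigning a DENSE_RANK over LEFT([Part Num], 2), partitioning by all the other columns, and planned to use a CASE statement to produce the 6-digit part number whenever the rank was 1. In my example, however, this doesn't work for rows 4/5/6: it assigns the same rank 1 to rows 4/5 and rank 2 to row 6. My initial thinking was that every qualifying row would get rank 1 and anything else would get rank 2, 3, 4, and so on, but that turned out not to be the case.

Tags: sql sql-server tsql window-functions

【Solution 1】:

Use the COUNT() window function: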

WITH cte AS (
  SELECT *,
         COUNT(*) OVER (PARTITION BY Part, LEFT(PartNum, 6), Thing1, Thing2, Thing3, Thing4) counter1,
         COUNT(*) OVER (PARTITION BY Part, LEFT(PartNum, 6)) counter2
  FROM tablename
)
SELECT DISTINCT
  Part,
  CASE WHEN counter1 > 1 AND counter1 = counter2 THEN LEFT(PartNum, 6) ELSE PartNum END PartNum,
  Thing1, Thing2, Thing3, Thing4 
FROM cte;

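See the demo.

【Discussion】:

【Solution 2】:

You can use window functions to determine what should be combined. I think I would fold everything into a single comparison: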
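
For anyone who wants to try the two-counter idea without a SQL Server instance, the following is a rough sketch that runs an equivalent query in SQLite through Python's `sqlite3` module. It assumes an SQLite build with window function support (3.25 or later), swaps `LEFT(PartNum, 6)` for `substr(PartNum, 1, 6)` because SQLite has no `LEFT`, and adds an `ORDER BY` only to make the output order predictable. The table contents are the sample rows from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tablename "
    "(Part TEXT, PartNum TEXT, Thing1 TEXT, Thing2 TEXT, Thing3 TEXT, Thing4 TEXT)"
)
sample_rows = [
    ("Door", "10105322", "abc", "abc", None, None),
    ("Door", "10105323", "abc", "abc", None, None),
    ("Door", "10105324", "abc", "abc", None, None),
    ("Door", "84625111", "abc", "abc", "abc", None),
    ("Door", "84625118", "abc", "abc", "abc", None),
    ("Door", "84625185", "abc", "abc", None, "abc"),
    ("Door", "56897101", "abc", "abc", None, None),
]
conn.executemany("INSERT INTO tablename VALUES (?, ?, ?, ?, ?, ?)", sample_rows)

# counter1: rows sharing prefix AND all Thing values; counter2: rows sharing prefix.
# A group collapses only when both counts agree and exceed 1.
query = """
WITH cte AS (
  SELECT *,
         COUNT(*) OVER (PARTITION BY Part, substr(PartNum, 1, 6),
                        Thing1, Thing2, Thing3, Thing4) AS counter1,
         COUNT(*) OVER (PARTITION BY Part, substr(PartNum, 1, 6)) AS counter2
  FROM tablename
)
SELECT DISTINCT
  Part,
  CASE WHEN counter1 > 1 AND counter1 = counter2
       THEN substr(PartNum, 1, 6) ELSE PartNum END AS PartNum,
  Thing1, Thing2, Thing3, Thing4
FROM cte
ORDER BY PartNum
"""
result = conn.execute(query).fetchall()
conn.close()
```

For the 846251 prefix, counter2 is 3 while counter1 is 2 or 1, so none of those rows are shortened; for 101053 both counters are 3, so the three rows collapse into one after DISTINCT.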

select (case when min_thingee = max_thingee and cnt > 1
             then left(partnum, 6) else partnum
        end) as partnum,
       min(thing1) as thing1, min(thing2) as thing2,
       min(thing3) as thing3, min(thing4) as thing4
from (select t.*,
             min(concat(thing1, '|', thing2, '|', thing3, '|', thing4)) over (partition by left(partnum, 6)) as min_thingee,
             max(concat(thing1, '|', thing2, '|', thing3, '|', thing4)) over (partition by left(partnum, 6)) as max_thingee,
             count(*) over (partition by left(partnum, 6)) as cnt
      from t
     ) t
group by (case when min_thingee = max_thingee and cnt > 1
               then left(partnum, 6) else partnum
          end);
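
The single comparison above relies on the fact that if the smallest and largest concatenated string in a group are equal, every string in the group is equal. A small Python sketch of that check follows; the helper names `signature` and `all_rows_match` are made up for illustration, and mapping `None` to an empty string mirrors how SQL Server's `concat` treats NULL. One caveat worth keeping in mind: values that themselves contain `|` could produce colliding signatures.

```python
def signature(things):
    """Join the Thing values with '|', treating None like an empty string."""
    return "|".join("" if t is None else t for t in things)

def all_rows_match(group):
    """True when the group has 2+ rows and min signature equals max signature."""
    sigs = [signature(things) for things in group]
    return len(sigs) > 1 and min(sigs) == max(sigs)

same_group = [("abc", "abc", None, None)] * 3
mixed_group = [
    ("abc", "abc", "abc", None),
    ("abc", "abc", "abc", None),
    ("abc", "abc", None, "abc"),
]
single_group = [("abc", "abc", None, None)]

same_ok = all_rows_match(same_group)
mixed_ok = all_rows_match(mixed_group)
single_ok = all_rows_match(single_group)
```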

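【Discussion】:

【Solution 3】:

If you really want to use dense_rank, here is one way to do it.

Basic statistics tells us that the standard deviation of a set of equal numbers is 0. This means that once every row has a rank, we can enforce the condition per left(partnum,6) group so that we only collapse those groups of rows in which there is only one unique rank and at least two rows (stdev over a single value yields null, so the = 0 test fails for it). Note the order by in dense_rank and the partition by in stdev to see how the ranks are computed and compared.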

with cte as

(select *, dense_rank() over (order by part, left(partnum,6), thing1, thing2, thing3, thing4) as rnk
 from my_table)

select distinct 
       part,
       case when stdev(rnk) over (partition by part, left(partnum,6)) = 0 then left(partnum,6) else partnum end as partnum,
       thing1,
       thing2,
       thing3,
       thing4
from cte;
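
The statistical fact this answer leans on can be checked quickly in Python. This is only an illustration of the reasoning, not of SQL Server's STDEV itself: the helper `group_stdev` is hypothetical, the rank lists are made-up examples, and returning `None` for a single value is an assumption chosen to mirror the NULL behaviour described above (Python's `statistics.stdev` raises an error for fewer than two values instead).

```python
import statistics

def group_stdev(ranks):
    """Sample standard deviation, or None when fewer than two ranks are given."""
    if len(ranks) < 2:
        return None
    return statistics.stdev(ranks)

all_equal = group_stdev([1, 1, 1])   # one unique rank across several rows
mixed = group_stdev([4, 4, 3])       # more than one unique rank
single = group_stdev([2])            # lone row in its prefix group

# Only the first case satisfies the "= 0" test, so only that group collapses.
should_collapse = [s == 0 for s in (all_equal, mixed, single)]
```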

【Discussion】: