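【Question title】: SQL Challenge "Names"
【Posted】: 2022-11-10 01:56:59
【Question description】:

Setup:

Below is a piece of code that generates a table of sample names in a set of unusual formats. The task is to convert them to a standard format. The desired result for each name is also listed so that the request is unambiguous.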

DROP TABLE IF EXISTS #temp;
CREATE TABLE #temp (Testname VARCHAR(20) null, Desiredresult VARCHAR(20) null);
INSERT INTO #temp(Testname, Desiredresult)
VALUES('ct last/firstn bc', 'Firstn Last');
INSERT INTO #temp(Testname, Desiredresult)
VALUES('ct lastn/first', 'First Lastn');
INSERT INTO #temp(Testname, Desiredresult)
VALUES('last/firstname bs', 'Firstname Last');
INSERT INTO #temp(Testname, Desiredresult)
VALUES('lastname/first', 'First Lastname');
INSERT INTO #temp(Testname, Desiredresult)
VALUES('First Last', 'First Last');
INSERT INTO #temp(Testname, Desiredresult)
VALUES('Firstname A Lastname', 'Firstname Lastname');

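I was able to produce code that works for this, but I have no doubt it is not the most efficient approach. I would love to see better ways to accomplish this task. Below is the code I wrote for it.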

DROP TABLE IF EXISTS #test
SELECT *
,CASE
    WHEN LEN(Testname)-LEN(REPLACE(Testname, ' ', '')) = 2 AND CHARINDEX('/',T.Testname) <> 0 THEN SUBSTRING(T.Testname,CHARINDEX('/',T.Testname)+1,LEN(T.Testname)-CHARINDEX('/',T.Testname)+1-CHARINDEX(' ',REVERSE(T.Testname))-1) 
    WHEN LEN(Testname)-LEN(REPLACE(Testname, ' ', '')) = 2 AND CHARINDEX('/',T.Testname) =  0 THEN LEFT(T.Testname,CHARINDEX(' ',T.Testname)-1) 
    WHEN LEN(Testname)-LEN(REPLACE(Testname, ' ', '')) = 1 AND CHARINDEX('/',T.Testname) <> 0 AND CHARINDEX(' ',T.Testname) < CHARINDEX('/',T.TestName) THEN SUBSTRING(T.Testname,CHARINDEX('/',T.Testname)+1,LEN(T.Testname)) 
    WHEN LEN(Testname)-LEN(REPLACE(Testname, ' ', '')) = 1 AND CHARINDEX('/',T.Testname) <> 0 AND CHARINDEX(' ',T.Testname) > CHARINDEX('/',T.TestName)THEN SUBSTRING(T.Testname,CHARINDEX('/',T.Testname)+1,CHARINDEX(' ',T.Testname)-CHARINDEX('/',T.Testname)-1) 
    WHEN LEN(Testname)-LEN(REPLACE(Testname, ' ', '')) = 1 AND CHARINDEX('/',T.Testname) =  0 THEN LEFT(T.Testname,CHARINDEX(' ',T.Testname)-1)
    WHEN LEN(Testname)-LEN(REPLACE(Testname, ' ', '')) = 0 THEN SUBSTRING(T.Testname,CHARINDEX('/',T.Testname)+1,LEN(T.Testname)) 
END AS FirstName
,CASE
    WHEN LEN(Testname)-LEN(REPLACE(Testname, ' ', '')) = 2 AND CHARINDEX('/',T.Testname) <> 0 THEN SUBSTRING(T.Testname,CHARINDEX(' ',T.Testname)+1,CHARINDEX('/',T.Testname)-CHARINDEX(' ',T.Testname)-1)
    WHEN LEN(Testname)-LEN(REPLACE(Testname, ' ', '')) = 2 AND CHARINDEX('/',T.Testname) =  0 THEN RIGHT(T.Testname,CHARINDEX(' ',REVERSE(T.Testname))-1)
    WHEN LEN(Testname)-LEN(REPLACE(Testname, ' ', '')) = 1 AND CHARINDEX('/',T.Testname) <> 0 AND CHARINDEX(' ',T.Testname) < CHARINDEX('/',T.TestName) THEN SUBSTRING(T.Testname,CHARINDEX(' ',T.Testname)+1,CHARINDEX('/',T.Testname)-CHARINDEX(' ',T.Testname)-1)
    WHEN LEN(Testname)-LEN(REPLACE(Testname, ' ', '')) = 1 AND CHARINDEX('/',T.Testname) <> 0 AND CHARINDEX(' ',T.Testname) > CHARINDEX('/',T.TestName)THEN SUBSTRING(T.Testname,1,CHARINDEX('/',T.Testname)-1)
    WHEN LEN(Testname)-LEN(REPLACE(Testname, ' ', '')) = 1 AND CHARINDEX('/',T.Testname) =  0 THEN RIGHT(T.Testname,CHARINDEX(' ',REVERSE(T.Testname))-1)
    WHEN LEN(Testname)-LEN(REPLACE(Testname, ' ', '')) = 0 THEN SUBSTRING(T.Testname,1,CHARINDEX('/',T.Testname)-1)
END AS LastName
,CASE
    WHEN LEN(Testname)-LEN(REPLACE(Testname, ' ', '')) = 2 AND CHARINDEX('/',T.Testname) <> 0 THEN SUBSTRING(T.Testname,CHARINDEX('/',T.Testname)+1,LEN(T.Testname)-CHARINDEX('/',T.Testname)+1-CHARINDEX(' ',REVERSE(T.Testname))) + SUBSTRING(T.Testname,CHARINDEX(' ',T.Testname)+1,CHARINDEX('/',T.Testname)-CHARINDEX(' ',T.Testname)-1)
    WHEN LEN(Testname)-LEN(REPLACE(Testname, ' ', '')) = 2 AND CHARINDEX('/',T.Testname) =  0 THEN LEFT(T.Testname,CHARINDEX(' ',T.Testname)-1) + RIGHT(T.Testname,CHARINDEX(' ',REVERSE(T.Testname)))
    WHEN LEN(Testname)-LEN(REPLACE(Testname, ' ', '')) = 1 AND CHARINDEX('/',T.Testname) <> 0 AND CHARINDEX(' ',T.Testname) < CHARINDEX('/',T.TestName) THEN SUBSTRING(T.Testname,CHARINDEX('/',T.Testname)+1,LEN(T.Testname)) + ' ' + SUBSTRING(T.Testname,CHARINDEX(' ',T.Testname)+1,CHARINDEX('/',T.Testname)-CHARINDEX(' ',T.Testname)-1)
    WHEN LEN(Testname)-LEN(REPLACE(Testname, ' ', '')) = 1 AND CHARINDEX('/',T.Testname) <> 0 AND CHARINDEX(' ',T.Testname) > CHARINDEX('/',T.TestName)THEN SUBSTRING(T.Testname,CHARINDEX('/',T.Testname)+1,CHARINDEX(' ',T.Testname)-CHARINDEX('/',T.Testname)-1) + ' ' + SUBSTRING(T.Testname,1,CHARINDEX('/',T.Testname)-1)
    WHEN LEN(Testname)-LEN(REPLACE(Testname, ' ', '')) = 1 AND CHARINDEX('/',T.Testname) =  0 THEN T.Testname
    WHEN LEN(Testname)-LEN(REPLACE(Testname, ' ', '')) = 0 THEN SUBSTRING(T.Testname,CHARINDEX('/',T.Testname)+1,LEN(T.Testname)) + ' ' + SUBSTRING(T.Testname,1,CHARINDEX('/',T.Testname)-1)
END AS FullName
INTO #test
FROM #temp AS T

SELECT 
     T.Testname
    ,T.Desiredresult
    ,UPPER(LEFT(T.FirstName,1))+LOWER(RIGHT(T.FirstName,LEN(T.FirstName)-1))+' '+UPPER(LEFT(T.LastName,1))+LOWER(RIGHT(T.LastName,LEN(T.LastName)-1)) AS ProperName

FROM #test AS T

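【Question comments】:

  • Serious question: how would you handle records for people who have only one name? Falsehoods Programmers Believe About Names
  • @AlwaysLearning That's a good question. My code doesn't handle it, but it isn't an issue in the dataset provided in the question or in the dataset I'm actually working with. It would be interesting to write a case for it to future-proof the code.
  • There is no silver bullet here. Any algorithm will have to be monitored and adjusted over time. As the population grows, you will find more and more surprises.
  • @JohnCappelletti I know there is no silver bullet for every situation that could occur. That's not what I asked for. The request is for these 6 examples. There is no silver bullet for winning at chess either, yet chess challenges are everywhere.

Tags: sql sql-server tsql sql-server-2016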


【Solution 1】:

Here is an alternative that may be a bit cleaner for this limited challenge.

Example

Select A.* 
      ,DispName = case when charindex('/',TestName)>0 
                       then ltrim(concat(Pos4,' '+Pos3,' '+Pos2,' '+Pos1))
                       else ltrim(concat(Pos1,' '+Pos2,' '+Pos3,' '+Pos4))
                  end
 From  #Temp A
 Cross Apply (
                Select Pos1 = JSON_VALUE(S,'$[0]')+case when len(JSON_VALUE(S,'$[0]'))<=2 then null else '' end
                      ,Pos2 = JSON_VALUE(S,'$[1]')+case when len(JSON_VALUE(S,'$[1]'))<=2 then null else '' end
                      ,Pos3 = JSON_VALUE(S,'$[2]')+case when len(JSON_VALUE(S,'$[2]'))<=2 then null else '' end
                      ,Pos4 = JSON_VALUE(S,'$[3]')+case when len(JSON_VALUE(S,'$[3]'))<=2 then null else '' end
                 From (values ( '["'+replace(string_escape(replace(TestName,'/',' '),'json'),' ','","')+'"]' ) ) B1(S)  
             ) B
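
To see why this works, it can help to run the split on a single literal. The sketch below is only an illustration, not part of the answer: the variable names @name and @S are made up here. It builds the same JSON array string that the query builds in B1(S), reads tokens by index with JSON_VALUE, and shows how adding NULL to a short token removes both the token and its separator from the CONCAT output. It assumes SQL Server 2016 or later and the default CONCAT_NULL_YIELDS_NULL ON setting.

```sql
DECLARE @name VARCHAR(20) = 'ct last/firstn bc';

-- Turn '/' into a space, escape for JSON, then turn each space into '","'
-- so that the whole string becomes a JSON array of tokens.
DECLARE @S NVARCHAR(200) =
    '["' + REPLACE(STRING_ESCAPE(REPLACE(@name, '/', ' '), 'json'), ' ', '","') + '"]';

SELECT S    = @S                       -- ["ct","last","firstn","bc"]
      ,Tok0 = JSON_VALUE(@S, '$[0]')   -- ct
      ,Tok2 = JSON_VALUE(@S, '$[2]')   -- firstn
      ,Tok4 = JSON_VALUE(@S, '$[4]');  -- NULL: there is no fifth token

-- A token of length <= 2 has NULL added to it, which makes the whole token NULL.
-- ' ' + NULL is also NULL, and CONCAT treats NULL as an empty string,
-- so the short token and its leading space both disappear.
SELECT Kept    = CONCAT('firstn', ' ' + ('last' + CASE WHEN LEN('last') <= 2 THEN NULL ELSE '' END))  -- firstn last
      ,Dropped = CONCAT('firstn', ' ' + ('ct'   + CASE WHEN LEN('ct')   <= 2 THEN NULL ELSE '' END)); -- firstn
```

Note that the length test is a heuristic that fits these six samples: a genuine two-letter name would be dropped as well.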

Extended to proper case

Select A.* 
      ,DispName = case when charindex('/',TestName)>0 
                       then ltrim(concat(Pos4,' '+Pos3,' '+Pos2,' '+Pos1))
                       else ltrim(concat(Pos1,' '+Pos2,' '+Pos3,' '+Pos4))
                  end
 From  #Temp A
 Cross Apply (
                Select Pos1 = upper(left(Pos1,1))+lower(stuff(Pos1,1,1,''))+case when len(Pos1)<=2 then null else '' end
                      ,Pos2 = upper(left(Pos2,1))+lower(stuff(Pos2,1,1,''))+case when len(Pos2)<=2 then null else '' end
                      ,Pos3 = upper(left(Pos3,1))+lower(stuff(Pos3,1,1,''))+case when len(Pos3)<=2 then null else '' end
                      ,Pos4 = upper(left(Pos4,1))+lower(stuff(Pos4,1,1,''))+case when len(Pos4)<=2 then null else '' end
                  From (
                        Select Pos1 = JSON_VALUE(S,'$[0]')
                              ,Pos2 = JSON_VALUE(S,'$[1]')
                              ,Pos3 = JSON_VALUE(S,'$[2]')
                              ,Pos4 = JSON_VALUE(S,'$[3]')
                         From (values ( '["'+replace(string_escape(replace(TestName,'/',' '),'json'),' ','","')+'"]' ) ) B1(S)  
                        ) B0
             ) B

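【Comments】:

  • I really like this answer; it's clean. It also showed me some new functions I need to learn about. JSON_VALUE and STRING_ESCAPE are new to me.
【Solution 2】:

This may not be an answer in itself, but it shows some techniques for refactoring the query to reduce repetition and improve readability.

Whenever you find yourself writing repeated expressions, you may find it useful to factor them out into a CROSS APPLY. This lets you define and compute the expressions once and then use them multiple times in the final result or in additional CROSS APPLYs.

For your case, you can change the FROM clause of the main query to: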

SELECT
    ...
FROM #temp AS T
CROSS APPLY (
  SELECT
    Len = LEN(T.Testname),
    SpaceCount = LEN(T.Testname) - LEN(REPLACE(Testname, ' ', '')),
    SlashIndex = CHARINDEX('/', T.Testname),
    SpaceIndex = CHARINDEX(' ', T.Testname),
    ReverseSpaceIndex = CHARINDEX(' ', REVERSE(T.Testname))
) A

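Then replace the earlier occurrences of those expressions with the equivalent A.Len, A.SpaceCount, and so on. Your #Temp table can also be eliminated this way.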
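
As a small, self-contained sketch of this pattern: the query below reads two of the sample names from a VALUES list instead of #temp (an assumption made here so that the snippet runs on its own) and uses the A columns in both the select list and a CASE expression. The Shape column and its labels are hypothetical, used only for illustration.

```sql
SELECT T.Testname
      ,A.SpaceCount
      ,A.SlashIndex
      -- The computed columns can be reused freely without repeating the expressions.
      ,Shape = CASE WHEN A.SlashIndex > 0 THEN 'last/first' ELSE 'first last' END
FROM (VALUES ('ct last/firstn bc'), ('First Last')) AS T(Testname)
CROSS APPLY (
    SELECT SpaceCount = LEN(T.Testname) - LEN(REPLACE(T.Testname, ' ', ''))
          ,SlashIndex = CHARINDEX('/', T.Testname)
) AS A;
```

One caveat on the space count: LEN ignores trailing spaces, so this idiom undercounts if a name ends with a space.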

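For an updated version of the code, see this db<>fiddle.

You can even go a step further and move single-use but complex calculations out of the select list into additional OUTER APPLYs, if you feel that separating them makes the code more readable. Your #test table can also be eliminated with a similar technique.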

SELECT T.Testname, T.Desiredresult, C.ProperName
INTO #test
FROM #temp AS T
CROSS APPLY (
  ...
) A
CROSS APPLY (
    SELECT -- Long complex expressions follow
        ... AS FirstName
        ,... AS LastName
        ,... AS FullName
) B
CROSS APPLY (
    SELECT ... AS ProperName
) C

See this db<>fiddle, which produces the same results and likely has exactly the same execution plan.
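
Since the elided parts of the skeleton are not reproduced here, the following is a deliberately reduced sketch of the same A, B, C chaining, not the code behind the fiddle. It assumes only two of the six shapes ('last/first' with no prefix or suffix, and 'First Last') and reads them from a VALUES list so that it runs on its own. NULLIF guards the LEFT calls so that a missing slash or space yields NULL rather than a negative length.

```sql
SELECT T.Testname, C.ProperName
FROM (VALUES ('lastname/first'), ('First Last')) AS T(Testname)
CROSS APPLY (
    -- Step A: measure the string once.
    SELECT SlashIndex = CHARINDEX('/', T.Testname)
          ,SpaceIndex = CHARINDEX(' ', T.Testname)
) AS A
CROSS APPLY (
    -- Step B: split into first and last name using the measurements from A.
    SELECT FirstName = CASE WHEN A.SlashIndex > 0
                            THEN SUBSTRING(T.Testname, A.SlashIndex + 1, LEN(T.Testname))
                            ELSE LEFT(T.Testname, NULLIF(A.SpaceIndex, 0) - 1)
                       END
          ,LastName  = CASE WHEN A.SlashIndex > 0
                            THEN LEFT(T.Testname, NULLIF(A.SlashIndex, 0) - 1)
                            ELSE SUBSTRING(T.Testname, A.SpaceIndex + 1, LEN(T.Testname))
                       END
) AS B
CROSS APPLY (
    -- Step C: capitalize the first letter of each part and join them.
    SELECT ProperName = UPPER(LEFT(B.FirstName, 1)) + LOWER(STUFF(B.FirstName, 1, 1, ''))
                      + ' '
                      + UPPER(LEFT(B.LastName, 1)) + LOWER(STUFF(B.LastName, 1, 1, ''))
) AS C;
```

For these two inputs the query should return 'First Lastname' and 'First Last'. Each APPLY can refer to columns from the ones before it, which is what allows the long expressions to be written once.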

Common table expressions (CTEs) can be used to similar effect.
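
A minimal sketch of the CTE form, under the same assumptions as before (sample names inlined in a VALUES list; Measured and HasSlash are hypothetical names chosen here):

```sql
-- When WITH is not the first statement in a batch,
-- the preceding statement must end with a semicolon.
WITH Measured AS (
    SELECT T.Testname
          ,SpaceCount = LEN(T.Testname) - LEN(REPLACE(T.Testname, ' ', ''))
          ,SlashIndex = CHARINDEX('/', T.Testname)
    FROM (VALUES ('ct lastn/first'), ('Firstname A Lastname')) AS T(Testname)
)
SELECT M.Testname
      ,M.SpaceCount
      ,HasSlash = CASE WHEN M.SlashIndex > 0 THEN 1 ELSE 0 END
FROM Measured AS M;
```

The computed columns are defined once in the CTE and then referenced by name in the outer query, much like the columns of the APPLY alias.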

Side note: In SQL Server, the syntax SELECT Alias = expression is equivalent to SELECT expression AS Alias. The latter is standard SQL, but some people find the former more readable where it is allowed.
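
For instance, the two statements below return the same single column named NameLength (the column name is made up for this illustration):

```sql
-- T-SQL form: alias on the left of "=".
SELECT NameLength = LEN('First Last');   -- 10

-- Standard SQL form: alias after AS.
SELECT LEN('First Last') AS NameLength;  -- 10
```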

【Comments】: