【Question title】: Separate text in a column by a delimiter (|), then stack all separated values into a single column without duplicates
【Posted】: 2015-05-26 05:34:15
【Question description】:

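I have a data report that is fed into SQL Server along with job cards.
Based on this SQL table, I am creating a report that looks at an Excel report and checks which job cards are missing.
So far I have worked out a manual way to fix the data in the SQL table: use Text to Columns to unbundle the job cards, then stack the columns into one giant column. It would be nice if there were a way to automate this in SQL Server.
Example (each line under Column 1 is a row):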

Column 1
A437|Bb7772|d763ch
D444r7|Z71|
A37|Bc7772|766ch

It needs to look like this:

Column 1
A437
Bb7772
d763ch
D444r7
Z71
A37
Bc7772
766ch

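After creating the new column, I will also remove any duplicates.

Sorry for the trouble, but honestly I don't even know where to start with splitting a column in SQL.
I think I can use UNION ALL to stack the values into a new column.

Oh, and to complicate things further, the number of job cards grouped together varies (it could be just two bundled together, it could be as many as 6, or it could be a single job card).

I'm backed into a corner, otherwise I wouldn't even bother asking. And yes, my company's method of organizing job cards is terrible.

【Comments】:

  • This is probably something I would do outside of SQL, with a scripting language such as Python.
  • I agree with Michael; this is better done outside SQL, or with a CLR function.

Tags: sql sql-server delimiter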


【Solution 1】:

From my DBA post on the same topic:

Using Jeff Moden's Tally-Ho! CSV splitter (DelimitedSplit8K):

CREATE FUNCTION [dbo].[DelimitedSplit8K]
--===== Define I/O parameters
        (@pString VARCHAR(8000), @pDelimiter CHAR(1))
--WARNING!!! DO NOT USE MAX DATA-TYPES HERE!  IT WILL KILL PERFORMANCE!
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
--===== "Inline" CTE Driven "Tally Table" produces values from 1 up to 10,000...
     -- enough to cover VARCHAR(8000)
WITH E1(N) AS (
           SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
           SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
           SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
       ),                          --10E+1 or 10 rows
       E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
       E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front
                     -- for both a performance gain and prevention of accidental "overruns"
            SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() 
                                                        OVER (ORDER BY (SELECT NULL)) FROM E4
        ),
cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just
                     -- once for each delimiter)
            SELECT 1 UNION ALL
            SELECT t.N+1 FROM cteTally t WHERE SUBSTRING(@pString,t.N,1) = @pDelimiter
        ),
cteLen(N1,L1) AS(--==== Return start and length (for use in substring)
            SELECT s.N1,
                   ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,8000)
            FROM cteStart s
        )
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final
     -- element when no delimiter is found.
 SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),
        Item       = SUBSTRING(@pString, l.N1, l.L1)
   FROM cteLen l
;
go

We can code the solution as an APPLY against Jeff's function followed by a PIVOT, as shown below:

with data as (
    select Code,Location,Quantity,Store from ( values
        ('L698-W-EA',          NULL,                                      2, 'A')
       ,('L82009-EA',          'A1K2, A1N2, C4Y3, CBP2',                  2, 'A')
       ,('L80401-A-EA',        'A1S2, SHIP, R2F1, CBP5, BRP, BRP1-20',    17,'A')
       ,('CWD2132W-BOX-25PK',  'A-AISLE',                                 1, 'M')
       ,('GM22660003-EA',      'B1K2',                                    1, 'M')
    )data(Code,Location,Quantity,Store)
)
,shredded as (
    select Code,Location,Quantity,Store,t.*
    from data
    cross apply [dbo].[DelimitedSplit8K](data.Location,',') as t
)
select 
    pvt.Code,pvt.Quantity,pvt.Store
   ,cast(isnull(pvt.[1],' ') as varchar(8)) as Loc1
   ,cast(isnull(pvt.[2],' ') as varchar(8)) as Loc2
   ,cast(isnull(pvt.[3],' ') as varchar(8)) as Loc3
   ,cast(isnull(pvt.[4],' ') as varchar(8)) as Loc4
   ,cast(isnull(pvt.[5],' ') as varchar(8)) as Loc5 
   ,cast(isnull(pvt.[6],' ') as varchar(8)) as Loc6
from shredded
pivot (max(Item) for ItemNumber in ([1],[2],[3],[4],[5],[6])) pvt;
go

This produces:

Code              Quantity    Store Loc1     Loc2     Loc3     Loc4     Loc5     Loc6
----------------- ----------- ----- -------- -------- -------- -------- -------- --------
L698-W-EA         2           A                                                   
L82009-EA         2           A     A1K2      A1N2     C4Y3     CBP2              
L80401-A-EA       17          A     A1S2      SHIP     R2F1     CBP5     BRP      BRP1-20
CWD2132W-BOX-25PK 1           M     A-AISLE                                       
GM22660003-EA     1           M     B1K2                                          
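
The pivot above spreads the split items across columns. For the question as asked (one stacked column without duplicates), a minimal sketch along the same lines could apply the same function row by row and keep only the distinct items. It assumes `dbo.DelimitedSplit8K` has already been created as shown earlier; the table variable `@JobCards` is a hypothetical stand-in for the real table.

```sql
-- Assumption: dbo.DelimitedSplit8K (defined earlier) exists in the current database.
-- @JobCards is a hypothetical stand-in for the real table holding the bundled job cards.
DECLARE @JobCards table (Column1 varchar(8000));

INSERT @JobCards VALUES
    ('A437|Bb7772|d763ch'),
    ('D444r7|Z71|'),
    ('A37|Bc7772|766ch');

-- Split every row on '|' and stack the pieces into one column.
SELECT DISTINCT s.Item
FROM @JobCards AS j
CROSS APPLY dbo.DelimitedSplit8K(j.Column1, '|') AS s
WHERE s.Item <> '';   -- drop the empty element produced by a trailing '|'
```

Because the split runs once per row, the number of job cards per row does not need to be known in advance.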

【Comments】:

    【Solution 2】:

    Try this.

    Function:

    CREATE FUNCTION [dbo].[fn_Split](@text varchar(8000), @delimiter varchar(20))
    RETURNS @Strings TABLE
    (
      position int IDENTITY PRIMARY KEY,
      value varchar(8000)
    )
    AS
    BEGIN

      DECLARE @index int
      SET @index = -1

      WHILE (LEN(@text) > 0)
        BEGIN
          -- Position of the next delimiter (0 when none is left)
          SET @index = CHARINDEX(@delimiter, @text)
          IF (@index = 0)
            BEGIN
              -- No delimiter left: the remainder is the last value
              INSERT INTO @Strings VALUES (@text)
              BREAK
            END
          IF (@index > 1)
            -- Store the value that precedes the delimiter
            INSERT INTO @Strings VALUES (LEFT(@text, @index - 1))
          -- Drop the value (if any) together with the whole delimiter. Using the
          -- delimiter's length keeps a multi-character delimiter out of the next value.
          SET @text = SUBSTRING(@text, @index + DATALENGTH(@delimiter), 8000)
        END
      RETURN
    END
    

    Query:

    select value from fn_split( (select stuff(( select '|'+Column1 from table1 for xml path('')),1,1,'')) ,'|')
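
Note that the query above first concatenates every row into one string, and `fn_Split` accepts only `varchar(8000)`, so a large table would be truncated at 8000 characters. A possible alternative, sketched here under the assumption that `fn_Split` and `table1` (with its `Column1` column) exist as in the query above, is to split each row separately:

```sql
-- Assumptions: dbo.fn_Split is the function defined above, and table1.Column1
-- holds the '|'-delimited job cards, as in the original query.
SELECT DISTINCT s.value
FROM table1 AS t
CROSS APPLY dbo.fn_Split(t.Column1, '|') AS s;
```

Each call then only has to handle a single row's text, and `DISTINCT` removes the duplicates across rows.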
    

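    【Comments】:

      【Solution 3】:

      SQL Server has plenty of string-splitting functions.
      Most of them perform better when you have a small list of short strings.
      You can read this article for performance tests comparing some of the leading solutions.

      For this example I will use Jeff Moden's splitter function from that article, but you should choose whichever function best fits your needs.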

      --  Create the sample data
      CREATE TABLE MyTable (Column1 varchar(max))
      INSERT INTO MyTable VALUES 
      ('A437|Bb7772|d763ch'),
      ('D444r7|Z71|'),
      ('A37|Bc7772|766ch')
      GO

      -- Create the split function (CREATE FUNCTION must be the first statement in its batch)
      CREATE FUNCTION dbo.SplitStrings
      (
         @List NVARCHAR(MAX),
         @Delimiter NVARCHAR(255)
      )
      RETURNS TABLE
      WITH SCHEMABINDING AS
      RETURN
        WITH E1(N)        AS ( SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 
                               UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 
                               UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1),
             E2(N)        AS (SELECT 1 FROM E1 a, E1 b),
             E4(N)        AS (SELECT 1 FROM E2 a, E2 b),
             E42(N)       AS (SELECT 1 FROM E4 a, E2 b),
             cteTally(N)  AS (SELECT 0 UNION ALL SELECT TOP (DATALENGTH(ISNULL(@List,1))) 
                               ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E42),
             cteStart(N1) AS (SELECT t.N+1 FROM cteTally t
                               WHERE (SUBSTRING(@List,t.N,1) = @Delimiter OR t.N = 0))
        SELECT Item = SUBSTRING(@List, s.N1, ISNULL(NULLIF(CHARINDEX(@Delimiter,@List,s.N1),0)-s.N1,8000))
          FROM cteStart s;
      

      Now, for the actual solution:

      DECLARE @AllValues varchar(max)
      
      -- Concatenate all the values in Column1 into a single string.
      -- REPLACE collapses the double delimiter that appears when a row's value begins or ends with the delimiter.
      SELECT @AllValues = REPLACE(STUFF((
         SELECT '|'+ Column1
         FROM MyTable 
         FOR XML PATH('')
       ), 1, 1, ''), '||', '|')
      
      -- These are the distinct values:
      SELECT DISTINCT Item
      FROM dbo.SplitStrings(@AllValues, '|')
      

      Now, assuming this table has only one column, you can do this:

      -- get the values in the column
      SELECT @AllValues = REPLACE(STUFF((
         SELECT '|'+ Column1
         FROM MyTable 
         FOR XML PATH('')
       ), 1, 1, ''), '||', '|')
      
      -- delete all rows from the table
      TRUNCATE TABLE MyTable 
      
      -- insert new values
      INSERT INTO MyTable
      SELECT DISTINCT Item
      FROM dbo.SplitStrings(@AllValues, '|')
      

      Read here to find out why I chose to truncate the table rather than delete from it.
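
As a side note for newer installations: SQL Server 2016 and later (with the database at compatibility level 130 or higher) ship a built-in `STRING_SPLIT` function, which postdates this question. A minimal sketch of the same idea using it, with a table variable `@MyTable` as a hypothetical stand-in for `MyTable`, might look like this:

```sql
-- Assumption: SQL Server 2016+ with database compatibility level 130 or higher.
-- @MyTable is a hypothetical stand-in for the MyTable sample table.
DECLARE @MyTable table (Column1 varchar(100));

INSERT @MyTable VALUES
    ('A437|Bb7772|d763ch'),
    ('D444r7|Z71|'),
    ('A37|Bc7772|766ch');

-- STRING_SPLIT returns one row per element in a column named "value".
SELECT DISTINCT s.value AS Item
FROM @MyTable AS m
CROSS APPLY STRING_SPLIT(m.Column1, '|') AS s
WHERE s.value <> '';   -- skip the empty element after a trailing '|'
```

This avoids both the user-defined splitter and the concatenation step, at the cost of requiring a newer server version.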

      【Comments】:

        【Solution 4】:
        DECLARE @t table(id int identity(1,1), name varchar(100))
        INSERT @t VALUES
        ('A437|Bb7772|d763ch'),
        ('D444r7|Z71'),
        ('A37|Bc7772|766ch')
        
        ;WITH Value AS
        (
             SELECT row_number() over(order by id) rn,t.c.value('.', 'VARCHAR(2000)') name
             FROM (
                 SELECT id, x = CAST('<t>' + 
                       REPLACE(name, '|', '</t><t>') + '</t>' AS XML)
                 FROM @t
             ) a
             CROSS APPLY x.nodes('/t') t(c)
        )
        SELECT DISTINCT name
        FROM Value 
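
The query above turns each row into a small XML fragment and reads the `<t>` nodes back as rows. Its sample data omits the trailing `|` from the question's second row; a variant that keeps the trailing delimiter and filters out the resulting empty node could be sketched as follows. It still assumes the values contain no characters that are special in XML, such as `&` or `<`.

```sql
-- Same technique as above, with the question's trailing '|' kept in the sample data.
-- Assumption: values contain no XML-special characters such as & or <.
DECLARE @t table (id int identity(1,1), name varchar(100));

INSERT @t VALUES
    ('A437|Bb7772|d763ch'),
    ('D444r7|Z71|'),
    ('A37|Bc7772|766ch');

SELECT DISTINCT t.c.value('.', 'VARCHAR(2000)') AS name
FROM (
    -- 'a|b|' becomes <t>a</t><t>b</t><t></t>
    SELECT x = CAST('<t>' + REPLACE(name, '|', '</t><t>') + '</t>' AS XML)
    FROM @t
) AS a
CROSS APPLY x.nodes('/t') AS t(c)
WHERE t.c.value('.', 'VARCHAR(2000)') <> '';   -- ignore the empty trailing node
```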
        

        【Comments】:

          【Solution 5】:

          If your Column1 always looks like '%|%|%', use this query:

          SELECT part 
          FROM (
              SELECT LEFT(column1, CHARINDEX('|', column1, 0) - 1) part
              FROM t
              UNION 
              SELECT SUBSTRING(column1, CHARINDEX('|', column1, 0) + 1, CHARINDEX('|', column1, CHARINDEX('|', column1, 0) + 1) - CHARINDEX('|', column1, 0) - 1)
              FROM t
              UNION 
              SELECT RIGHT(column1, CHARINDEX('|', REVERSE(column1), 0) - 1)
              FROM t) parts
          WHERE 
              part <> ''
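
The query above only works for exactly three values per row. One way to handle a variable number of values with the same `LEFT`/`CHARINDEX` building blocks is a recursive CTE that peels off one value per step. The following is a hedged sketch, not a tuned solution: the table variable `@t` stands in for the table `t`, and the default recursion limit of 100 is assumed to be enough, since the question mentions at most 6 job cards per row.

```sql
-- @t is a hypothetical stand-in for the table t used above.
DECLARE @t table (column1 varchar(100));

INSERT @t VALUES
    ('A437|Bb7772|d763ch'),
    ('D444r7|Z71|'),
    ('A37|Bc7772|766ch'),
    ('X1');                      -- a row holding a single job card

WITH parts AS (
    -- Anchor: take the first value of each row. A '|' is appended so that
    -- the last value is always followed by a delimiter.
    SELECT CAST(LEFT(column1 + '|', CHARINDEX('|', column1 + '|') - 1) AS varchar(8000)) AS part,
           CAST(STUFF(column1 + '|', 1, CHARINDEX('|', column1 + '|'), '') AS varchar(8000)) AS rest
    FROM @t
    UNION ALL
    -- Recursive step: take the next value from whatever is left.
    SELECT CAST(LEFT(rest, CHARINDEX('|', rest) - 1) AS varchar(8000)),
           CAST(STUFF(rest, 1, CHARINDEX('|', rest), '') AS varchar(8000))
    FROM parts
    WHERE rest <> ''
)
SELECT DISTINCT part
FROM parts
WHERE part <> '';
```

The explicit casts keep the anchor and recursive members on the same data type, which a recursive CTE requires.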
          

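          【Comments】:

          • Note that the OP's example has a varying number of values between the delimiters, and the OP also specifically wrote: "the number of job cards grouped together varies (it could be just two bundled together, it could be as many as 6, or it could be a single job card)."
          • Thanks, everyone, for the effort you put into this, and for all the information and resources that will help me develop further. It is much appreciated. I will try the various solutions and report back if the results are successful.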