【Question title】: Merging records from multiple rows in a SQL Server table
【Posted】: 2012-04-10 16:41:22
【Question】:

I have some dirty resource-usage records in t_resourcetable that look like this:

resNo  subres  startdate                enddate
1      2       2012-01-02 22:03:00.000  2012-01-03 00:00:00.000
1      2       2012-01-03 00:00:00.000  2012-01-04 00:00:00.000
1      2       2012-01-04 00:00:00.000  2012-01-04 16:23:00.000
1      3       2012-01-06 16:23:00.000  2012-01-06 22:23:00.000
2      2       2012-01-04 05:23:00.000  2012-01-06 16:23:00.000

I need to merge those dirty rows like this:

resNo  subres  startdate                enddate
1      2       2012-01-02 22:03:00.000  2012-01-04 16:23:00.000
1      3       2012-01-06 16:23:00.000  2012-01-06 22:23:00.000
2      2       2012-01-04 05:23:00.000  2012-01-06 16:23:00.000

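The result should be written back to the same table. I have more than 40k rows, so I can't use a cursor. Please help me clean this up with a more efficient SQL statement.

The solution provided does not handle a scenario like this one: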

resNo  subres  startdate                enddate
1      2       2012-01-02 22:03:00.000  2012-01-03 00:00:00.000
1      2       2012-01-03 00:00:00.000  2012-01-04 00:00:00.000
1      2       2012-01-04 00:00:00.000  2012-01-04 16:23:00.000
1      2       2012-01-14 10:09:00.000  2012-01-15 00:00:00.000
1      2       2012-01-15 00:00:00.000  2012-01-16 00:00:00.000
1      2       2012-01-16 00:00:00.000  2012-01-16 03:00:00.000
1      3       2012-01-06 16:23:00.000  2012-01-06 22:23:00.000
2      2       2012-01-04 05:23:00.000  2012-01-06 16:23:00.000

I need to merge those dirty rows like this:

resNo  subres  startdate                enddate
1      2       2012-01-02 22:03:00.000  2012-01-04 16:23:00.000
1      2       2012-01-14 10:09:00.000  2012-01-16 03:00:00.000
1      3       2012-01-06 16:23:00.000  2012-01-06 22:23:00.000
2      2       2012-01-04 05:23:00.000  2012-01-06 16:23:00.000

Please help me solve this dirty-data problem.
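
To reproduce the problem, the second data set above can be loaded with a script along these lines. It is only a sketch: the column types (INT, DATETIME) and the NOT NULL constraints are assumptions, since the real definition of t_resourcetable was not posted.

```sql
-- Assumed schema: the actual table definition was not posted.
CREATE TABLE t_resourcetable
(
    resNo     INT      NOT NULL,
    subres    INT      NOT NULL,
    startdate DATETIME NOT NULL,
    enddate   DATETIME NOT NULL
);

-- INSERT ... SELECT ... UNION ALL also runs on SQL Server 2005,
-- which does not support multi-row VALUES lists.
-- The 'YYYY-MM-DDThh:mm:ss' literal format is independent of
-- the session's language and DATEFORMAT settings.
INSERT INTO t_resourcetable (resNo, subres, startdate, enddate)
SELECT 1, 2, '2012-01-02T22:03:00', '2012-01-03T00:00:00' UNION ALL
SELECT 1, 2, '2012-01-03T00:00:00', '2012-01-04T00:00:00' UNION ALL
SELECT 1, 2, '2012-01-04T00:00:00', '2012-01-04T16:23:00' UNION ALL
SELECT 1, 2, '2012-01-14T10:09:00', '2012-01-15T00:00:00' UNION ALL
SELECT 1, 2, '2012-01-15T00:00:00', '2012-01-16T00:00:00' UNION ALL
SELECT 1, 2, '2012-01-16T00:00:00', '2012-01-16T03:00:00' UNION ALL
SELECT 1, 3, '2012-01-06T16:23:00', '2012-01-06T22:23:00' UNION ALL
SELECT 2, 2, '2012-01-04T05:23:00', '2012-01-06T16:23:00';
```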

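【Question comments】:

  • Which version of SQL Server are you using?
  • Can you explain what criteria you are filtering on?
  • What are the rules for merging? That would be a good place to start.
  • Do you want to merge consecutive intervals, or all intervals that share the same resNo/subres? Are those the same thing?
  • Yes. Based on resource and sub-resource, I need to club the matching records together, combining their start and end times, and update them into a single record. For example, the 3 rows for resNo 1 and subres 2 are merged into 1 row.

Tags: c# sql sql-server-2008 tsql sql-server-2005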


【Solution 1】:
MERGE INTO t_resourcetable AS TARGET
USING (
    SELECT
        resNo, subres,
        MIN(startdate) as startdate,
        MAX(enddate) as enddate
    FROM t_resourcetable
    GROUP BY resNo, subres
) AS SOURCE
ON TARGET.resNo = SOURCE.resNo
AND TARGET.subres = SOURCE.subres
AND TARGET.startdate = SOURCE.startdate
-- Set enddate on the first record in the group
WHEN MATCHED THEN
    UPDATE SET TARGET.enddate = SOURCE.enddate
-- Delete the remaining items
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;

Edit: to respect gaps between the intervals:

MERGE INTO t_resourcetable AS TARGET
USING (
    -- Find the first item in each interval group
    SELECT
        resNo, subres, startdate,
        row_number() over (partition by resNo, subres order by startdate) as rn
    FROM t_resourcetable t1
    WHERE NOT EXISTS (
        -- No other intervals that intersect this from behind
        SELECT NULL
        FROM t_resourcetable t2
        WHERE t2.resNo = t1.resNo
        AND t2.subres = t1.subres
        AND t2.startdate < t1.startdate
        AND t2.enddate >= t1.startdate
    )
) AS SOURCE_start
INNER JOIN (
    -- Find the last item in each interval group
    SELECT
        resNo, subres, enddate,
        row_number() over (partition by resNo, subres order by startdate) as rn
    FROM t_resourcetable t1
    WHERE NOT EXISTS (
        -- No other intervals that intersect this from ahead
        SELECT NULL
        FROM t_resourcetable t2
        WHERE t2.resNo = t1.resNo
        AND t2.subres = t1.subres
        AND t2.startdate <= t1.enddate
        AND t2.enddate > t1.enddate
    )
) AS SOURCE_end
    ON SOURCE_start.resNo = SOURCE_end.resNo
    AND SOURCE_start.subres = SOURCE_end.subres
    AND SOURCE_start.rn = SOURCE_end.rn -- Join by row number
ON TARGET.resNo = SOURCE_start.resNo
AND TARGET.subres = SOURCE_start.subres
AND TARGET.startdate = SOURCE_start.startdate
-- Set enddate on the first record in the group
WHEN MATCHED THEN
    UPDATE SET TARGET.enddate = SOURCE_end.enddate
-- Delete the remaining items
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;

Result:

resNo   subres   startdate          enddate
    1        2   2012-01-02 22:03   2012-01-04 16:23
    1        2   2012-01-14 10:09   2012-01-16 03:00
    1        3   2012-01-06 16:23   2012-01-06 22:23
    2        2   2012-01-04 05:23   2012-01-06 16:23
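
Because the MERGE above deletes rows, it can be worth previewing the merged intervals first. The query below is a read-only sketch that reuses the same start-row and end-row subqueries; the aliases s and e are introduced here for illustration only. With the second sample data set it should return the same four rows as the result above.

```sql
-- Read-only preview of the merged intervals; nothing is modified.
SELECT s.resNo, s.subres, s.startdate, e.enddate
FROM (
    -- Rows that start an interval group:
    -- no earlier row reaches or overlaps their startdate
    SELECT resNo, subres, startdate,
           ROW_NUMBER() OVER (PARTITION BY resNo, subres ORDER BY startdate) AS rn
    FROM t_resourcetable t1
    WHERE NOT EXISTS (
        SELECT NULL
        FROM t_resourcetable t2
        WHERE t2.resNo = t1.resNo
          AND t2.subres = t1.subres
          AND t2.startdate < t1.startdate
          AND t2.enddate >= t1.startdate
    )
) AS s
INNER JOIN (
    -- Rows that end an interval group:
    -- no later-ending row touches or overlaps their enddate
    SELECT resNo, subres, enddate,
           ROW_NUMBER() OVER (PARTITION BY resNo, subres ORDER BY startdate) AS rn
    FROM t_resourcetable t1
    WHERE NOT EXISTS (
        SELECT NULL
        FROM t_resourcetable t2
        WHERE t2.resNo = t1.resNo
          AND t2.subres = t1.subres
          AND t2.startdate <= t1.enddate
          AND t2.enddate > t1.enddate
    )
) AS e
    ON e.resNo = s.resNo
   AND e.subres = s.subres
   AND e.rn = s.rn  -- n-th start pairs with n-th end
ORDER BY s.resNo, s.subres, s.startdate;
```

If the preview looks right, the MERGE can then be run to apply the same grouping to the table.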

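Edit: if there is a risk of concurrent edits to the target table, you may want to add the HOLDLOCK hint. This prevents primary key violation errors and is slightly more resource-efficient (thanks to Joey):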

MERGE INTO t_resourcetable WITH (HOLDLOCK) AS TARGET
...

【Comments】:

【Solution 2】:

For SQL Server 2005, you can do the following:

create table #temp
(
  resNo int,
  subres int,
  enddate datetime,
  primary key (resNo, subres)
)

-- Store the values you need for enddate in a temp table
insert into #temp
select resNo, 
       subres,
       max(enddate) as enddate
from t_resourcetable
group by resNo, subres

-- Delete duplicates keeping the row with min startdate
delete T
from (
        select row_number() over(partition by resNo, subres order by startdate) as rn
        from t_resourcetable
     ) as T
where rn > 1

-- Set enddate where needed
update T set enddate = tmp.enddate
from t_resourcetable as T
  inner join #temp as tmp
    on T.resNo = tmp.resNo and
       T.subres = tmp.subres
where T.enddate <> tmp.enddate

drop table #temp

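【Comments】:

    【Solution 3】:

    I would create a temp table and fill it with the new, cleaned data. I think you have to group by the combined key of resNo and subres and select the min startdate and max enddate.

    Finally, delete all the data in the old table and refill it from the temp table.

    【Comments】:

    • Toby, thanks for the reply, but I have already tried creating a temp table and stepping forward through it with a cursor, and the query performance was too slow.
    • Oh, really? I don't think 40k rows should slow SQL Server down ;-) Have you tried the example from "dillenmeister"?
    • My client application is written in VBA and talks to the server over a LAN, and it gets slow over the LAN.
    【Solution 4】: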

    You can first store the result in a temporary table (here, a table variable), like this:

    DECLARE @tmp TABLE
    (
        resNo INT, 
        subres INT, 
        startdate DATETIME, 
        enddate DATETIME
    )
    
    INSERT   @tmp
    SELECT   resNo, subres, MIN(startdate), MAX(enddate)
    FROM     t_resourcetable
    GROUP BY resNo, subres
    

    To update the t_resourcetable table, you can then do this:

    DELETE   t_resourcetable
    
    INSERT   t_resourcetable
    SELECT   * 
    FROM     @tmp
    

    And run all of this in a transaction.
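
One way to run the steps above in a single transaction is sketched below. The TRY...CATCH wrapper, the explicit column lists, and the @msg variable are additions for illustration, not part of the original answer. Like the snippets above, it merges all rows per resNo/subres and does not preserve gaps between intervals.

```sql
BEGIN TRY
    BEGIN TRANSACTION;

    DECLARE @tmp TABLE
    (
        resNo INT,
        subres INT,
        startdate DATETIME,
        enddate DATETIME
    );

    -- Collapse each resNo/subres group into one row
    INSERT   @tmp (resNo, subres, startdate, enddate)
    SELECT   resNo, subres, MIN(startdate), MAX(enddate)
    FROM     t_resourcetable
    GROUP BY resNo, subres;

    -- Replace the table contents with the collapsed rows
    DELETE   t_resourcetable;

    INSERT   t_resourcetable (resNo, subres, startdate, enddate)
    SELECT   resNo, subres, startdate, enddate
    FROM     @tmp;

    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    -- Undo the DELETE/INSERT if anything failed
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;

    -- Re-raise the error; RAISERROR works on SQL Server 2005 and 2008
    DECLARE @msg NVARCHAR(2048);
    SET @msg = ERROR_MESSAGE();
    RAISERROR('%s', 16, 1, @msg);
END CATCH;
```

With this shape, a failure between the DELETE and the INSERT rolls the table back to its original contents instead of leaving it empty.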

    【Comments】:
