【Question Title】:Fill in gaps in year sequence in SQL Server
【Posted】:2020-12-23 09:03:49
【Question Description】:

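I have a table with the columns Age, Period and Year. The Age column always starts at 0 and has no fixed maximum (in this example I use Age 0 to 30, but the range could also be 0 to 100, etc.). Values for Period and Year only appear in particular rows, at particular ages.

However, the ages at which Period and Year values appear can change, so the solution should be dynamic. What is the best way to fill in the NULL values with the correct Period and Year?

I am using SQL Server.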

Age Period  Year
-----------------
0   NULL    NULL
1   NULL    NULL
2   NULL    NULL
3   NULL    NULL
4   NULL    NULL
5   NULL    NULL
6   NULL    NULL
7   NULL    NULL
8   NULL    NULL
9   NULL    NULL
10  NULL    NULL
11  NULL    NULL
12  NULL    NULL
13  NULL    NULL
14  NULL    NULL
15  NULL    NULL
16  NULL    NULL
17  NULL    NULL
18  NULL    NULL
19  NULL    NULL
20  NULL    NULL
21  46      2065
22  NULL    NULL
23  NULL    NULL
24  NULL    NULL
25  NULL    NULL
26  51      2070
27  NULL    NULL
28  NULL    NULL
29  NULL    NULL
30  NULL    NULL

The result should look like the following: the Period and Year numbers should count up and/or down from the last known Period and Year values.

Age Period  Year      
-----------------
0   25      2044
1   26      2045
2   27      2046
3   28      2047
4   29      2048
5   30      2049
6   31      2050
7   32      2051
8   33      2052
9   34      2053
10  35      2054
11  36      2055
12  37      2056
13  38      2057
14  39      2058
15  40      2059
16  41      2060
17  42      2061
18  43      2062
19  44      2063
20  45      2064
21  46      2065
22  47      2066
23  48      2067
24  49      2068
25  50      2069
26  51      2070
27  52      2071
28  53      2072
29  54      2073
30  55      2074

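Here is an update to my question, since I did not explain my requirements in enough detail: the solution should be able to handle different combinations of Age, Period and Year. My starting point will always be a known combination of Age, Period and Year. However, the combination in my example, Age = 21, Period = 46 and Year = 2065 (or 26|51|2070 as the second combination), is not static. At Age = 21 the values could be anything, e.g. Period = 2 and Year = 2021. Whatever the combination (Age, Period, Year), the solution should fill the gaps and complete the Period and Year sequences by counting up and down from the known values. If the Period sequence becomes negative, the solution should return NULL where possible.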

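【Question Discussion】:

  • Your question is not very clear. Do you have any logic for filling in Period and Year?
  • Thanks for your comment. Yes, the sequence is simply +/-1 for both Year and Period. I edited the question and added an example of the desired result.

Tags: sql sql-server datetime window-functions gaps-and-islands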


【Solution 1】:

It seems your age and year always move by the same increment, so

select age, isnull(period,age +25) Period,  isnull(year,age+2044) year  
from yourtable 

Or with the standard function coalesce (as suggested by Gordon Linoff)

select age, coalesce(period,age +25) Period,  coalesce(year,age+2044) year  
from yourtable 
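The constants used here are not arbitrary. They are the differences between a known row's Period/Year and its Age, which is also why hard-coding them only works for this particular data. Below is a minimal Python sketch of that arithmetic (not T-SQL). It assumes the known row Age = 21, Period = 46, Year = 2065 from the question. `coalesce` and `fill` are hypothetical helpers, and `coalesce` mimics SQL COALESCE.

```python
def coalesce(*values):
    """Return the first argument that is not None, like SQL COALESCE."""
    for v in values:
        if v is not None:
            return v
    return None

# Known row taken from the question's sample data (an assumption for this sketch).
known_age, known_period, known_year = 21, 46, 2065

period_offset = known_period - known_age  # 46 - 21 = 25
year_offset = known_year - known_age      # 2065 - 21 = 2044

def fill(age, period, year):
    """Keep existing values; otherwise derive them as age + offset."""
    return (age,
            coalesce(period, age + period_offset),
            coalesce(year, age + year_offset))

print(period_offset, year_offset)  # 25 2044
print(fill(0, None, None))         # (0, 25, 2044)
print(fill(26, 51, 2070))          # (26, 51, 2070)
```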

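【Discussion】:

  • age +25 Period?
  • Added ifnull as well
  • @SowmyadharGourishetty .. isnull (for sql-server) thanks again .. have an upvote somewhere
  • I don't think SQL Server has IFNULL; the equivalent is ISNULL
  • You should use coalesce() -- it is the standard function.
【Solution 2】:

Table creation code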

create table yourtable ( AGE int ,  Period  int, Year int )

insert into  yourtable
Select  0    AS AGE , null  As Period   ,  null  As Year       UNION all 
Select  1    AS AGE , null  As Period   ,  null  As Year       UNION all 
Select  2    AS AGE , null  As Period   ,  null  As Year       UNION all 
Select  3    AS AGE , null  As Period   ,  null  As Year       UNION all 
Select  4    AS AGE , null  As Period   ,  null  As Year       UNION all 
Select  5    AS AGE , null  As Period   ,  null  As Year       UNION all 
Select  6    AS AGE , null  As Period   ,  null  As Year       UNION all 
Select  7    AS AGE , null  As Period   ,  null  As Year       UNION all 
Select  8    AS AGE , null  As Period   ,  null  As Year       UNION all 
Select  9    AS AGE , null  As Period   ,  null  As Year       UNION all 
Select  10   AS AGE , null  As Period   ,  null  As Year       UNION all 
Select  11   AS AGE , null  As Period   ,  null  As Year       UNION all 
Select  12   AS AGE , null  As Period   ,  null  As Year       UNION all 
Select  13   AS AGE , null  As Period   ,  null  As Year       UNION all 
Select  14   AS AGE , null  As Period   ,  null  As Year       UNION all 
Select  15   AS AGE , null  As Period   ,  null  As Year       UNION all 
Select  16   AS AGE , null  As Period   ,  null  As Year       UNION all 
Select  17   AS AGE , null  As Period   ,  null  As Year       UNION all 
Select  18   AS AGE , null  As Period   ,  null  As Year       UNION all 
Select  19   AS AGE , null  As Period   ,  null  As Year       UNION all 
Select  20   AS AGE , null  As Period   ,  null  As Year       UNION all 
Select  21   AS AGE ,46 As Period   ,2065    As Year       UNION all 
Select  22   AS AGE ,  null As Period   ,  null  As Year       UNION all 
Select  23   AS AGE ,  null As Period   ,  null  As Year       UNION all 
Select  24   AS AGE ,  null As Period   ,  null  As Year       UNION all 
Select  25   AS AGE ,  null As Period   ,  null  As Year       UNION all 
Select  26   AS AGE ,   51  As Period   ,2070    As Year       UNION all 
Select  27   AS AGE ,  null As Period   ,  null  As Year       UNION all 
Select  28   AS AGE ,  null As Period   ,  null  As Year       UNION all 
Select  29   AS AGE ,  null As Period   ,  null  As Year       UNION all 
Select  30   AS AGE ,  null As Period   ,  null  As Year      

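**Steps**

  • We need to get one row where both Period and Year are non-null.
  • Use its age to derive the first value (at age 0) for both columns.
  • Now simply add each row's Age value to fill in the whole table.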
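The three steps can be mirrored in plain Python to check the arithmetic. This is a sketch, not T-SQL. The table is modeled as a list of dicts, and `fill_in_place` is a hypothetical name. Like `TOP 1` without `ORDER BY`, it takes whichever fully known row it finds first. If no such row exists, `next()` raises `StopIteration`.

```python
def fill_in_place(rows):
    """Overwrite Period/Year on every row, derived from one fully known row."""
    # Step 1: find a row where both Period and Year are known.
    known = next(r for r in rows
                 if r["Period"] is not None and r["Year"] is not None)
    # Step 2: the values at age 0 are the known values minus the known age.
    period_at_0 = known["Period"] - known["Age"]
    year_at_0 = known["Year"] - known["Age"]
    # Step 3: add each row's age to the age-0 values.
    for r in rows:
        r["Period"] = period_at_0 + r["Age"]
        r["Year"] = year_at_0 + r["Age"]

table = [{"Age": a, "Period": None, "Year": None} for a in range(31)]
table[21].update(Period=46, Year=2065)
table[26].update(Period=51, Year=2070)
fill_in_place(table)
print(table[0], table[30])
# {'Age': 0, 'Period': 25, 'Year': 2044} {'Age': 30, 'Period': 55, 'Year': 2074}
```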

Code to fix the sequence

;with tmp as 
  (select top 1 * from yourtable  where  Period is not null and  year is not null)
 update yourtable
  set Period =  (tmp.Period - tmp.age) + yourtable.age
  , year =  (tmp.year - tmp.age) + yourtable.age
  from yourtable , tmp

-- Alternative using variables: take the offsets from a known row, then add age
Declare @Year int, @Period int

select @Year = year - age, @Period = Period - age
from yourtable where  Period is not null and  year is not null

update  yourtable
set Period = @Period + age
,Year = @Year + age
 from yourtable 

【Discussion】:

    【Solution 3】:

    In the end you want three sequences with different starting values. So you only need to compute an offset and add it to age.

    with cte as
     (
       select age
         ,max(period - age) over () + age as period -- adjusted period
         ,max(yr - age)     over () + age as yr     -- adjusted yr
       from #yourtable
     )
    select age
      -- If a Period value sequence becomes negative the solutions should return NULL
      ,case when period >= 0 then period end as period
      ,yr
    from cte
    

    fiddle
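`MAX(period - age) OVER ()` works because aggregates ignore NULLs. Only the known rows contribute, and if they all share the same offset, the maximum is that offset. Below is a Python sketch of the same idea. It assumes rows are `(age, period, year)` tuples with `None` standing in for NULL, and `fill_with_window_offset` is a hypothetical name. One difference: Python's `max()` raises `ValueError` when no value is known, whereas SQL would return NULL.

```python
def fill_with_window_offset(rows):
    """rows: list of (age, period, year) tuples, None standing in for NULL."""
    # Like MAX(period - age) OVER (): rows with a NULL value are skipped.
    period_offset = max(p - a for a, p, _ in rows if p is not None)
    year_offset = max(y - a for a, _, y in rows if y is not None)
    result = []
    for a, _, _ in rows:
        period = a + period_offset
        # A negative period becomes None, as the question requires.
        result.append((a, period if period >= 0 else None, a + year_offset))
    return result

rows = [(a, None, None) for a in range(31)]
rows[21] = (21, 46, 2065)
rows[26] = (26, 51, 2070)
filled = fill_with_window_offset(rows)
print(filled[0], filled[30])  # (0, 25, 2044) (30, 55, 2074)
```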

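    【Discussion】:

      【Solution 4】:

      --I hope you can deal with any syntax errors. In any case, logic like the one below should work: we can take the known period as the origin and compute the other missing values from it. Good luck!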

      declare @knownperiod int, @knownyear int, @knownperiodage int;
      declare @period int, @year int, @age int;
      
      -- take one row where both values are known as the origin
      select top 1 @knownperiod = period, @knownyear = [year], @knownperiodage = age
      from table1
      where period is not null and [year] is not null
      order by age;
      
      -- walk down from the known age to age 0
      select @period = @knownperiod, @year = @knownyear, @age = @knownperiodage;
      while (@age > 0)
      begin
      set @period = @period - 1;
      set @year = @year - 1;
      set @age = @age - 1;
      -- a negative period becomes NULL
      update table1 set period = case when @period >= 0 then @period end, [year] = @year where age = @age;
      end
      
      -- now walk up from the known age to the maximum age
      select @period = @knownperiod, @year = @knownyear, @age = @knownperiodage;
      while (@age < (select max(age) from table1))
      begin
      set @period = @period + 1;
      set @year = @year + 1;
      set @age = @age + 1;
      update table1 set period = @period, [year] = @year where age = @age;
      end
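The loop idea can be sketched in Python: start at a known row, step one age at a time down to 0, then step up to the maximum age. The following assumptions apply to this sketch only:

- The table is a dict mapping age to a `[period, year]` list, and the ages are contiguous.
- The origin is the lowest fully known age.
- The known Year is carried along the walk.
- `walk_fill` is a hypothetical name.

Like the SQL loop, the upward walk overwrites any later known rows with derived values.

```python
def walk_fill(table):
    """table maps age -> [period, year]; gaps are filled by stepping +/-1."""
    # Origin: the lowest age where both values are known.
    origin = min(age for age, (p, y) in table.items()
                 if p is not None and y is not None)
    start_period, start_year = table[origin]

    # Walk down from the origin to age 0; negative periods become None.
    period, year = start_period, start_year
    for age in range(origin - 1, -1, -1):
        period, year = period - 1, year - 1
        table[age] = [period if period >= 0 else None, year]

    # Walk up from the origin to the maximum age.
    period, year = start_period, start_year
    for age in range(origin + 1, max(table) + 1):
        period, year = period + 1, year + 1
        table[age] = [period, year]
    return table

t = {a: [None, None] for a in range(31)}
t[21] = [46, 2065]
t[26] = [51, 2070]
walk_fill(t)
print(t[0], t[30])  # [25, 2044] [55, 2074]
```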
      

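      【Discussion】:

        【Solution 5】:

        How about a process that first computes the increments (Age -> Period and Age -> Year) and then simply adds those increments to the Age values? This assumes that the differences between Age and Period, and between Age and Year, are consistent across rows (the values are just not always filled in).

        So you can use the following to first compute the increments (PeriodInc, YrInc) and then select the values with the increments added (note: Period is NULL if it would become negative).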

        ; WITH  PeriodInc AS (SELECT TOP 1 Period - Age AS PeriodInc FROM #yourtable WHERE Period IS NOT NULL),
                YrInc AS (SELECT TOP 1 Yr - Age AS YrInc FROM #yourtable WHERE Yr IS NOT NULL)
        SELECT      Age, 
                    CASE WHEN (Age + PeriodInc) >= 0 THEN (Age + PeriodInc) ELSE NULL END AS Period, 
                    Age + YrInc AS Yr
            FROM    #yourtable
                    CROSS JOIN PeriodInc
                    CROSS JOIN YrInc
        

        Here is the code on DB_Fiddle
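This answer relies on the assumption that Period - Age and Year - Age are the same on every row where they are filled in. Several other answers rely on it too. Below is a quick Python check of that assumption, as a sketch. `offsets_consistent` is a hypothetical helper, and rows are `(age, period, year)` tuples with `None` for NULL. If it returns False, `TOP 1` without `ORDER BY` picks an arbitrary known row, and the result depends on which row that was.

```python
def offsets_consistent(rows):
    """True if all known Periods share one age offset and all known Years share one."""
    period_offsets = {p - a for a, p, _ in rows if p is not None}
    year_offsets = {y - a for a, _, y in rows if y is not None}
    return len(period_offsets) <= 1 and len(year_offsets) <= 1

print(offsets_consistent([(21, 46, 2065), (26, 51, 2070)]))  # True
print(offsets_consistent([(21, 46, 2065), (25, 51, 2070)]))  # False
```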

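        【Discussion】:

          【Solution 6】:

          This solution takes 4 inputs:

          1. @list_length -- (integer) number of rows to generate (up to 12^5 = 248,832)
          2. @start_age -- (integer) starting age
          3. @start_period -- (integer) starting period
          4. @start_year -- (integer) starting year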

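          For any combination of inputs, this code produces the requested output. If Period or Year computes to a negative number, it is converted to NULL. The current limit on the list length can be raised to whatever is needed. Generating row_number over cross-joined rows (a tally) is known to be very fast for producing large sequences: above roughly 500 rows it is always faster than a recursive CTE, and at smaller row counts there is little performance difference between the two techniques.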
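The row generator here is a tally. Cross-joining a 12-row set with itself five times gives 12^5 = 248,832 combinations, and ROW_NUMBER() numbers them 1, 2, 3, and so on. Below is a Python sketch of the same idea. It uses `itertools.product` as an analogy for the cross join, which is not how SQL Server executes it. `islice` plays the role of the `n <= @list_length` filter. The names `tally` and `sequence` are hypothetical.

```python
from itertools import islice, product

DIGITS = range(1, 13)  # stands in for the 12-row VALUES list n(n)

def tally(limit):
    """Yield 1..limit by numbering rows of a 5-way cross product (at most 12**5)."""
    combos = product(DIGITS, repeat=5)
    for n, _ in enumerate(islice(combos, limit), start=1):
        yield n

def sequence(list_length, start_age, start_period, start_year):
    """Build (age, period, year) rows; negative Period/Year become None."""
    rows = []
    for n in tally(list_length):
        age = n - 1
        period = age + (start_period - start_age)
        year = age + (start_year - start_age)
        rows.append((age,
                     None if period < 0 else period,
                     None if year < 0 else year))
    return rows

print(sequence(31, 21, 46, 2065)[:2])  # [(0, 25, 2044), (1, 26, 2045)]
print(sequence(100, 10, 5, 8)[5])      # (5, 0, 3)
```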

          Here is the code and output for the example data.

          Input

          declare
            @list_length            int=31,
            @start_age              int=21,
            @start_period           int=46,
            @start_year             int=2065;
          

          Code

          with
          n(n) as (select * from (values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12)) v(n)),
          tally_cte(n) as (
               select row_number() over (order by (select null)) 
               from n n1 cross join n n2 cross join n n3 cross join n n4 cross join n n5)
          select p.Age, 
                 case when p.[Period]<0 then null else p.[Period] end [Period], 
                 case when p.[Year]<0 then null else p.[Year] end [Year]
          from tally_cte t
               cross apply
               (select (t.n-1) [Age], (t.n-1)+(@start_period-@start_age) [Period],
                       (t.n-1)+(@start_year-@start_age) [Year]) p
          where n<=@list_length;
          

          Output

          Age Period  Year
          0   25  2044
          1   26  2045
          2   27  2046
          3   28  2047
          4   29  2048
          5   30  2049
          6   31  2050
          7   32  2051
          8   33  2052
          9   34  2053
          10  35  2054
          11  36  2055
          12  37  2056
          13  38  2057
          14  39  2058
          15  40  2059
          16  41  2060
          17  42  2061
          18  43  2062
          19  44  2063
          20  45  2064
          21  46  2065
          22  47  2066
          23  48  2067
          24  49  2068
          25  50  2069
          26  51  2070
          27  52  2071
          28  53  2072
          29  54  2073
          30  55  2074
          

          Now suppose Period and Year are both smaller than the starting Age. When a computed value is negative, it is replaced with NULL.

          Input

          declare
            @list_length            int=100,
            @start_age              int=10,
            @start_period           int=5,
            @start_year             int=8;
          

          Output

          Age Period  Year
          0   NULL    NULL
          1   NULL    NULL
          2   NULL    0
          3   NULL    1
          4   NULL    2
          5   0       3
          6   1       4
          7   2       5
          8   3       6
          9   4       7
          10  5       8
          11  6       9
          12  7       10
          ...
          99  94      97
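Where the NULLs stop follows directly from the offsets. A computed value age + (start_value - start_age) is non-negative once age >= start_age - start_value. Below is a tiny Python check against the output above. `first_non_null_age` is a hypothetical helper.

```python
def first_non_null_age(start_age, start_value):
    """Smallest age >= 0 at which age + (start_value - start_age) is not negative."""
    return max(0, start_age - start_value)

print(first_non_null_age(10, 5))   # 5: Period is NULL for ages 0-4
print(first_non_null_age(10, 8))   # 2: Year is NULL for ages 0-1
print(first_non_null_age(21, 46))  # 0: no NULLs in the first example
```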
          

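          Imo this is a flexible and efficient way that meets all the requirements. Let me know if there are any issues.

          【Discussion】:

            【Solution 7】:

            This reads like a gaps-and-islands problem, where the "empty" rows are the gaps and the non-empty rows are the islands.

            You want to fill the gaps. Your question is somewhat tricky, because you don't clearly describe what should happen when a gap row has both a preceding and a following island, or what to do if the two disagree.

            Let's assume you want to take the values from the next island (if one is available), and fall back to the previous island otherwise.

            Here is an approach that uses lateral joins to retrieve the next and the previous non-null rows: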

            select t.age, 
                coalesce(t.period, n.period - n.diff, p.period - p.diff) period,
                coalesce(t.year,   n.year   - n.diff, p.year   - p.diff) year
            from mytable t
            outer apply (
                select top (1) t1.*, t1.age - t.age diff
                from mytable t1 
                where t1.age > t.age and t1.period is not null and t1.year is not null
                order by t1.age 
            ) n
            outer apply (
                select top (1) t1.*, t1.age - t.age diff
                from mytable t1 
                where t1.age < t.age and t1.period is not null and t1.year is not null
                order by t1.age desc
            ) p
            order by t.age
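The Python sketch below shows the fallback order of the two `OUTER APPLY` lookups: the next island first, then the previous one. It assumes rows are `(age, period, year)` tuples sorted by age, with `None` standing in for NULL. `shift`, `first_not_none` and `fill_from_islands` are hypothetical names. Each lookup here is a linear scan, roughly like a correlated `TOP (1)` subquery without a supporting index.

```python
def shift(island, col, age):
    """Project column `col` of an island row onto `age` (value - diff)."""
    if island is None:
        return None
    diff = island[0] - age  # island age - current age, as in the APPLY subqueries
    return island[col] - diff

def first_not_none(*values):
    """Python stand-in for COALESCE."""
    return next((v for v in values if v is not None), None)

def fill_from_islands(rows):
    """rows: (age, period, year) tuples sorted by age; None stands in for NULL."""
    known = [r for r in rows if r[1] is not None and r[2] is not None]
    out = []
    for age, period, year in rows:
        nxt = next((k for k in known if k[0] > age), None)            # next island
        prv = next((k for k in reversed(known) if k[0] < age), None)  # previous island
        out.append((age,
                    first_not_none(period, shift(nxt, 1, age), shift(prv, 1, age)),
                    first_not_none(year, shift(nxt, 2, age), shift(prv, 2, age))))
    return out

rows = [(a, None, None) for a in range(31)]
rows[21] = (21, 46, 2065)
rows[26] = (26, 51, 2070)
print(fill_from_islands(rows)[23])  # (23, 48, 2067)
```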
            

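            Actually, this can probably be done more efficiently with window functions. We can implement exactly the same logic by building groups of records with windowed counts and then doing the computation within each group: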

            select
                age,
                coalesce(
                    period,
                    max(period) over(partition by grp2) - max(age) over(partition by grp2) + age,
                    max(period) over(partition by grp1) - min(age) over(partition by grp1) + age
                ) period,
                coalesce(
                    year,
                    max(year) over(partition by grp2) - max(age) over(partition by grp2) + age,
                    max(year) over(partition by grp1) - min(age) over(partition by grp1) + age
                ) year
            from (
                select t.*, 
                    count(period) over(order by age) grp1, 
                    count(period) over(order by age desc) grp2
                from mytable t
            ) t
            order by age
            

            Demo on DB Fiddle - both queries produce:

            age | period | year
            --: | -----: | ---:
              0 |     25 | 2044
              1 |     26 | 2045
              2 |     27 | 2046
              3 |     28 | 2047
              4 |     29 | 2048
              5 |     30 | 2049
              6 |     31 | 2050
              7 |     32 | 2051
              8 |     33 | 2052
              9 |     34 | 2053
             10 |     35 | 2054
             11 |     36 | 2055
             12 |     37 | 2056
             13 |     38 | 2057
             14 |     39 | 2058
             15 |     40 | 2059
             16 |     41 | 2060
             17 |     42 | 2061
             18 |     43 | 2062
             19 |     44 | 2063
             20 |     45 | 2064
             21 |     46 | 2065
             22 |     47 | 2066
             23 |     48 | 2067
             24 |     49 | 2068
             25 |     50 | 2069
             26 |     51 | 2070
             27 |     52 | 2071
             28 |     53 | 2072
             29 |     54 | 2073
             30 |     55 | 2074
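The window version hinges on the two running counts. `COUNT(period)` skips NULLs, so `grp1` (ascending) puts each island together with the gap rows after it. `grp2` (descending) puts each island together with the gap rows before it. Below is a Python sketch of those groups and the resulting Period fill. It assumes ages sorted ascending and uses `None` for NULL. `running_counts` and `fill_by_groups` are hypothetical names. Year would be handled the same way.

```python
from itertools import accumulate

def running_counts(periods):
    """Mimic COUNT(period) OVER (ORDER BY age) and OVER (ORDER BY age DESC)."""
    known = [int(p is not None) for p in periods]   # COUNT skips NULLs
    grp1 = list(accumulate(known))                  # ascending running count
    grp2 = list(accumulate(reversed(known)))[::-1]  # descending running count
    return grp1, grp2

def fill_by_groups(ages, periods):
    """ages sorted ascending; periods may contain None. Next island wins."""
    grp1, grp2 = running_counts(periods)
    # Each non-zero group holds exactly one known row: its island.
    island1 = {g: (a, p) for a, p, g in zip(ages, periods, grp1) if p is not None}
    island2 = {g: (a, p) for a, p, g in zip(ages, periods, grp2) if p is not None}
    out = []
    for a, p, g1, g2 in zip(ages, periods, grp1, grp2):
        if p is None and g2 in island2:    # island after the gap (grp2)
            ia, ip = island2[g2]
            p = ip - ia + a
        elif p is None and g1 in island1:  # fall back to the island before (grp1)
            ia, ip = island1[g1]
            p = ip - ia + a
        out.append(p)
    return out

periods = [None, 10, None, None, 20, None, None]
print(running_counts(periods))                  # ([0, 1, 1, 1, 2, 2, 2], [2, 2, 1, 1, 1, 0, 0])
print(fill_by_groups(list(range(7)), periods))  # [9, 10, 18, 19, 20, 21, 22]
```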

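            【Discussion】:

              【Solution 8】:

              You can also use a recursive CTE (it can handle any variation of the data in the table, except when no row has both Period and Year filled in):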

              WITH cte AS ( -- get any filled period and year
                  SELECT TOP 1 period - age delta,
                               [year]-period start_year
                  FROM tablename
                  WHERE period is not null and [year] is not null
              ), seq AS ( --get min and max age values
                  SELECT MIN(age) as min_age, MAX(age) as max_age
                  FROM tablename
              ), go_recursive AS (
                  SELECT min_age age,
                         min_age+delta period ,
                         start_year+min_age+delta year,
                         max_age
                  FROM seq
                  CROSS JOIN cte --That will generate the initial first row
                  UNION ALL
                  SELECT age + 1,
                         period +1,
                         year + 1,
                         max_age
                  FROM go_recursive 
                  WHERE age < max_age --This part increments the data from first row
              )
                
              SELECT age,
                     period,
                     [year] 
              FROM go_recursive 
              OPTION (MAXRECURSION 0) 
              -- If you know there is a row limit for this kind of table,
              -- use that row count instead of 0
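A recursive CTE has two parts. The anchor member builds the first row from the minimum age and the known delta. The recursive member then adds 1 to each column for as long as `age < max_age` holds. Below is the same iteration as a Python generator. It is a sketch that assumes one fully known `(age, period, year)` row plus known minimum and maximum ages, and `recursive_sequence` is a hypothetical name. Like the CTE, it does not turn negative periods into NULL.

```python
def recursive_sequence(known, min_age, max_age):
    """Yield (age, period, year) rows the way the anchor + recursive members do."""
    k_age, k_period, k_year = known
    delta = k_period - k_age        # period - age
    start_year = k_year - k_period  # [year] - period
    # Anchor member: the first row, at min_age.
    row = (min_age, min_age + delta, start_year + min_age + delta)
    yield row
    # Recursive member: add 1 to every column while age < max_age.
    while row[0] < max_age:
        row = (row[0] + 1, row[1] + 1, row[2] + 1)
        yield row

rows = list(recursive_sequence((21, 46, 2065), 0, 30))
print(rows[0], rows[-1])  # (0, 25, 2044) (30, 55, 2074)
```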
              

              【Discussion】:
