【Question title】: Teradata partitioned query ... following rows dynamically
【Posted】: 2014-09-29 19:44:32
【Question】:

I have a table with the following columns and data. The data describes periods of customer activity:

cust_id    s_date       e_date
11111    01.03.2014   31.03.2014
11111    10.04.2014   30.04.2014
11111    01.05.2014   10.05.2014
11111    15.06.2014   31.07.2014
22222    01.04.2014   31.05.2014
22222    01.06.2014   30.06.2014
22222    01.07.2014   15.07.2014

I would like to write a query that produces this result:

cust_id    s_date       e_date
11111    01.03.2014   10.05.2014
11111    15.06.2014   31.07.2014
22222    01.04.2014   15.07.2014

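The goal is to "merge" rows into a single row whenever the customer's period of inactivity between them is shorter than 15 days. I can handle the "1 preceding row" case, but it does not work when 3 or more rows need to be merged. I can't figure out how to write this query.

My "half-finished" query, which only looks at the 1 preceding row: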

SELECT cust_id
     , start_date     as current_period_start_date
     , end_date       as current_period_end_date
     , end_date+15    as current_period_expire_date
     , coalesce(
            min(current_period_expire_date)
           over(partition by cust_id
                    order by start_date
                     rows between 1 preceding and 1 preceding)
               , cast('1900-01-01' as date)) as previous_period_expire_date
     , case 
         when current_period_start_date <= previous_period_expire_date
         then min(current_period_start_date)
             over(partition by cust_id
                      order by start_date
                       rows between 1 preceding and current row)
         else current_period_start_date
       end as new_current_period_start_date

  FROM MY_DB.my_table
     . . .

Also, is it possible to make the number of preceding rows dynamic, like this?

... over(partition by ... order by ... rows between X preceding and current row)

【Discussion】:

    Tags: sql teradata


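    【Solution 1】:

    I would use the lag() function to solve this. It can be used to identify every row that starts a new period. A cumulative sum of that flag then provides a group identifier. Here is what the code looks like: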

    select cust_id, min(s_date) as s_date, max(e_date) as e_date
    from (select t.*, sum(GroupStartFlag) over (partition by cust_id order by s_date rows unbounded preceding) as grpid
          from (select cust_id, s_date, e_date,
                       (case when s_date <= lag(e_date) over (partition by cust_id order by s_date) + 15
                             then 0
                             else 1
                        end) as GroupStartFlag
                from  MY_DB.my_table
               ) t
         ) t
    group by cust_id, grpid;
    

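    Note: Teradata supports window functions, but it sometimes has odd requirements for them. I think the query above will work as is, but I don't have a system to test it on.

    Edit:

    I'm not sure whether Teradata supports the lag() function. You can do the equivalent with a correlated subquery: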
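
Since the query above was posted untested, its logic can at least be checked outside the database. The following is a minimal Python sketch, not Teradata code: `prev_end` stands in for lag(e_date), `grpid` for the running sum of GroupStartFlag, and the function name `merge_periods` and its `max_gap_days` parameter are made up for this illustration.

```python
from datetime import date, timedelta
from itertools import groupby

# Sample rows from the question: (cust_id, s_date, e_date)
rows = [
    (11111, date(2014, 3, 1), date(2014, 3, 31)),
    (11111, date(2014, 4, 10), date(2014, 4, 30)),
    (11111, date(2014, 5, 1), date(2014, 5, 10)),
    (11111, date(2014, 6, 15), date(2014, 7, 31)),
    (22222, date(2014, 4, 1), date(2014, 5, 31)),
    (22222, date(2014, 6, 1), date(2014, 6, 30)),
    (22222, date(2014, 7, 1), date(2014, 7, 15)),
]


def merge_periods(rows, max_gap_days=15):
    """Merge a customer's periods when s_date <= previous e_date + max_gap_days."""
    merged = []
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))  # partition by cust_id, order by s_date
    for cust_id, cust_rows in groupby(ordered, key=lambda r: r[0]):
        prev_end = None  # plays the role of lag(e_date)
        grpid = 0        # running sum of the group-start flag
        groups = {}      # grpid -> [min(s_date), max(e_date)]
        for _, s_date, e_date in cust_rows:
            starts_new = (prev_end is None
                          or s_date > prev_end + timedelta(days=max_gap_days))
            grpid += 1 if starts_new else 0
            if grpid not in groups:
                groups[grpid] = [s_date, e_date]
            else:
                groups[grpid][1] = max(groups[grpid][1], e_date)
            prev_end = e_date
        merged.extend((cust_id, s, e) for s, e in groups.values())
    return merged


for row in merge_periods(rows):
    print(row)
```

On the sample data this yields the three merged rows the question asks for: two for customer 11111 and one for customer 22222.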

    select cust_id, min(s_date) as s_date, max(e_date) as e_date
    from (select t.*,
                 sum(case when s_date <= prev_edate + 15 then 0 else 1 end) over
                     (partition by cust_id order by s_date rows unbounded preceding) as grpid
          from (select cust_id, s_date, e_date,
                       (select max(e_date) 
                        from MY_DB.my_table t2
                        where t2.cust_id = t.cust_id and
                              t2.s_date < t.s_date
                       ) as prev_edate
                from  MY_DB.my_table t
               ) t
         ) t
    group by cust_id, grpid;
    

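    【Discussion】:

    • There also seems to be a small mistake in the second query (line 3) -> e_date must be s_date :). Thanks again
    • @MarkoMahl . . . Thank you. That mistake was actually in both queries.
    【Solution 2】: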

    Gordon's answer can be adapted, because the basic LAG syntax is easy to rewrite:

    LAG(col, n) OVER (ORDER BY c) 
    

    is the same as
    MIN(col) OVER (ORDER BY c ROWS BETWEEN n PRECEDING AND n PRECEDING)
    

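    The optional default value (LAG's third argument) can be emulated with COALESCE(LAG(...), default_value); only the IGNORE NULLS option is really hard to emulate.

    Applied to Gordon's query, this gives: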
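
The equivalence described above can be illustrated outside SQL. The sketch below is plain Python, not Teradata: the helper names `lag`, `min_over_frame` and `coalesce` are made up for this illustration, and each list stands for one partition that is already ordered.

```python
def lag(values, n=1):
    """LAG(col, n): the value n rows back, or None (NULL) if there is no such row."""
    return [values[i - n] if i >= n else None for i in range(len(values))]


def min_over_frame(values, n=1):
    """MIN(col) OVER (... ROWS BETWEEN n PRECEDING AND n PRECEDING)."""
    out = []
    for i in range(len(values)):
        # The frame holds exactly one row (the one n rows back), or is empty.
        frame = values[i - n:i - n + 1] if i >= n else []
        out.append(min(frame) if frame else None)
    return out


def coalesce(values, default):
    """COALESCE(col, default) applied to every row."""
    return [default if v is None else v for v in values]


# e_date values of customer 11111, as ISO strings so that they compare correctly
e_dates = ["2014-03-31", "2014-04-30", "2014-05-10", "2014-07-31"]

print(lag(e_dates))
print(min_over_frame(e_dates))
print(coalesce(min_over_frame(e_dates), "1900-01-01"))
```

Because the frame never contains more than one row, MIN simply returns that row's value, which is why the two expressions agree.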

    SELECT cust_id, MIN(s_date) AS s_date, MAX(e_date) AS e_date
    FROM (SELECT t.*, SUM(GroupStartFlag) OVER (PARTITION BY cust_id ORDER BY s_date ROWS UNBOUNDED PRECEDING) AS grpid
          FROM (SELECT cust_id, s_date, e_date,
                       (CASE WHEN s_date <= MIN(e_date) 
                                            OVER (PARTITION BY cust_id 
                                                  ORDER BY s_date
                                                  ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) + 15
                             THEN 0
                             ELSE 1
                        END) AS GroupStartFlag
                FROM  vt
               ) t
         ) t
    GROUP BY cust_id, grpid;
    

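    If you don't need any other columns (only cust_id and the dates), you can also use a Teradata-specific table function, available as of TD 13.10, to normalize the periods. To account for the 15-day gap, simply subtract and add back 15 days: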

    WITH cte (cust_id, pd)
    AS 
     ( SELECT cust_id, PERIOD(s_date-15, e_date) AS pd
       FROM vt
     )
    SELECT cust_id,
       BEGIN(pd)+15,
       END(pd),
       cnt
    FROM TABLE (TD_NORMALIZE_OVERLAP_MEET
                (NEW VARIANT_TYPE(cte.cust_id)
                    ,cte.pd)
            RETURNS (cust_id INTEGER
                    ,pd PERIOD(DATE)
                    ,cnt INTEGER) --optional: number of rows normalized in one result row
            HASH BY cust_id
            LOCAL ORDER BY cust_id, pd
            ) AS t;
    

    TD 14.10 also has a very nice syntax for normalizing periods:

    SELECT cust_id, BEGIN (pd)+15, END(pd) 
    FROM
     (
       SELECT NORMALIZE
          cust_id, PERIOD(s_date-15, e_date) AS pd
       FROM vt
     ) AS dt
    

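    By the way, a period is defined as including its beginning but excluding its end (i.e., for periods without a gap, the end of the previous period and the beginning of the next one have the same value), so you may have to change 15 to 16 to get the desired result.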
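
To see where the choice between 15 and 16 matters, here is a minimal Python sketch of normalizing half-open periods that overlap or meet. It only mimics the idea behind the queries above and is not Teradata's implementation; `normalize`, `merge_with_shift` and the two sample periods are made up for this illustration.

```python
from datetime import date, timedelta


def normalize(periods):
    """Merge half-open periods [begin, end) that overlap or meet."""
    result = []
    for begin, end in sorted(periods):
        if result and begin <= result[-1][1]:  # overlaps or meets the previous period
            result[-1][1] = max(result[-1][1], end)
        else:
            result.append([begin, end])
    return [tuple(p) for p in result]


def merge_with_shift(rows, shift_days):
    """Mimic PERIOD(s_date - shift, e_date), normalize, then BEGIN(pd) + shift."""
    shift = timedelta(days=shift_days)
    merged = normalize([(s - shift, e) for s, e in rows])
    return [(b + shift, e) for b, e in merged]


# Hypothetical boundary case: the second period starts exactly 16 days
# after the first one's e_date.
sample = [
    (date(2014, 3, 1), date(2014, 3, 31)),
    (date(2014, 4, 16), date(2014, 4, 30)),
]

print(merge_with_shift(sample, 15))  # the two periods stay separate
print(merge_with_shift(sample, 16))  # the two periods are merged into one
```

With a shift of 15 the shifted second period begins one day after the first one ends, so nothing is merged; with 16 the two periods meet and are combined. Which value is right depends on whether e_date is meant as the last active day and on how the 15 days of inactivity are counted.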
