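【Question title】:Add missing monthly rows
【Posted】:2020-05-07 18:21:48
【Question description】:

I want a query that lists the months missing between two dates, with an amount of 0 for each missing month.

My data: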

YEAR_MONTH  | AMOUNT    
202001  |  500    
202001  |  600    
201912  |  100    
201910  |  200
201910  |  100     
201909  |  400
201601  | 5000

I want the query to return:

201912  |  100    
201911  |    0    
201910  |  300
201909  |  400     
201908  |    0
201907  |    0
201906  |    0
....    |    0
201712  |    0

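I want the last 24 months, counted from the date the query is executed.

I did something similar for daily dates, but not for YEAR_MONTH values in yyyyMM format: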

select date_sub(s.date_order ,nvl(d.i,0)) as date_order, case when d.i > 0 then 0 else s.amount end as amount
from
(--find previous date
select date_order, amount, 
        lag(date_order) over(order by date_order) prev_date,
        datediff(date_order,lag(date_order) over(order by date_order)) datdiff
from
( --aggregate
 select date_order, sum(amount) amount from your_data group by date_order )s
)s
--generate rows
lateral view outer posexplode(split(space(s.datdiff-1),' ')) d as i,x
order by date_order;

I am using a Cassandra database with the Apache Hive connector.

Can someone help me?

【Question discussion】:

    Tags: sql hive cassandra hiveql date-range


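    【Solution 1】:

    So, if I understand correctly, you want to add all the dates that are currently missing because the amount on those days happens to be 0.

    You can use this: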

    select adddate('1970-01-01',t4.i*10000 + t3.i*1000 + t2.i*100 + t1.i*10 + t0.i) base_date from
        (select 0 i union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t0,
        (select 0 i union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t1,
        (select 0 i union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t2,
        (select 0 i union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t3,
        (select 0 i union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t4
        having base_date between curdate() - interval 24 month and curdate();
    

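    This generates a list of consecutive dates starting at 1970-01-01 (100,000 days, which reaches past the year 2200), filtered down to the range you are interested in.

    The idea is to select from this as a subquery and join it with your table (on the date field).

    Example: return empty rows for not existing data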
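The generator query in this answer uses MySQL functions (adddate, curdate, interval arithmetic), while the question is about Hive. The following is a minimal Hive sketch of the same "generate dates, then left join" idea, not the answerer's own code. It assumes a table your_data with a date-typed date_order column and an amount column (names borrowed from the question's query), and it approximates two years as 730 days.

```sql
-- Hive sketch of the calendar-and-join idea.
-- Assumptions: your_data(date_order date, amount) exists; 730 days stands in for "24 months".
with calendar as (
  -- space(729) is 729 spaces; splitting on ' ' gives 730 elements, so i runs 0..729
  select date_sub(current_date, d.i) as base_date
    from (select posexplode(split(space(729), ' ')) as (i, x)) d
)
select c.base_date,
       sum(nvl(s.amount, 0)) as amount   -- 0 for days with no matching rows
  from calendar c
       left join your_data s on s.date_order = c.base_date
 group by c.base_date;
```

Because the calendar is on the left side of the join, every generated day appears in the output even when your_data has no row for it.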

    As for the date format (YEAR MONTH, yyyyMM), you can use:

    DATE_FORMAT(your_date,'%Y%m')
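The '%Y%m' format string is MySQL syntax. In Hive, date_format takes Java-style patterns instead, so the equivalent would look roughly like the sketch below. The your_data table and its date_order column are assumptions carried over from the question's query.

```sql
-- Hive (1.2.0 or later): date_format takes a Java SimpleDateFormat pattern
select date_format('2019-10-15', 'yyyyMM');   -- returns '201910'

-- Assumption: your_data has a date_order column, as in the question's query
select date_format(date_order, 'yyyyMM') as year_month,
       sum(amount) as amount
  from your_data
 group by date_format(date_order, 'yyyyMM');
```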
    

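    【Discussion】:

      【Solution 2】:

      The date_range subquery returns the months going back 24 months from the current date (adjust it if you want a range other than 24 months). Left join it with your dataset; see the comments in this demo code: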

      with date_range as 
      (--this query generates the range of months, check its output
      select date_format(add_months(concat(date_format(current_date,'yyyy-MM'),'-01'),-s.i),'yyyyMM') as year_month 
        from ( select posexplode(split(space(24),' ')) as (i,x) ) s --i runs 0..24: the current month plus the previous 24
      ),
      
      your_data as (--use your table instead of this example
      select stack(7,
      202001, 500,    
      202001, 600,    
      201912, 100,    
      201910, 200,
      201910, 100,     
      201909, 400,
      201601,5000 -----this date is beyond 24 months, hence it is not in the output
      ) as (YEAR_MONTH, AMOUNT )
      )
      
      select d.year_month, sum(nvl(s.amount,0)) as amount --aggregate
        from date_range d 
             left join your_data s on d.year_month=s.year_month
        group by d.year_month;
      

      Result:

      d.year_month    amount
      201801  0
      201802  0
      201803  0
      201804  0
      201805  0
      201806  0
      201807  0
      201808  0
      201809  0
      201810  0
      201811  0
      201812  0
      201901  0
      201902  0
      201903  0
      201904  0
      201905  0
      201906  0
      201907  0
      201908  0
      201909  400
      201910  300
      201911  0
      201912  100
      202001  1100
      

      Use your table in place of the your_data subquery. Add order by if necessary.
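As a rough sketch of that last step, the query below swaps the demo subquery for a real table and adds a descending sort to match the order the question asks for. The table name my_table is hypothetical, and space(23) is used here so that i runs 0..23 and the range is exactly 24 months including the current one (the demo's space(24) yields 25 rows, as the result above shows).

```sql
-- Sketch only: my_table is a hypothetical table with year_month (yyyyMM) and amount columns.
with date_range as (
  select date_format(add_months(concat(date_format(current_date, 'yyyy-MM'), '-01'), -s.i), 'yyyyMM') as year_month
    from (select posexplode(split(space(23), ' ')) as (i, x)) s   -- i = 0..23: exactly 24 months
)
select d.year_month as year_month,
       sum(nvl(t.amount, 0)) as amount          -- 0 for months with no rows
  from date_range d
       left join my_table t
              on d.year_month = cast(t.year_month as string)  -- compare as strings
 group by d.year_month
 order by year_month desc;                       -- newest month first
```

The explicit cast is only there to make the comparison unambiguous when year_month is stored as a number; it is harmless if the column is already a string.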

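      【Discussion】:

      • What is space(24)? I can't find it in Spark SQL (neither in Scala's functions object nor in SQL mode).
      • @JacekLaskowski It is a Hive function; it returns a string of spaces of the given length (24).
      • @JacekLaskowski In Scala/Spark I believe there should be a more elegant way to generate rows. This is pure Hive SQL.
      • Will this select posexplode(split(space(24),' ')) as (i,x) generate 24 ids starting from 0?
      • @JacekLaskowski Yes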
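A small check of the row-generation trick discussed in these comments, using a shorter string so the output is easy to read. This is a sketch and assumes a Hive version that allows select without a from clause. Note that a string of n spaces splits into n + 1 empty strings, which agrees with the 25 rows (201801 through 202001) in the result above for space(24).

```sql
-- space(3) is a string of three spaces; splitting on ' ' yields four empty strings
select size(split(space(3), ' '));   -- 4

-- posexplode emits one row per array element, with i as the zero-based position
select s.i
  from (select posexplode(split(space(3), ' ')) as (i, x)) s;   -- rows: 0, 1, 2, 3
```

So to get exactly n generated rows, the argument to space would be n - 1.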