【Question title】: SQL: Rolling sum in the last 30 days by groups
【Posted】: 2016-10-27 23:21:09
【Question】:

I have a table like the following:

date, custid, sales
2015-01-01, 01, 100
2015-01-10, 01, 200
2015-02-05, 01, 300
2015-03-02, 01, 400
2015-03-03, 01, 500
2015-01-01, 02, 100
2015-01-10, 02, 200
2015-02-05, 02, 300
2015-03-02, 02, 400
2015-03-03, 02, 500
...

How can I generate a rolling sum of sales over the last 30 days, by date and custid?

The expected output is:

date, custid, running_30_day_sales
2015-01-01, 01, 100
2015-01-10, 01, 300 --(100+200)
2015-02-05, 01, 500 --(200+300)
2015-03-02, 01, 700 --(300+400)
2015-03-03, 01, 1200 -- (300+400+500)
2015-01-01, 02, 100
2015-01-10, 02, 300 --(100+200)
2015-02-05, 02, 500 --(200+300)
2015-03-02, 02, 700 --(300+400)
2015-03-03, 02, 1200 -- (300+400+500)

【Comments】:

    Tags: sql amazon-redshift


    【Solution 1】:

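    Here is one way to do it using a self join. Each row is joined to the earlier rows for the same custid whose datediff from it is > 0 and <= 30 days, and the sales of those rows are added to the row's own sales.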

    select a1.custid, a1.dt, a1.sales+sum(coalesce(a2.sales,0)) total
    from atable a1
    left join atable a2 on a1.custid=a2.custid 
    and datediff(day,a2.dt,a1.dt)<=30 and datediff(day,a2.dt,a1.dt)>0
    group by a1.custid,a1.dt,a1.sales
    order by 1,2
    

    Sample Demo in Postgres

    For a better understanding, look at the raw result of the self join:

    select a1.*,a2.*
    from atable a1
    left join atable a2 on a1.custid=a2.custid 
    and datediff(day,a2.dt,a1.dt)<=30 and datediff(day,a2.dt,a1.dt)>0
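
    The self join can be checked against the sample data from the question. The following is only a sketch under a few assumptions: it runs on SQLite through Python's sqlite3 module rather than on Redshift, so datediff(day, a2.dt, a1.dt) is replaced by a julianday difference, and the function name rolling_30_day_self_join is made up for illustration.

```python
import sqlite3

# Sample rows from the question: (dt, custid, sales).
# custid is stored as text so the leading zero is preserved.
ROWS = [
    (dt, custid, sales)
    for custid in ("01", "02")
    for dt, sales in [
        ("2015-01-01", 100),
        ("2015-01-10", 200),
        ("2015-02-05", 300),
        ("2015-03-02", 400),
        ("2015-03-03", 500),
    ]
]

# SQLite port of the self join: julianday(a1.dt) - julianday(a2.dt) stands in
# for Redshift's datediff(day, a2.dt, a1.dt) (an assumption of this sketch).
QUERY = """
    SELECT a1.custid, a1.dt, a1.sales + SUM(COALESCE(a2.sales, 0)) AS total
    FROM atable a1
    LEFT JOIN atable a2
           ON a1.custid = a2.custid
          AND julianday(a1.dt) - julianday(a2.dt) > 0
          AND julianday(a1.dt) - julianday(a2.dt) <= 30
    GROUP BY a1.custid, a1.dt, a1.sales
    ORDER BY 1, 2
"""


def rolling_30_day_self_join(rows):
    """Return (custid, dt, total) tuples computed with the self join."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE atable (dt TEXT, custid TEXT, sales INTEGER)")
        conn.executemany("INSERT INTO atable VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in rolling_30_day_self_join(ROWS):
        print(row)
```

    A row with no earlier row inside the window still appears in the result, because the left join keeps it and coalesce turns the missing a2.sales into 0.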
    

    【Discussion】:

      【Solution 2】:

      Here is a trick that uses a cumulative sum:

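      with t as (
            -- each sale enters the running total on its own date ...
            select custid, date, sales from atable
            union all
            -- ... and leaves it again 30 days later, as a negated amount
            -- (use '31 day' if a sale exactly 30 days old should still count)
            select custid, date + interval '30 day', -sales from atable
           )
      select custid, date,
             sum(sum(sales)) over (partition by custid order by date rows between unbounded preceding and current row) as sales_30day
      from t
      group by custid, date;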
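
      The idea behind the cumulative-sum trick can be sketched outside SQL as well. The Python below is an illustration only, not the answer's own code: every sale is added on its date and subtracted again once it falls out of the window, and a running total over the sorted dates gives the rolling sum. The function name rolling_sum_via_events and the window_days parameter are hypothetical; window_days=31 is an assumption chosen so that a sale exactly 30 days old still counts.

```python
from collections import defaultdict
from datetime import date, timedelta
from itertools import accumulate

# Sample rows from the question: (date, custid, sales).
SAMPLE = [
    (d, custid, sales)
    for custid in ("01", "02")
    for d, sales in [
        (date(2015, 1, 1), 100),
        (date(2015, 1, 10), 200),
        (date(2015, 2, 5), 300),
        (date(2015, 3, 2), 400),
        (date(2015, 3, 3), 500),
    ]
]


def rolling_sum_via_events(rows, window_days=31):
    """Return {custid: [(date, rolling_total), ...]} using +sales / -sales events.

    A sale on day d contributes to the totals of days d .. d + window_days - 1.
    """
    deltas = defaultdict(lambda: defaultdict(int))
    for d, custid, sales in rows:
        deltas[custid][d] += sales                                # sale enters the window
        deltas[custid][d + timedelta(days=window_days)] -= sales  # sale leaves the window

    result = {}
    for custid, by_date in deltas.items():
        dates = sorted(by_date)
        totals = accumulate(by_date[d] for d in dates)  # cumulative sum of the deltas
        result[custid] = list(zip(dates, totals))
    return result


if __name__ == "__main__":
    for custid, series in rolling_sum_via_events(SAMPLE).items():
        for d, total in series:
            print(custid, d, total)
```

      The output also contains the dates on which a sale drops out of the window, so it has more rows than the input; keeping only the original dates needs one extra filtering step.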
      

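      【Discussion】:

      • I think this will generate rows that are not in the table.
      • Hi Gordon, thanks for the reply. Could you add a few comments? Also, when I run the query on Redshift I get the error "ERROR: 42601: Aggregate window functions with an ORDER BY clause require a frame clause".
      • @vip . . . Yes, you are right. The result has a row for every date on which the value changes. If the OP only wants the dates in the original data, that is easy to filter with an additional join.
      【Solution 3】: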

      You can also compute it with window functions, like this:

      -- RANGE frames with interval offsets need PostgreSQL 11 or later
      SELECT custid, dt::date,
             -- 30-day rolling sum: the frame ends at the current row
             SUM(sales) OVER (PARTITION BY custid ORDER BY dt::date
                              RANGE BETWEEN '30 days' PRECEDING AND CURRENT ROW) AS sum_of_sales,
             MIN(sales) OVER (PARTITION BY custid ORDER BY dt::date
                              RANGE BETWEEN '30 days' PRECEDING AND CURRENT ROW) AS minimum,
             -- a frame can also look ahead, here 2 days on either side
             MAX(sales) OVER (PARTITION BY custid ORDER BY dt::date
                              RANGE BETWEEN '2 days' PRECEDING AND '2 days' FOLLOWING) AS maximum
      FROM atable
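
      What a frame of 30 days preceding up to the current row selects can be illustrated with a sliding window over rows sorted by customer and date. This is a rough Python sketch, not a description of how any database evaluates the frame; the name rolling_window_sum is hypothetical, and it assumes at most one row per customer per date (a RANGE frame would also include other rows sharing the current date).

```python
from collections import deque
from datetime import date
from itertools import groupby

# Sample rows from the question: (date, custid, sales).
SAMPLE = [
    (d, custid, sales)
    for custid in ("01", "02")
    for d, sales in [
        (date(2015, 1, 1), 100),
        (date(2015, 1, 10), 200),
        (date(2015, 2, 5), 300),
        (date(2015, 3, 2), 400),
        (date(2015, 3, 3), 500),
    ]
]


def rolling_window_sum(rows, days=30):
    """Return (date, custid, total) where total covers the last `days` days inclusive."""
    ordered = sorted(rows, key=lambda r: (r[1], r[0]))  # by custid, then date
    out = []
    for custid, group in groupby(ordered, key=lambda r: r[1]):
        window = deque()  # (date, sales) pairs currently inside the frame
        total = 0
        for d, _, sales in group:
            window.append((d, sales))
            total += sales
            # Drop rows that are more than `days` days older than the current row.
            while (d - window[0][0]).days > days:
                _, old_sales = window.popleft()
                total -= old_sales
            out.append((d, custid, total))
    return out


if __name__ == "__main__":
    for d, custid, total in rolling_window_sum(SAMPLE):
        print(d, custid, total)
```

      Each row is appended and removed at most once, so the pass is linear in the number of rows after sorting.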
      

      【Discussion】:
