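【Question title】: Running sum with gaps
【Posted】: 2017-10-17 14:31:37
【Question】:

I have the table below, with every column except the one highlighted in yellow.

The table holds the customer's ID, the date of the sale, and the total amount the customer spent on that day (sales). I now need to compute, for each customer and each day, the running sales over a time window that includes the current day. For example, with the window set to 3 days, customer 2233 bought twice (nothing on the 14th), so his running sales are 26 on the 15th and 25 on the 13th.

I cannot create new tables, so I tried the approach below, but it is slow: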

SELECT t.dt,

Count(CASE WHEN t.running_sale < 1.99 THEN 1 ELSE NULL END) as "Low spender",
Count(CASE WHEN t.running_sale BETWEEN 1.99 and 4.99 THEN 1 ELSE NULL END) as "Medium spender",
Count(CASE WHEN t.running_sale > 4.99 THEN 1 ELSE NULL END) as "High spender"

FROM  ( SELECT dt, channel, id, (
     SELECT SUM(revenue)
     FROM  myTable rd
     -- correlate on the outer row's date (r.dt); comparing rd.dt with itself is always true
     WHERE CAST(rd.dt AS DATE) 
             BETWEEN (CAST(r.dt AS DATE) - INTERVAL '3' DAY) AND CAST(r.dt AS DATE) AND 
           rd.id = r.id 
  ) running_sale from myTable r) t

WHERE channel = 'retail' 
AND dt BETWEEN '2017-06-01' AND '2017-06-15'

GROUP BY dt
limit 100

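【Comments】:

  • Use analytic functions? sum(Sales) OVER (PARTITION by ID ORDER BY Date asc ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) as RunningSales
  • That doesn't work: on the 12th, ID 2233 would pick up the 11th and the 6th, which is a gap of more than 3 days.
  • I sort of get it, but I don't understand why 2233 has 26 on the 15th. If the window is the 3 days back (the 15th, 14th and 13th), that gives 22, not 26. Or should the 12th be included, so the window is 15, 14, 13, 12?
  • RexTester, for anyone who wants to try it: rextester.com/BJE9775 (with my failed attempt). I think we could join to a numbers table so that every date is included, compute the running sales, and then filter out the $0 sales...
  • Don't focus too much on the example; focus on the description. I did it by hand and that's a typo :D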
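
The objection in the comments is that a ROWS frame counts rows rather than days. As a rough sketch of the difference, assuming PostgreSQL 11 or later (where RANGE frames accept an interval offset), the query below compares the two frames. The rows for customer 2233 are copied from the test data in Solution 2; the names sample, by_rows and by_range are made up for this illustration.

```sql
-- Sample rows for customer 2233, copied from Solution 2's test data.
with sample(id, dt, sales) as (
    values (2233, date '2017-06-06', 1),
           (2233, date '2017-06-11', 8),
           (2233, date '2017-06-12', 4),
           (2233, date '2017-06-13', 21),
           (2233, date '2017-06-15', 1)
)
select id, dt, sales,
       -- ROWS counts physical rows, so a date gap pulls in older sales
       sum(sales) over (partition by id order by dt
                        rows between 2 preceding and current row) as by_rows,
       -- RANGE with an interval offset counts calendar days (PostgreSQL 11+)
       sum(sales) over (partition by id order by dt
                        range between interval '2 days' preceding and current row) as by_range
from sample
order by id, dt;
```

On the 12th, by_rows is 13 (it reaches back to the 6th) while by_range is 12. On the 15th, by_rows is 26 and by_range is 22, which are the two figures debated above.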

Tags: sql postgresql


【Solution 1】:

I would use a subquery for this:

select *,
  (
     select sum(sales)
     from your_table dd
     where cast(dd.dates as date) 
             between cast(your_table.dates as date) - interval '3' day and 
                     cast(your_table.dates as date) and 
           dd.id = your_table.id
  ) running_sales
from your_table

demo

The query above can be rewritten more efficiently with a self-join and group by:

 select dd.id, dd.dates, dd.sales, sum(d.sales) running_sales
 from your_table dd
 join your_table d on cast(d.dates as date) 
         between (cast(dd.dates as date) - interval '3' day) and cast(dd.dates as date) and 
       dd.id = d.id
 group by dd.id, dd.dates, dd.sales

group by demo

You may want to create the following index to support the queries above:

create index ix_your_table on your_table(id, dates, sales)

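【Comments】:

  • I updated my question with the full version of what I have to do. This approach seems to be very slow; the server times out.
  • @PasqualeSada OK, I have rewritten it as a group by version. Please test it now and let me know.
【Solution 2】: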
With CTE as (
    SELECT 1234 id, '2017-06-15' idate,9 sales from dual UNION ALL
    SELECT 2233 id, '2017-06-03' idate,20 sales from dual UNION ALL
    SELECT 2233 id , '2017-06-05' idate,4 sales from dual UNION ALL
    SELECT 2233 id , '2017-06-06' idate,1 sales from dual UNION ALL
    SELECT 2233 id , '2017-06-11' idate,8 sales from dual UNION ALL
    SELECT 2233 id , '2017-06-12' idate,4 sales from dual UNION ALL
    SELECT 2233 id, '2017-06-13' idate,21 sales from dual UNION ALL
    SELECT 2233 id, '2017-06-15' idate,1 sales from dual UNION ALL
    SELECT 2544 id , '2017-06-13' idate,9 sales from dual UNION ALL
    SELECT 2443 id, '2017-06-05' idate,3.5 sales from dual )

 ,cte2 as (
select cte.*, to_number(replace(idate,'-')) datekey from cte
 )
 --select * from cte2
--SELECT cte.*, sum(cte.Sales) OVER (PARTITION by ID ORDER BY cte.iDate asc ROWS 2 PRECEDING ) as RunningSales FROM CTE

 --select rownum rn from dual connect by prior
 ,pp as (
 SELECT to_number(dd+20170600) dkey
FROM   ( SELECT rownum dd
         FROM   dual
         CONNECT BY LEVEL <= 31 
       )
)
--select * from pp
,cc as (


select cte2.* ,pp.dkey 
from pp left join cte2 
on(cte2.datekey=pp.dkey)
)
select cc.* ,sum(cc.Sales) OVER (PARTITION by cc.ID ORDER BY cc.dkey asc ROWS 2 PRECEDING ) as RunningSales
from cc order by dkey asc ,id asc

【Comments】:

  • Tested on Oracle 12c, where it works. The idea is borrowed from building dimensions for a data lake.
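
Since the question is tagged postgresql, here is a hedged sketch of the same calendar idea in that dialect. It differs from the Oracle query in one respect: as written, that query's left join appears to leave id NULL on days without a sale, so the filler rows would fall into their own partition. The sketch instead builds one calendar row per customer per day. The CTE names (sales_rows, calendar, filled, running) and the June 2017 date range are assumptions for illustration; the sample rows are a subset of the test data above.

```sql
with sales_rows(id, dt, sales) as (
    values (2233, date '2017-06-11', 8),
           (2233, date '2017-06-12', 4),
           (2233, date '2017-06-13', 21),
           (2233, date '2017-06-15', 1),
           (2544, date '2017-06-13', 9)
),
calendar as (
    -- one row per customer per day, so days without sales still occupy a row
    select ids.id, d::date as dt
    from (select distinct id from sales_rows) ids
    cross join generate_series(timestamp '2017-06-01',
                               timestamp '2017-06-30',
                               interval '1 day') as d
),
filled as (
    -- days without a sale count as 0
    select c.id, c.dt, coalesce(s.sales, 0) as sales
    from calendar c
    left join sales_rows s on s.id = c.id and s.dt = c.dt
),
running as (
    -- every day now has a row, so 2 preceding rows means 2 preceding days
    select id, dt, sales,
           sum(sales) over (partition by id order by dt
                            rows between 2 preceding and current row) as running_sales
    from filled
)
select *
from running
where sales > 0      -- drop the filler days (this would also drop a genuine 0 sale)
order by dt, id;
```

With these rows, customer 2233 gets running_sales = 22 on 2017-06-15 (21 + 0 + 1). The cross join multiplies customers by days, so on a large table this may be no faster than the self-join in Solution 1.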
【Solution 3】:

If there is at most one sale per day, the most efficient approach is probably repeated lags:

select rd.*,
       (sales +
        (case when prev_date >= date - interval '2 day' then prev_sales else 0 end) + 
        -- names must match the aliases defined in the inner query (prev_date2, prev_sales2)
        (case when prev_date2 >= date - interval '2 day' then prev_sales2 else 0 end)
       ) as sales_3day
from (select rd.*,
             lag(date, 1) over (partition by id order by date) as prev_date,
             lag(date, 2) over (partition by id order by date) as prev_date2,
             lag(sales, 1) over (partition by id order by date) as prev_sales,
             lag(sales, 2) over (partition by id order by date) as prev_sales2
      from mytable rd
) rd;

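Once you have this value, the rest is just conditional logic on the result.

If you have multiple sales on one date, you can easily handle that by aggregating in the innermost query.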
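
As a sketch of that last remark, the query below first collapses the data to one row per customer per day and then applies the same lag logic. The sample rows are made up for illustration (two sales on the 13th), and the CTE names daily and lagged are hypothetical; the column names follow the answer's query.

```sql
-- Illustrative rows only: two sales on the same day for one customer.
with mytable(id, date, sales) as (
    values (2233, date '2017-06-13', 20),
           (2233, date '2017-06-13', 1),
           (2233, date '2017-06-15', 1)
),
daily as (
    -- aggregate first, so each (id, date) pair appears exactly once
    select id, date, sum(sales) as sales
    from mytable
    group by id, date
),
lagged as (
    select daily.*,
           lag(date, 1)  over w as prev_date,
           lag(date, 2)  over w as prev_date2,
           lag(sales, 1) over w as prev_sales,
           lag(sales, 2) over w as prev_sales2
    from daily
    window w as (partition by id order by date)
)
select id, date, sales,
       sales
       + case when prev_date  >= date - interval '2 day' then prev_sales  else 0 end
       + case when prev_date2 >= date - interval '2 day' then prev_sales2 else 0 end
       as sales_3day
from lagged
order by id, date;
```

Here the 13th collapses to a single row with sales = 21, and the 15th gets sales_3day = 22.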

【Comments】:
