SQL BigQuery: Identifying customers who stopped buying the products (answers)

【Question title】: SQL BigQuery: Identifying customers who stopped buying the products
【Posted】: 2021-07-21 14:19:28
【Question】:

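I have the table below, which holds DISTINCT CustomerID and Date_Trunc(TIME, MONTH) values. If a customer has many transactions in the same month, they still appear in the table as a single record for that month.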

DATE	CustomerID
2021-01-01	111
2021-01-01	112
2021-02-01	111
2021-03-01	113
2021-03-01	115
2021-04-01	119

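For a given month M, I want the distinct CustomerIDs that bought from us at least once at any time between M-4 (four months before M) and M-2 (two months before M), and that did not buy in month M-1 (the previous month).

Basically, if we look at month 6, I want all distinct customers who bought from us between month 2 and month 4 (a 3-month lookback that excludes the previous month) but then did not buy in the previous month (month 5).

My desired output is a table grouped by DATE (month M) that shows the CustomerIDs of customers who used to buy (between M-4 and M-2) but stopped buying in the previous month (M-1).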

DATE (M)	CustomerID
2021-01-01	111
2021-01-01	114
2021-02-01	118
2021-02-01	113
2021-02-01	115
2021-03-01	119
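
As a reference for what the rule means, here is a minimal Python sketch (not BigQuery code) applied to the six input rows of the question. The helper names `month_index` and `stopped_buying` are made up for illustration, and the sketch assumes every date is the first day of a month, as Date_Trunc(TIME, MONTH) produces.

```python
from datetime import date

# Input rows from the question: one row per (month, customer).
rows = [
    (date(2021, 1, 1), 111),
    (date(2021, 1, 1), 112),
    (date(2021, 2, 1), 111),
    (date(2021, 3, 1), 113),
    (date(2021, 3, 1), 115),
    (date(2021, 4, 1), 119),
]

def month_index(d):
    """Map a date to a running month number so month math is integer math."""
    return d.year * 12 + (d.month - 1)

def stopped_buying(rows, m):
    """Customers seen in months M-4..M-2 but absent in month M-1."""
    mi = month_index(m)
    lookback = {c for d, c in rows if mi - 4 <= month_index(d) <= mi - 2}
    previous = {c for d, c in rows if month_index(d) == mi - 1}
    return sorted(lookback - previous)

print(stopped_buying(rows, date(2021, 4, 1)))  # [111, 112]
print(stopped_buying(rows, date(2021, 6, 1)))  # [111, 113, 115, 119]
```

For April 2021 the sketch returns 111 and 112: both bought in January or February, and neither bought in March.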

【Question comments】:

Tags: sql google-bigquery

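【Solution 1】:

Your data already has only one row per month and customerid, so just use lag() (the query refers to the table as t and to the month column as yyyymm):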

select t.*
from (select t.*,
             lag(yyyymm) over (partition by customerid order by yyyymm) as prev_yyyymm
      from t
     ) t
where prev_yyyymm >= date_add(yyyymm, interval -4 month) and
      prev_yyyymm <= date_add(yyyymm, interval -2 month);

Or, more simply, use qualify:

select t.*
from t
where 1=1
qualify lag(yyyymm) over (partition by customerid order by yyyymm) >= date_add(yyyymm, interval -4 month) and
        lag(yyyymm) over (partition by customerid order by yyyymm) <= date_add(yyyymm, interval -2 month);

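【Comments】:

Thank you very much for your answer, but this approach returns the list of users who came back and bought from us in month M. What I am looking for is different: for month 6, take the list of unique users across months 2, 3, and 4, and then keep only those who do not appear in month 5.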
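
The difference raised in this comment can be illustrated with a small Python sketch. The purchase history here is hypothetical (plain month numbers, not the question's data), and `lag_filter` and `stopped_buying` are illustrative names that only imitate the two rules; neither is BigQuery code.

```python
# Hypothetical history, one row per (month, customer):
# customer 111 buys in months 1 and 4, customer 112 buys only in month 1.
rows = [(1, 111), (4, 111), (1, 112)]

def lag_filter(rows):
    """Imitate the lag() query: keep a row when the same customer's
    previous purchase month lies 2 to 4 months before it."""
    by_customer = {}
    for month, customer in sorted(rows):
        by_customer.setdefault(customer, []).append(month)
    kept = []
    for customer, months in by_customer.items():
        for prev, cur in zip(months, months[1:]):
            if cur - 4 <= prev <= cur - 2:
                kept.append((cur, customer))
    return sorted(kept)

def stopped_buying(rows, m):
    """The rule from the question: bought in M-4..M-2, not in M-1."""
    lookback = {c for d, c in rows if m - 4 <= d <= m - 2}
    previous = {c for d, c in rows if d == m - 1}
    return sorted(lookback - previous)

print(lag_filter(rows))         # [(4, 111)]
print(stopped_buying(rows, 3))  # [111, 112]
print(stopped_buying(rows, 5))  # [112]
```

The lag-based filter only produces a row in a month where the customer buys again, so customer 112, who never returns, is never reported, while the rule from the question reports 112 in months 3, 4, and 5.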

【Solution 2】:

Use the approach below:

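-- distinct: all rows of one month carry the same window arrays, so without it each result repeats once per row of that month
select distinct date, customerid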
from (
  select *, 
    array_agg(customerid) over(order by pos range between 4 preceding and 2 preceding) bought_in_3_months_before_prev,
    array_agg(customerid) over(order by pos range between 1 preceding and 1 preceding) bought_in_prev,
  from (
    select *, date_diff(date, '2000-01-01', month) pos
    from `project.dataset.table`
  )
) t, unnest(array(
  select distinct id
  from t.bought_in_3_months_before_prev id
  where not id in (select * from t.bought_in_prev)
)) customerid
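
To make the two window frames easier to follow, here is a rough Python imitation of the query's logic on the question's input rows. It is only a sketch: `pos` mirrors date_diff(date, '2000-01-01', month) for first-of-month dates, and the hypothetical helper `window` stands in for `range between N preceding and M preceding`.

```python
from datetime import date

# Input rows from the question: one row per (month, customer).
rows = [
    (date(2021, 1, 1), 111),
    (date(2021, 1, 1), 112),
    (date(2021, 2, 1), 111),
    (date(2021, 3, 1), 113),
    (date(2021, 3, 1), 115),
    (date(2021, 4, 1), 119),
]

def pos(d):
    """Whole months between 2000-01-01 and a first-of-month date."""
    return (d.year - 2000) * 12 + (d.month - 1)

def window(rows, p, lo, hi):
    """Customers whose pos lies in [p - lo, p - hi] (a RANGE frame on pos)."""
    return {c for d, c in rows if p - lo <= pos(d) <= p - hi}

# A set is used so months with several rows do not repeat the same result.
result = set()
for d, _ in rows:
    p = pos(d)
    bought_before = window(rows, p, 4, 2)  # months M-4 .. M-2
    bought_prev = window(rows, p, 1, 1)    # month M-1
    for customer in bought_before - bought_prev:
        result.add((d.isoformat(), customer))

print(sorted(result))
# [('2021-03-01', 112), ('2021-04-01', 111), ('2021-04-01', 112)]
```

Because the frames are evaluated per existing row, a month only appears in the result if the table has at least one row for that month.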

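Update:
If the volume of data in the table is very large and this leads to memory/resource-related problems, use the approach below instead: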

select * from (
  select date_add(date, interval offset month) as date, customerid
  from `project.dataset.table`, unnest([2,3,4]) offset 
  except distinct 
  select date_add(date, interval 1 month) as date, customerid
  from `project.dataset.table`
)
where date <= (select max(date) from `project.dataset.table`)
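
The idea behind this version is to shift every purchase forward to the months M it can affect and then subtract. A minimal Python sketch of that idea on the question's input rows follows; `add_months` is a hypothetical stand-in for date_add(date, interval n month) that assumes first-of-month dates, and a Python set difference plays the role of except distinct.

```python
from datetime import date

# Input rows from the question: one row per (month, customer).
rows = [
    (date(2021, 1, 1), 111),
    (date(2021, 1, 1), 112),
    (date(2021, 2, 1), 111),
    (date(2021, 3, 1), 113),
    (date(2021, 3, 1), 115),
    (date(2021, 4, 1), 119),
]

def add_months(d, n):
    """Shift a first-of-month date by n months."""
    total = d.year * 12 + (d.month - 1) + n
    return date(total // 12, total % 12 + 1, 1)

# A purchase in month D makes the customer a candidate for M = D+2, D+3, D+4.
candidates = {(add_months(d, k), c) for d, c in rows for k in (2, 3, 4)}
# A purchase in month D means the customer did buy in M-1 for M = D+1.
bought_prev = {(add_months(d, 1), c) for d, c in rows}

last_month = max(d for d, _ in rows)
result = sorted(
    (m, c) for m, c in candidates - bought_prev if m <= last_month
)
for m, c in result:
    print(m.isoformat(), c)
# 2021-03-01 112
# 2021-04-01 111
# 2021-04-01 112
```

The final filter drops months later than the last month present in the data, which is what the where clause on max(date) does in the query.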

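In case you are not comfortable with the except distinct operator, you can use the version below, which relies on the more common/traditional union distinct. Both versions are fairly self-explanatory, so it is mostly a matter of preference.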

select date, customerid from (
  select date_add(date, interval offset month) as date, customerid, true flag 
  from `project.dataset.table`, unnest([2,3,4]) offset 
  union distinct
  select date_add(date, interval 1 month) as date, customerid, false flag
  from `project.dataset.table`
)
where date <= (select max(date) from `project.dataset.table`)
group by date, customerid
having logical_and(flag)
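
The flag-based variant can be imitated in Python as well. In this sketch, which again uses the question's input rows and a hypothetical `add_months` helper for first-of-month dates, the built-in `all` plays the role of logical_and(flag): a (month, customer) pair survives only if none of its rows carries a false flag.

```python
from collections import defaultdict
from datetime import date

# Input rows from the question: one row per (month, customer).
rows = [
    (date(2021, 1, 1), 111),
    (date(2021, 1, 1), 112),
    (date(2021, 2, 1), 111),
    (date(2021, 3, 1), 113),
    (date(2021, 3, 1), 115),
    (date(2021, 4, 1), 119),
]

def add_months(d, n):
    """Shift a first-of-month date by n months."""
    total = d.year * 12 + (d.month - 1) + n
    return date(total // 12, total % 12 + 1, 1)

# True flag: bought in M-4..M-2. False flag: bought in M-1.
flagged = {(add_months(d, k), c, True) for d, c in rows for k in (2, 3, 4)}
flagged |= {(add_months(d, 1), c, False) for d, c in rows}

last_month = max(d for d, _ in rows)
groups = defaultdict(list)
for m, c, flag in flagged:
    if m <= last_month:
        groups[(m, c)].append(flag)

# Keep groups whose flags are all true, like "having logical_and(flag)".
result = sorted(key for key, flags in groups.items() if all(flags))
for m, c in result:
    print(m.isoformat(), c)
# 2021-03-01 112
# 2021-04-01 111
# 2021-04-01 112
```

Customer 111 in March 2021 has both a true flag (January purchase) and a false flag (February purchase), so that pair is dropped.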
