【Question title】:Show customer spend per day and whether they have spent the previous day (SQL)
【Posted】:2019-10-01 20:57:05
【Question description】:

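I am trying to create one row for each customer for each day on which they spend money, plus a column indicating whether they also spent money on the previous day. If a customer spends twice in one day, they should still have only 1 row in the table. If the customer spent money on the previous day, the column should show TRUE.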

Here is the original table:

+---------------------+-------------+-----------------+
| datetime            | customer_id | amount          |
+---------------------+-------------+-----------------+
| 2018-03-01 03:00:00 | 3786        | 14.00000        |
| 2018-03-02 17:00:00 | 5678        | 25.00000        |
| 2018-07-09 18:00:00 | 5647        | 1000.99000      |
| 2018-08-17 19:00:00 | 5267        | 45.00000        |
| 2018-08-25 08:00:00 | 3456        | 78.00000        |
| 2018-08-25 17:00:00 | 3456        | 25.00000        |
| 2018-08-26 03:00:00 | 3456        | 34.90000        |
| 2019-02-03 08:00:00 | 3468        | 0.00000         |
| 2019-03-09 06:00:00 | 1111        | 100.00000       |
| 2019-05-25 14:00:00 | 3456        | 15.00000        |
| 2019-07-02 14:00:00 | 88889       | 45.00000        |
| 2019-07-04 03:00:00 | 8979        | 9.00000         |
| 2019-07-09 14:00:00 | 4567        | 9.99000         |
| 2019-08-25 08:00:00 | 1234        | 88.00000        |
| 2019-08-30 09:31:00 | 1234        | 30.00000        |
| 2019-08-30 12:00:00 | 9876        | 55.00000        |
| 2019-09-01 13:00:00 | 88889       | 23.00000        |
+---------------------+-------------+-----------------+

Here are the CREATE and INSERT statements:

CREATE TABLE IF NOT EXISTS `spend` ( `datetime` datetime NOT NULL, `customer_id` int(11) NOT NULL, `amount` decimal(10, 5) NOT NULL, PRIMARY KEY (`datetime`)) DEFAULT CHARSET=utf8mb4;
INSERT INTO `spend` (`datetime`, `customer_id`, `amount`) VALUES ('2018-03-01 03:00:00', 3786, 14.00000);
INSERT INTO `spend` (`datetime`, `customer_id`, `amount`) VALUES ('2018-03-02 17:00:00', 5678, 25.00000);
INSERT INTO `spend` (`datetime`, `customer_id`, `amount`) VALUES ('2018-07-09 18:00:00', 5647, 1000.99000);
INSERT INTO `spend` (`datetime`, `customer_id`, `amount`) VALUES ('2018-08-17 19:00:00', 5267, 45.00000);
INSERT INTO `spend` (`datetime`, `customer_id`, `amount`) VALUES ('2018-08-25 08:00:00', 3456, 78.00000);
INSERT INTO `spend` (`datetime`, `customer_id`, `amount`) VALUES ('2018-08-25 17:00:00', 3456, 25.00000);
INSERT INTO `spend` (`datetime`, `customer_id`, `amount`) VALUES ('2018-08-26 03:00:00', 3456, 34.90000);
INSERT INTO `spend` (`datetime`, `customer_id`, `amount`) VALUES ('2019-02-03 08:00:00', 3468, 0.00000);
INSERT INTO `spend` (`datetime`, `customer_id`, `amount`) VALUES ('2019-03-09 06:00:00', 1111, 100.00000);
INSERT INTO `spend` (`datetime`, `customer_id`, `amount`) VALUES ('2019-05-25 14:00:00', 3456, 15.00000);
INSERT INTO `spend` (`datetime`, `customer_id`, `amount`) VALUES ('2019-07-02 14:00:00', 88889, 45.00000);
INSERT INTO `spend` (`datetime`, `customer_id`, `amount`) VALUES ('2019-07-04 03:00:00', 8979, 9.00000);
INSERT INTO `spend` (`datetime`, `customer_id`, `amount`) VALUES ('2019-07-09 14:00:00', 4567, 9.99000);
INSERT INTO `spend` (`datetime`, `customer_id`, `amount`) VALUES ('2019-08-25 08:00:00', 1234, 88.00000);
INSERT INTO `spend` (`datetime`, `customer_id`, `amount`) VALUES ('2019-08-30 09:31:00', 1234, 30.00000);
INSERT INTO `spend` (`datetime`, `customer_id`, `amount`) VALUES ('2019-08-30 12:00:00', 9876, 55.00000);
INSERT INTO `spend` (`datetime`, `customer_id`, `amount`) VALUES ('2019-09-01 13:00:00', 88889, 23.00000);

Here is what I have so far:

SELECT CAST(datetime AS DATE) AS day,
       COUNT(DISTINCT customer_id) AS daily_spend  -- number of distinct customers who spent that day
FROM spend
WHERE amount IS NOT NULL
GROUP BY day
ORDER BY day;

This code does not yet produce what I need, but I am doing my best to fix it.

I have looked through several posts, but the closest I could find is: count transaction per day

I am trying to produce a table that looks like this:

+------------+-------------+--------------------+
| day        | customer_id | spent_previous_day |
+------------+-------------+--------------------+
| 2018-03-01 | 3786        | FALSE              |
+------------+-------------+--------------------+
| 2018-03-02 | 5678        | FALSE              |
+------------+-------------+--------------------+
| 2018-07-09 | 5647        | FALSE              |
+------------+-------------+--------------------+
| 2018-08-17 | 5267        | FALSE              |
+------------+-------------+--------------------+
| 2018-08-25 | 3456        | FALSE              |
+------------+-------------+--------------------+
| 2018-08-26 | 3456        | TRUE               |
+------------+-------------+--------------------+
| 2019-02-03 | 3468        | FALSE              |
+------------+-------------+--------------------+
| 2019-03-09 | 1111        | FALSE              |
+------------+-------------+--------------------+
| 2019-05-25 | 3456        | FALSE              |
+------------+-------------+--------------------+
| 2019-07-02 | 88889       | FALSE              |
+------------+-------------+--------------------+
| 2019-07-04 | 8979        | FALSE              |
+------------+-------------+--------------------+
| 2019-07-09 | 4567        | FALSE              |
+------------+-------------+--------------------+
| 2019-08-25 | 1234        | FALSE              |
+------------+-------------+--------------------+
| 2019-08-30 | 1234        | FALSE              |
+------------+-------------+--------------------+
| 2019-08-30 | 9876        | FALSE              |
+------------+-------------+--------------------+
| 2019-09-01 | 88889       | FALSE              |
+------------+-------------+--------------------+
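The rule behind this target table (one row per customer per day, TRUE when the same customer has any purchase dated the previous calendar day) can be stated directly as an EXISTS test. The sketch below is not from the original post; it is one possible formulation, assuming MySQL 8.0. The `WITH spend (...)` CTE is only a stand-in holding four of the question's rows so the statement runs on its own; remove it to query the real `spend` table. MySQL prints the flag as 1/0 rather than TRUE/FALSE.

```sql
-- Stand-in for the question's `spend` table: four of its rows.
-- Remove this CTE to run the query against the real table.
WITH spend (`datetime`, customer_id, amount) AS (
  SELECT CAST('2018-08-25 08:00:00' AS DATETIME), 3456, 78.00 UNION ALL
  SELECT CAST('2018-08-25 17:00:00' AS DATETIME), 3456, 25.00 UNION ALL
  SELECT CAST('2018-08-26 03:00:00' AS DATETIME), 3456, 34.90 UNION ALL
  SELECT CAST('2019-05-25 14:00:00' AS DATETIME), 3456, 15.00
)
SELECT d.day,
       d.customer_id,
       -- 1 (TRUE) when the same customer has any purchase dated the day before
       EXISTS (SELECT 1
               FROM spend p
               WHERE p.customer_id = d.customer_id
                 AND DATE(p.`datetime`) = d.day - INTERVAL 1 DAY) AS spent_previous_day
FROM (
  -- DISTINCT collapses several purchases on the same day into one row
  SELECT DISTINCT DATE(`datetime`) AS day, customer_id
  FROM spend
) AS d
ORDER BY d.day, d.customer_id;
```

On these four rows the query returns three rows (2018-08-25, 2018-08-26 and 2019-05-25 for customer 3456), and only 2018-08-26 is flagged 1. Because the check only looks for a matching date, it does not depend on row order and is not affected by several purchases on the same day.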

Edit: Here is the code I am currently using, based on the suggestions I received.

select customer_id, CAST(datetime AS DATE) AS day,
      max(date(datetime))  over (partition by customer_id 
                          order by CAST(datetime AS DATE)
                          range between interval 1 day preceding and interval 1 day preceding
                         ) is not null AS spent_previous_day
from spend

Here is the resulting table:

+------------+-------------+--------------------+
| day        | customer_id | spent_previous_day |
+------------+-------------+--------------------+
| 2019-03-09 | 1111        | 0                  |
+------------+-------------+--------------------+
| 2019-08-25 | 1234        | 0                  |
+------------+-------------+--------------------+
| 2019-08-30 | 1234        | 0                  |
+------------+-------------+--------------------+
| 2018-08-25 | 3456        | 0                  |
+------------+-------------+--------------------+
| 2018-08-25 | 3456        | 0                  |
+------------+-------------+--------------------+
| 2018-08-26 | 3456        | 1                  |
+------------+-------------+--------------------+
| 2019-05-25 | 3456        | 0                  |
+------------+-------------+--------------------+
| 2019-02-03 | 3468        | 0                  |
+------------+-------------+--------------------+

I tried adding GROUP BY day, customer_id, but got an error.
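The error message isn't shown, so this is a guess: when GROUP BY and a window function appear in the same SELECT, the window may only use grouped expressions, and the window above still aggregates and orders by the raw `datetime` column rather than the grouped `day`. One way around this (a sketch, assuming MySQL 8.0) is to collapse the data to one row per customer per day in a derived table first, and then apply the same RANGE frame to the grouped date. The `WITH spend (...)` CTE is a stand-in holding a few of the question's rows; remove it to use the real table.

```sql
-- Stand-in for the question's `spend` table (a few of its rows);
-- remove this CTE to run against the real table.
WITH spend (`datetime`, customer_id, amount) AS (
  SELECT CAST('2018-08-25 08:00:00' AS DATETIME), 3456, 78.00 UNION ALL
  SELECT CAST('2018-08-25 17:00:00' AS DATETIME), 3456, 25.00 UNION ALL
  SELECT CAST('2018-08-26 03:00:00' AS DATETIME), 3456, 34.90 UNION ALL
  SELECT CAST('2019-08-25 08:00:00' AS DATETIME), 1234, 88.00 UNION ALL
  SELECT CAST('2019-08-30 09:31:00' AS DATETIME), 1234, 30.00
)
SELECT d.day,
       d.customer_id,
       -- The frame holds only rows dated exactly one day before the current row;
       -- MAX() over an empty frame is NULL, so IS NOT NULL yields 1 or 0.
       MAX(d.day) OVER (PARTITION BY d.customer_id
                        ORDER BY d.day
                        RANGE BETWEEN INTERVAL 1 DAY PRECEDING
                                  AND INTERVAL 1 DAY PRECEDING) IS NOT NULL AS spent_previous_day
FROM (
  -- Step 1: one row per customer per day, so repeat purchases no longer duplicate output
  SELECT DATE(`datetime`) AS day, customer_id
  FROM spend
  GROUP BY DATE(`datetime`), customer_id
) AS d
ORDER BY d.day, d.customer_id;
```

Grouping in a subquery keeps the window function's inputs simple: by the time the window runs, `d.day` is an ordinary DATE column, which is what a RANGE frame with an INTERVAL offset requires.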

【Question comments】:

  • Hint: the lag() window (analytic) function

Tags: mysql sql mysql-8.0


【Solution 1】:

Assuming customers don't make multiple purchases on the same day, just use lag():

select t.*,
       ( date(lag(datetime) over (partition by customer_id order by datetime)) = date(datetime) - interval 1 day
       ) as prev_day_flag
from spend t;
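One detail worth noting: lag() returns NULL on each customer's first purchase, so prev_day_flag comes out NULL there instead of FALSE as in the desired output. Wrapping the comparison in COALESCE(..., FALSE) is one way to map it to 0. The sketch below assumes MySQL 8.0, and its `WITH spend (...)` CTE is a stand-in holding question rows with at most one purchase per customer per day (the assumption this lag() query relies on); remove the CTE to use the real table.

```sql
-- Stand-in for the question's `spend` table, limited to one purchase per
-- customer per day, which is the assumption the lag() approach relies on.
WITH spend (`datetime`, customer_id, amount) AS (
  SELECT CAST('2018-08-25 08:00:00' AS DATETIME), 3456, 78.00 UNION ALL
  SELECT CAST('2018-08-26 03:00:00' AS DATETIME), 3456, 34.90 UNION ALL
  SELECT CAST('2019-08-25 08:00:00' AS DATETIME), 1234, 88.00 UNION ALL
  SELECT CAST('2019-08-30 09:31:00' AS DATETIME), 1234, 30.00
)
SELECT t.*,
       -- LAG() is NULL on a customer's first purchase; COALESCE maps that to FALSE (0)
       COALESCE(
         DATE(LAG(`datetime`) OVER (PARTITION BY customer_id ORDER BY `datetime`))
           = DATE(`datetime`) - INTERVAL 1 DAY,
         FALSE) AS prev_day_flag
FROM spend AS t
ORDER BY t.customer_id, t.`datetime`;
```

Here every row gets 0 or 1, and only customer 3456's purchase on 2018-08-26 is flagged 1.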

If there can be duplicates, then try this instead of lag():

select t.*,
       max(date(datetime)) over (partition by customer_id
                                 order by date(datetime)
                                 range between interval 1 day preceding and interval 1 day preceding
                                ) is not null as prev_day_flag
from spend t;

Edit:

If you want one row per customer per day:

select s.*,
       ( date(lag(dte) over (partition by customer_id order by dte)) = dte - interval 1 day
       ) as prev_day_flag
from (select customer_id, date(datetime) as dte, sum(amount) as amount
      from spend s
      group by customer_id, date(datetime)
     ) s;

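【Discussion】:

  • Why use interval 1 day on both sides?
  • @JuanCarlosOropeza . . . Only the previous day is wanted, and this is a convenient way to define it. It also works when the same customer has multiple records on the same day.
  • I have tried both suggestions and they both partially work. However, I am still struggling with repeat spending on the same day.
  • Do a group by day, customer_id so that you only have one record per day, and then you can use the first query.
  • @ToniIdowu . . . It depends on what you want. If you want one row per customer per day, follow Juan's suggestion and aggregate the data at that level. Your question suggests that you want all the rows in the original data.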