【Question title】: Group by rows which are in sequence
【Posted】: 2022-01-04 11:31:35
【Question description】:

Suppose I have a table like this:

PASSENGER  CITY      DATE
43         NEW YORK  1-Jan-21
44         CHICAGO   4-Jan-21
43         NEW YORK  2-Jan-21
43         NEW YORK  3-Jan-21
44         ROME      5-Jan-21
43         LONDON    4-Jan-21
44         CHICAGO   6-Jan-21
44         CHICAGO   7-Jan-21

How can I group by the PASSENGER and CITY columns, in sequence, to get the result below?

PASSENGER  CITY      COUNT
43         NEW YORK  3
44         CHICAGO   1
44         ROME      1
43         LONDON    1
44         CHICAGO   2

【Question comments】:

  • How is the order of your sequence defined? A relational table has no inherent order.

Tags: sql oracle gaps-and-islands


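【Solution 1】:

One way to handle this kind of gaps-and-islands problem is to flag each row that starts a new island and turn the running total of those flags into a rank.

Then group by that rank as well.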

SELECT PASSENGER, CITY
, COUNT(*) AS "Count" 
-- , MIN("DATE") AS StartDate
-- , MAX("DATE") AS EndDate
FROM (
  SELECT q1.*
  , SUM(gap) OVER (PARTITION BY PASSENGER ORDER BY "DATE") as Rnk
  FROM (
    SELECT PASSENGER, CITY, "DATE"
    , CASE
      WHEN 1 = TRUNC("DATE")
             - TRUNC(LAG("DATE") 
                     OVER (PARTITION BY PASSENGER, CITY ORDER BY "DATE")) 
      THEN 0 ELSE 1 END as gap
    FROM table_name t
  ) q1
) q2
GROUP BY PASSENGER, CITY, Rnk
ORDER BY MIN("DATE"), PASSENGER;

Output:

PASSENGER  CITY      Count
43         NEW YORK  3
43         LONDON    1
44         CHICAGO   1
44         ROME      1
44         CHICAGO   2

db<>fiddle here

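【Comments】:

  • Hi Luk, what would change in your approach if I had timestamps instead of dates?
  • Have you tried it with timestamps? As you can see, the query already truncates to '00:00:00' to mark the gaps. So as long as consecutive rows are still about one day apart, it probably won't matter.
  • Yes, the grain is not 1 day; I need the sequence within a day as well.
  • Or you could try changing the gap calculation yourself, for example by checking whether the timestamp difference is no more than 24 hours.
  • I'll ask another question.
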
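Following up on the timestamp question in the comments, here is a minimal sketch of an alternative gap flag that does not depend on the time grain: a row starts a new island whenever its city differs from the previous row's city for the same passenger. It assumes that "in sequence" simply means adjacent rows when ordered by the "DATE" column (which could equally be a TIMESTAMP), and it reuses table_name from the question. This is an illustration under those assumptions, not the answer author's query.

```sql
SELECT PASSENGER, CITY
, COUNT(*) AS "Count"
FROM (
  SELECT q1.*
  -- the running total of the flags numbers the islands per passenger
  , SUM(gap) OVER (PARTITION BY PASSENGER ORDER BY "DATE") AS Rnk
  FROM (
    SELECT PASSENGER, CITY, "DATE"
    -- 0 when the city equals the previous row's city;
    -- 1 when it changes or there is no previous row (LAG returns NULL)
    , CASE
      WHEN CITY = LAG(CITY) OVER (PARTITION BY PASSENGER ORDER BY "DATE")
      THEN 0 ELSE 1 END AS gap
    FROM table_name t
  ) q1
) q2
GROUP BY PASSENGER, CITY, Rnk
ORDER BY MIN("DATE"), PASSENGER;
```

On the sample data this should yield the same five groups as the query above. If two rows of one passenger can share the same timestamp, the window ordering is no longer deterministic, so a tiebreaker column would be needed.
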
【Solution 2】:

From Oracle 12 onwards, you can use MATCH_RECOGNIZE:

SELECT *
FROM   table_name
MATCH_RECOGNIZE (
  PARTITION BY passenger
  ORDER     BY "DATE"
  MEASURES
    FIRST(city) AS city,
    COUNT(*)    AS count
  PATTERN (same_city+)
  DEFINE
    same_city AS FIRST(city) = city
);

Which, for the sample data:

CREATE TABLE table_name (PASSENGER, CITY, "DATE") AS
SELECT 43, 'NEW YORK',  DATE '2021-01-01' FROM DUAL UNION ALL
SELECT 44, 'CHICAGO',   DATE '2021-01-04' FROM DUAL UNION ALL
SELECT 43, 'NEW YORK',  DATE '2021-01-02' FROM DUAL UNION ALL
SELECT 43, 'NEW YORK',  DATE '2021-01-03' FROM DUAL UNION ALL
SELECT 44, 'ROME',      DATE '2021-01-05' FROM DUAL UNION ALL
SELECT 43, 'LONDON',    DATE '2021-01-04' FROM DUAL UNION ALL
SELECT 44, 'CHICAGO',   DATE '2021-01-06' FROM DUAL UNION ALL
SELECT 44, 'CHICAGO',   DATE '2021-01-07' FROM DUAL

Outputs:

PASSENGER  CITY      COUNT
43         NEW YORK  3
43         LONDON    1
44         CHICAGO   1
44         ROME      1
44         CHICAGO   2
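
To also report where each run starts and ends, the same pattern can carry extra measures. The following is a sketch against the same sample table; the aliases start_date, end_date and cnt are illustrative names and do not come from the original answer.

```sql
SELECT *
FROM   table_name
MATCH_RECOGNIZE (
  PARTITION BY passenger
  ORDER     BY "DATE"
  MEASURES
    FIRST(city)   AS city,
    FIRST("DATE") AS start_date,  -- date on the first row of the matched run
    LAST("DATE")  AS end_date,    -- date on the last row of the matched run
    COUNT(*)      AS cnt
  ONE ROW PER MATCH               -- the default, written out for clarity
  PATTERN (same_city+)
  DEFINE
    -- a row extends the run only while its city equals the run's first city
    same_city AS FIRST(city) = city
)
ORDER BY passenger, start_date;
```

Each output row then describes one island: the passenger, the city, the first and last date of the run, and the number of rows in it.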

If your input result set is already ordered (note: a table should be considered unordered) and you want to preserve that order, then:

SELECT *
FROM   (SELECT t.*, ROWNUM AS rn FROM table_name t)
MATCH_RECOGNIZE (
  PARTITION BY passenger
  ORDER     BY RN
  MEASURES
    FIRST(rn)     AS rn,
    FIRST("DATE") AS "DATE",
    FIRST(city)   AS city,
    COUNT(*)      AS count
  PATTERN (same_city+)
  DEFINE
    same_city AS FIRST(city) = city
)
ORDER BY rn

Outputs:

PASSENGER  RN  DATE       CITY      COUNT
43         1   01-JAN-21  NEW YORK  3
44         2   04-JAN-21  CHICAGO   1
44         5   05-JAN-21  ROME      1
43         6   04-JAN-21  LONDON    1
44         7   06-JAN-21  CHICAGO   2

db<>fiddle here
