SQL H2 overlapping date interval aggregate: answers

【Question title】: SQL H2 Overlapping date interval aggregate
【Posted】: 2021-09-03 08:00:25
【Question description】:

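I have an SQL query, written in H2 syntax, that returns a list of appointments for the component I am working with. It does not take into account overlapping appointments that use the same component_type_id.

Each block is an appointment that uses different components. Example: there are 4 components in total with component_type_id = 33.

Appointment 1 runs from 08:00 to 09:30 and uses 3 of the component_type_id = 33 components.
Appointment 2 runs from 09:00 to 11:00 and uses 1 of the component_type_id = 33 components. This means that between 09:00 and 09:30 all 4 resources are unavailable. That is the time frame I want.

Appointments for component_type_id=33: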

08:00 -----               <-|
08:30 |   |                 | 9 total machines are taken during this timeframe
09:00 |x3 | -----           | 
09:30 ----- |   | -----     |
10:00       |   | |   |     |                       
10:30       |x1 | |x0 |   <-|
11:00       ----- |   |
11:30             -----

It should return:

08:00                     
08:30                     
09:00 --------------      <-| All x4 component_type_id=33 are taken
09:30 --------------      <-|
10:00                                        
10:30                       
11:00       
11:30
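
The expected result can be sanity-checked by brute force. The sketch below is only an illustration in Python, not part of the original question: it walks the day in 30-minute slots and reports the slots in which the units in use add up to the total of 4. The per-appointment unit counts (3, 1, 0) are read off the diagram, the names `APPOINTMENTS`, `TOTAL_UNITS` and `fully_booked_slots` are hypothetical, and the approach assumes every appointment starts and ends on a half-hour boundary.

```python
from datetime import datetime, timedelta

# (start, end, units of component_type_id 33 in use), as drawn in the diagram.
APPOINTMENTS = [
    (datetime(2021, 5, 21, 8, 0), datetime(2021, 5, 21, 9, 30), 3),
    (datetime(2021, 5, 21, 9, 0), datetime(2021, 5, 21, 11, 0), 1),
    (datetime(2021, 5, 21, 9, 30), datetime(2021, 5, 21, 11, 30), 0),
]
TOTAL_UNITS = 4  # assumption from the question: four units of this type exist


def fully_booked_slots(appointments, total, day_start, day_end,
                       step=timedelta(minutes=30)):
    """Return the start time of every slot in which all units are in use."""
    slots = []
    slot = day_start
    while slot < day_end:
        # An appointment occupies the slot if it has started and not yet ended.
        used = sum(units for start, end, units in appointments
                   if start <= slot < end)
        if used >= total:
            slots.append(slot)
        slot += step
    return slots


booked = fully_booked_slots(
    APPOINTMENTS, TOTAL_UNITS,
    datetime(2021, 5, 21, 8, 0), datetime(2021, 5, 21, 12, 0),
)
print([s.strftime("%H:%M") for s in booked])  # ['09:00']
```

The single slot starting at 09:00 corresponds to the 09:00 to 09:30 window marked in the diagram.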

Simplified tables:

APPOINTMENT:

| id  | start_time          | end_time            |
| --- | ------------------- | ------------------- |
| 1   | 2021-05-21 08:00:00 | 2021-05-21 09:30:00 |
| 2   | 2021-05-21 09:00:00 | 2021-05-21 11:00:00 |
| 3   | 2021-05-21 09:30:00 | 2021-05-21 11:30:00 |

COMPONENT:

| id  | component_type_id | name |
| --- | ----------------- | ---- |
| 1   | 4               | pc12 |
| 2   | 4               | pc13 |
| 3   | 4               | pc14 |
| 4   | 3               | vm45 |
| 5   | 3               | vm46 |
| 6   | 3               | vm47 |
| 7   | 1               | gg67 |
| 8   | 1               | gg68 |
| 9   | 1               | gg69 |
| 10  | 1               | gg70 |

APPOINTMENT_COMPONENT: (junction table)

| id  | appointment_id | component_id |
| --- | -------------- | ------------ |
| 1   | 1             | 3           |
| 2   | 1             | 4           |
| 3   | 1             | 5           |
| 4   | 2             | 1           |
| 5   | 2             | 2           |
| 6   | 2             | 3           |
| 7   | 2             | 7           |
| 8   | 3             | 4           |
| 9   | 3             | 5           |

Non-aggregated query (for demonstration):

select a.appointment_id, c.component_id, ct.component_type_id, a.appointment_start_time, a.appointment_end_time,
       (select count(*) from component where component.component_id = c.component_id and component.component_type_id = 33) as total_used_component_type,
       (select count(*) from component where component.component_type_id = 33) as total_component_type
from appointment a
    left join appointment_component ac on a.appointment_id = ac.appointment_id
    inner join component c on c.component_id = ac.component_id
    inner join component_type ct on ct.component_type_id = c.component_type_id
        and a.appointment_start_time >= '2021-05-21T08:00:00.000-05:00'
        and a.appointment_end_time <= '2021-05-21T16:00:00.000-05:00'
        and c.component_type_id = 33

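Result:

Later I will compute TOTAL_COMPONENT_TYPE - TOTAL_USED_COMPONENT_TYPE to check how many components are available. My goal is to find the component types that are unavailable between two dates.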
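
A small worked example, using the figures from the example above (4 units in total, appointment 1 using 3 and appointment 2 using 1), shows why the subtraction has to be done per point in time rather than per appointment. This is an illustrative Python sketch; the variable names are hypothetical.

```python
TOTAL = 4  # units of component_type_id 33, per the question's example
used_per_appointment = {1: 3, 2: 1}  # appointment_id -> units in use

# Subtracting appointment by appointment never reaches zero.
available_per_appointment = {
    appointment_id: TOTAL - used
    for appointment_id, used in used_per_appointment.items()
}
print(available_per_appointment)  # {1: 1, 2: 3}

# Appointments 1 and 2 overlap from 09:00 to 09:30, so during that
# window their usage has to be added up before subtracting.
available_during_overlap = TOTAL - sum(used_per_appointment.values())
print(available_during_overlap)  # 0
```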

Aggregated query:

select appointment_id, (TOTAL_COMPONENT_TYPE - SUM(TOTAL_USED_COMPONENT_TYPE)) AS TOTAL_AVAILABLE_COMPONENT_TYPE
from (
    select a.appointment_id, c.component_id, ct.component_type_id, a.appointment_start_time, a.appointment_end_time,
           (select count(*) from component where component.component_id = c.component_id and component.component_type_id = 33) as total_used_component_type,
           (select count(*) from component where component.component_type_id = 33) as total_component_type
    from appointment a
        left join appointment_component ac on a.appointment_id = ac.appointment_id
        inner join component c on c.component_id = ac.component_id
        inner join component_type ct on ct.component_type_id = c.component_type_id
            and a.appointment_start_time >= '2021-05-21T08:00:00.000-05:00'
            and a.appointment_end_time <= '2021-05-21T16:00:00.000-05:00'
            and c.component_type_id = 33
)
group by appointment_id

Result:

【Comments】:

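Can you explain how component id 3 can be associated with both appointment id 1 and appointment id 2 at the same time, as shown in the APPOINTMENT_COMPONENT table? Appointment 1 starts at 8:00 and ends at 9:30, while appointment 2 starts at 9:00 and ends at 11:00. In other words, the same component has two appointments at the same time (from 9:00 to 9:30).
That was my mistake! They cannot overlap.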
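
A double booking like the one raised in this comment can be found mechanically. The following Python sketch is an illustration only: it loads the question's APPOINTMENT and APPOINTMENT_COMPONENT sample rows into plain dictionaries and lists (the names and the `double_booked` helper are hypothetical) and reports every component that is attached to two appointments whose time ranges overlap.

```python
from datetime import datetime
from itertools import combinations

# appointment id -> (start_time, end_time), from the question's APPOINTMENT table
APPOINTMENT = {
    1: (datetime(2021, 5, 21, 8, 0), datetime(2021, 5, 21, 9, 30)),
    2: (datetime(2021, 5, 21, 9, 0), datetime(2021, 5, 21, 11, 0)),
    3: (datetime(2021, 5, 21, 9, 30), datetime(2021, 5, 21, 11, 30)),
}
# (appointment_id, component_id), from the question's APPOINTMENT_COMPONENT table
APPOINTMENT_COMPONENT = [
    (1, 3), (1, 4), (1, 5), (2, 1), (2, 2), (2, 3), (2, 7), (3, 4), (3, 5),
]


def double_booked(appointments, links):
    """Return (component_id, appointment_a, appointment_b) for overlapping bookings."""
    by_component = {}
    for appointment_id, component_id in links:
        by_component.setdefault(component_id, []).append(appointment_id)
    conflicts = []
    for component_id, ids in sorted(by_component.items()):
        for a, b in combinations(sorted(ids), 2):
            a_start, a_end = appointments[a]
            b_start, b_end = appointments[b]
            # Two ranges overlap when each one starts before the other ends.
            if a_start < b_end and b_start < a_end:
                conflicts.append((component_id, a, b))
    return conflicts


print(double_booked(APPOINTMENT, APPOINTMENT_COMPONENT))  # [(3, 1, 2)]
```

Components 4 and 5 are shared by appointments 1 and 3, but those only touch at 09:30, so the strict comparison does not flag them.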

Tags: java sql date h2 overlap

【Solution 1】:

I think you can get the result you want with this query:

-- My goal is to find unavailable component types between two dates
select 
  busy_components.component_type_id, 
  busy_components.start_time, 
  busy_components.end_time, 
  busy_components.total 
from 
(
  select 
    cm.*, 
    greatest(ap.start_time, '2021-05-21 08:00:00'::timestamp) start_time, 
    sum(case when ctrl.event_name = 'start' then 1 else -1 end) over(partition by cm.component_type_id order by case when ctrl.event_name = 'start' then ap.start_time else ap.end_time end, ctrl.event_name, cm.id) total, 
    lead(case when ctrl.event_name = 'end' then ap.end_time end, 1, least(ap.end_time, '2021-05-21 16:00:00'::timestamp)) over(partition by cm.component_type_id order by case when ctrl.event_name = 'start' then ap.start_time else ap.end_time end, ctrl.event_name, cm.id) end_time 
  from (select 'start' event_name union select 'end' event_name) ctrl 
  inner join appointment ap on 1=1 
    and ap.start_time < '2021-05-21 16:00:00' 
    and ap.end_time   > '2021-05-21 08:00:00' 
  inner join appointment_component ac on 1=1 
    and ac.appointment_id = ap.id 
  inner join component cm on 1=1 
    and cm.id = ac.component_id 
) busy_components 
inner join 
(
  select 
    component_type_id, 
    count(*) total 
  from component 
  group by component_type_id
) all_components 
  on  all_components.component_type_id = busy_components.component_type_id 
  and all_components.total = busy_components.total 
;

You can try it in this PostgreSQL fiddle, which runs against the following data and produces the following output:

create table appointment (id, start_time, end_time) as
  select 1, '2021-05-21 08:00:00'::timestamp, '2021-05-21 09:30:00'::timestamp union all
  select 2, '2021-05-21 09:00:00'::timestamp, '2021-05-21 11:00:00'::timestamp union all
  select 3, '2021-05-21 09:30:00'::timestamp, '2021-05-21 11:30:00'::timestamp
;

create table component (id, component_type_id, name) as
  select 1, 4, 'pc12' union all
  select 2, 4, 'pc13' union all
  select 3, 4, 'pc14' union all
  select 4, 3, 'vm45' union all
  select 5, 3, 'vm46' union all
  select 6, 3, 'vm47' union all
  select 7, 1, 'gg67' union all
  select 8, 1, 'gg68' union all
  select 9, 1, 'gg69' union all
  select 10, 1, 'gg70'
;

create table appointment_component (id, appointment_id, component_id) as
  select 1, 1, 3 union all
  select 2, 1, 4 union all
  select 3, 1, 5 union all
  select 4, 2, 1 union all
  select 5, 2, 2 union all
  select 6, 3, 3 union all
  select 7, 2, 7 union all
  select 8, 3, 4 union all
  select 9, 3, 5
;

| component_type_id | start_time           | end_time             | total |
| ----------------- | -------------------- | -------------------- | ----- |
| 4                 | 2021-05-21T09:00:00Z | 2021-05-21T09:30:00Z | 3     |
| 4                 | 2021-05-21T09:30:00Z | 2021-05-21T11:00:00Z | 3     |

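Basically, it works like this:

First, find all components that are in use between the desired start and end times (those whose start_time is before the desired end time and whose end_time is after the desired start time).
Next, group the components by component_type_id and order their start_time and end_time values together, as if both were in a single column. That way you can keep a running count, adding 1 busy component at each start event and subtracting 1 at each end event.
Finally, check whether that count matches the number of existing components for the component_type_id.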
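
The three steps above can be sketched outside SQL as an event sweep. The Python below is a minimal illustration under stated assumptions, not a translation of the query: it uses the fiddle's sample rows, and the names `ts`, `COMPONENT_TYPE` and `fully_booked` are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime


def ts(hhmm):
    """Build a timestamp on 2021-05-21 from an 'HH:MM' string."""
    return datetime.strptime("2021-05-21 " + hhmm, "%Y-%m-%d %H:%M")


# Sample rows from the fiddle.
APPOINTMENT = {
    1: (ts("08:00"), ts("09:30")),
    2: (ts("09:00"), ts("11:00")),
    3: (ts("09:30"), ts("11:30")),
}
# component id -> component_type_id
COMPONENT_TYPE = {1: 4, 2: 4, 3: 4, 4: 3, 5: 3, 6: 3, 7: 1, 8: 1, 9: 1, 10: 1}
# (appointment_id, component_id)
APPOINTMENT_COMPONENT = [
    (1, 3), (1, 4), (1, 5), (2, 1), (2, 2), (3, 3), (2, 7), (3, 4), (3, 5),
]


def fully_booked(window_start, window_end):
    totals = Counter(COMPONENT_TYPE.values())  # existing units per type
    events = defaultdict(list)                 # type id -> [(time, delta)]
    for appointment_id, component_id in APPOINTMENT_COMPONENT:
        start, end = APPOINTMENT[appointment_id]
        # Step 1: keep only appointments that overlap the requested window.
        if start < window_end and end > window_start:
            type_id = COMPONENT_TYPE[component_id]
            events[type_id].append((start, +1))
            events[type_id].append((end, -1))
    rows = []
    for type_id, type_events in sorted(events.items()):
        # Step 2: one timeline per type; at equal times -1 sorts before +1.
        type_events.sort()
        busy = 0
        for i, (time, delta) in enumerate(type_events):
            busy += delta
            # Step 3: every unit of the type is busy until the next event.
            if busy == totals[type_id] and i + 1 < len(type_events):
                next_time = type_events[i + 1][0]
                if time < next_time:
                    rows.append((type_id, time, next_time))
    return rows


for type_id, start, end in fully_booked(ts("08:00"), ts("16:00")):
    print(type_id, start.time(), end.time())
```

On the sample data this prints two rows for component_type_id 4 (09:00 to 09:30 and 09:30 to 11:00), the same intervals as the fiddle output. Sorting end events ahead of start events at the same timestamp keeps a unit that is released and re-booked at 09:30 from being counted twice.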

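For rows that pass all of the steps above, the resulting start time is the greater of the desired start time and the appointment's start_time. Likewise, the resulting end time is the lesser of the desired end time and the next end_time, at which a component becomes available. This way your result never falls outside the desired start and end times.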
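
The clamping idea behind greatest() and least() can be shown in isolation. This is a simplified Python sketch with a hypothetical helper, `clamp_interval`, and a made-up interval that sticks out of the window on both sides; the query itself additionally uses lead() to pick the next end_time.

```python
from datetime import datetime


def clamp_interval(start, end, window_start, window_end):
    """Trim an interval to the requested window, like greatest()/least()."""
    return max(start, window_start), min(end, window_end)


window_start = datetime(2021, 5, 21, 8, 0)
window_end = datetime(2021, 5, 21, 16, 0)

# Hypothetical interval that begins before and ends after the window.
clamped = clamp_interval(
    datetime(2021, 5, 21, 7, 0), datetime(2021, 5, 21, 17, 30),
    window_start, window_end,
)
print(clamped[0].time(), clamped[1].time())  # 08:00:00 16:00:00
```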

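I think the only thing you should need to change in this query is the cast from a datetime string to a timestamp, since I wrote this query for the PostgreSQL fiddle, but that is a small change.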
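
One way to make that change mechanically is to rewrite each PostgreSQL `'...'::timestamp` cast into the standard `CAST('...' AS TIMESTAMP)` form before sending the query to H2. The Python helper below is a hypothetical sketch (`to_standard_cast` is not part of any library), and the rewritten query has not been run against H2 here.

```python
import re


def to_standard_cast(sql):
    """Rewrite PostgreSQL '...'::timestamp casts as CAST('...' AS TIMESTAMP)."""
    return re.sub(r"('[^']*')::timestamp\b", r"CAST(\1 AS TIMESTAMP)", sql)


line = "greatest(ap.start_time, '2021-05-21 08:00:00'::timestamp) start_time"
print(to_standard_cast(line))
# greatest(ap.start_time, CAST('2021-05-21 08:00:00' AS TIMESTAMP)) start_time
```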

【Comments】: