【Question title】: Complex query analyzing historical records
【Posted】: 2021-05-06 12:17:38
【Question】:

I'm using Oracle and trying to retrieve the total number of days a person was out of the office during a year. I have 2 tables:

Statuses

1 - Active
2 - Out of the Office
3 - Other

ScheduleHistory

RecordID - primary key
PersonID
PreviousStatusID
NextStatusID
DateChanged

I can easily find when the person went on vacation and when they came back, using

SELECT DateChanged FROM ScheduleHistory WHERE PersonID=111 AND NextStatusID = 2

SELECT DateChanged FROM ScheduleHistory WHERE PersonID=111 AND PreviousStatusID = 2

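But if a person goes on vacation more than once, how do I calculate the total number of days they were out of the office? Is it possible to do this programmatically, given only the PersonID?

Here is some sample data: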

RecordID    PersonID    PreviousStatusID    NextStatusID    DateChanged
-----------------------------------------------------------------------------
1           111           1                     2              03/11/2020
2           111           2                     1              03/13/2020
3           111           1                     3              04/01/2020
4           111           3                     1              04/07/2020
5           111           1                     2              06/03/2020
6           111           2                     1              06/05/2020
7           111           1                     2              09/14/2020
8           111           2                     1              09/17/2020

So, from the data above, the query should return 7 for PersonID 111 for the year 2020.
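
For reference, the expected 7 is the sum of the three status-2 spells in the sample rows (records 1-2, 5-6 and 7-8). A small sketch of that arithmetic, using only Oracle date subtraction and the sample dates, with the end date treated as exclusive:

```sql
-- Each out-of-office spell is (return date - departure date).
SELECT (DATE '2020-03-13' - DATE '2020-03-11')   -- 2 days
     + (DATE '2020-06-05' - DATE '2020-06-03')   -- 2 days
     + (DATE '2020-09-17' - DATE '2020-09-14')   -- 3 days
       AS days_out_2020                          -- 7
FROM dual;
```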

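【Comments】:

  • Please provide sample data and the desired results. Based on your description, you could have one record per day and then add them up. Also, what about periods that span more than one year? How are those handled?
  • This only applies to a specified time range of 1 year. I have added sample data.
  • Do you count weekends as out of the office?
  • Weekends are out of scope for now.

Tags: sql oracle plsql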


【Solution 1】:

Try this:

with aux1 AS (
    SELECT
        a.*,
        to_date(datechanged, 'MM/DD/YYYY') - LAG(to_date(datechanged, 'MM/DD/YYYY')) OVER(
            PARTITION BY personid
            ORDER BY
                recordid
        ) lag_date
    FROM
        ScheduleHistory a
) 
SELECT
    personid,
    SUM(lag_date) tot_days_ooo
FROM
    aux1
WHERE
    previousstatusid = 2
GROUP BY
    personid;
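
The question asks for one person and one year. A minimal sketch of that variant follows, under two assumptions that are not confirmed by the question: `DateChanged` is stored as a `DATE` column (so no `to_date` call is needed), and no absence crosses a year boundary. The bind variables `:person_id` and `:yr` are hypothetical names.

```sql
WITH aux1 AS (
    SELECT
        sh.personid,
        sh.previousstatusid,
        sh.datechanged,
        -- days elapsed since this person's previous status change
        sh.datechanged - LAG(sh.datechanged) OVER (
            PARTITION BY sh.personid
            ORDER BY sh.datechanged, sh.recordid
        ) AS days_in_previous_status
    FROM
        ScheduleHistory sh
    WHERE
        sh.personid = :person_id
)
SELECT
    SUM(days_in_previous_status) AS tot_days_ooo
FROM
    aux1
WHERE
    previousstatusid = 2                       -- this row marks the return from status 2
    AND EXTRACT(YEAR FROM datechanged) = :yr;  -- spell is attributed to the year of return
```

With the sample rows, `:person_id = 111` and `:yr = 2020`, the qualifying rows are records 2, 6 and 8, so the sum is 2 + 2 + 3 = 7.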

【Comments】:

    【Solution 2】:

    If you want the total number of days (or weekdays) per year, accounting for periods that cross a year boundary, then:

    WITH date_ranges ( personid, status, start_date, end_date ) AS (
      SELECT personid,
             nextstatusid,
             datechanged,
             LEAD(datechanged, 1, datechanged) OVER(
               PARTITION BY personid
               ORDER BY datechanged
             )
      FROM   table_name
    ),
    split_year_ranges ( personid, year, start_date, end_date, max_date ) AS (
      SELECT personid,
             TRUNC( start_date, 'YY' ),
             start_date,
             LEAST(
               end_date,
               ADD_MONTHS( TRUNC( start_date, 'YY' ), 12 )
             ),
             end_date
      FROM   date_ranges
      WHERE  status = 2
    UNION ALL
      SELECT personid,
             end_date,
             end_date,
             LEAST( max_date, ADD_MONTHS( end_date, 12 ) ),
             max_date
      FROM   split_year_ranges
      WHERE  end_date < max_date
    )
    SELECT personid,
           EXTRACT( YEAR FROM year) AS year,
           SUM( end_date - start_date ) AS total_days,
           SUM(
             ( TRUNC( end_date, 'IW' ) - TRUNC( start_date, 'IW' ) ) * 5 / 7
             + LEAST( end_date - TRUNC( end_date, 'IW' ), 5 )
             - LEAST( start_date - TRUNC( start_date, 'IW' ), 5 )
           ) AS total_weekdays
    FROM   split_year_ranges
    GROUP BY personid, year
    ORDER BY personid, year
    

    Which, for the sample data:

    CREATE TABLE table_name ( RecordID, PersonID, PreviousStatusID, NextStatusID, DateChanged ) AS
    SELECT  1, 111, 1, 2, DATE '2020-03-11' FROM DUAL UNION ALL
    SELECT  2, 111, 2, 1, DATE '2020-03-13' FROM DUAL UNION ALL
    SELECT  3, 111, 1, 3, DATE '2020-04-01' FROM DUAL UNION ALL
    SELECT  4, 111, 3, 1, DATE '2020-04-07' FROM DUAL UNION ALL
    SELECT  5, 111, 1, 2, DATE '2020-06-03' FROM DUAL UNION ALL
    SELECT  6, 111, 2, 1, DATE '2020-06-05' FROM DUAL UNION ALL
    SELECT  7, 111, 1, 2, DATE '2020-09-14' FROM DUAL UNION ALL
    SELECT  8, 111, 2, 1, DATE '2020-09-17' FROM DUAL UNION ALL
    SELECT  9, 222, 1, 2, DATE '2019-12-31' FROM DUAL UNION ALL
    SELECT 10, 222, 2, 2, DATE '2020-12-01' FROM DUAL UNION ALL
    SELECT 11, 222, 2, 2, DATE '2021-01-02' FROM DUAL;
    

    Outputs:

    PERSONID    YEAR    TOTAL_DAYS    TOTAL_WEEKDAYS
    111         2020    7             7
    222         2019    1             1
    222         2020    366           262
    222         2021    1             1
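
    The weekday expression can be checked by hand on the 2020 row for person 222, whose range after the year split is 2020-01-01 (a Wednesday) up to 2021-01-01 (a Friday). The sketch below only plugs those two dates into the same formula; it is an illustration of the arithmetic, not a replacement for the query.

    ```sql
    SELECT ( TRUNC( DATE '2021-01-01', 'IW' ) - TRUNC( DATE '2020-01-01', 'IW' ) ) * 5 / 7
             -- 364 days between the two ISO-week Mondays, i.e. 52 weeks -> 260 weekdays
           + LEAST( DATE '2021-01-01' - TRUNC( DATE '2021-01-01', 'IW' ), 5 )
             -- the end date is 4 days after its Monday -> add 4
           - LEAST( DATE '2020-01-01' - TRUNC( DATE '2020-01-01', 'IW' ), 5 )
             -- the start date is 2 days after its Monday -> subtract 2
           AS total_weekdays   -- 260 + 4 - 2 = 262
    FROM   DUAL;
    ```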

    db<>fiddle here

    【Comments】:

      【Solution 3】:

      As long as a vacation does not span more than one year:

      with grps as (
          SELECT sh.*,
            row_number() over (partition by PersonID, NextStatusID order by DateChanged) grp
          FROM ScheduleHistory sh
          WHERE NextStatusID in (1,2) and 3 not in (NextStatusID, PreviousStatusID)
      ), durations as (
          SELECT PersonID, min(DateChanged) DateChanged, max(DateChanged) - min(DateChanged) duration  
          FROM grps
          GROUP BY PersonID, grp
      )
      SELECT PersonID, sum(duration) days_out
      FROM durations
      GROUP BY PersonID;
      

      db<>fiddle

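      【Comments】:

      • Somehow your query ended up returning a crazy number, 117 instead of 7. I did have to remove the where clause from grps, because the next or previous status can actually be anything. The status can go from 3 to 2 or from 1 to 2. There can also be other statuses.

      【Solution 4】: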

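      year_span is used to split an interval that spans two years into two separate records.

      H1 adds a row number per PersonID to get the correct sequence for each person.

      H2 gets the period of each status change and extracts the first day of the year in which the interval ends.

      H3 splits the records that span two years and computes the correct date_start and date_end (D1 and D2 in the query) for each interval.

      H computes the number of days elapsed in each interval for each year.

      The final query aggregates the records to produce the output.

      EDIT

      If you need weekdays instead of total days, you should not use total_days/7*5, because it is a poor approximation and gives odd results in some cases.

      I have posted a solution for the Friday-to-Monday jump here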
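
      To see why the approximation misbehaves, consider an absence from Friday 2020-03-13 up to (but not including) Monday 2020-03-16. The sketch below is an illustration only: it generates one row per calendar day with `CONNECT BY LEVEL` and counts Monday to Friday using the ISO-week offset, which is one possible way to get an exact count and not necessarily the approach in the linked post.

      ```sql
      WITH days AS (
          -- one row per calendar day in the range: Fri 13, Sat 14, Sun 15
          SELECT DATE '2020-03-13' + LEVEL - 1 AS d
          FROM   dual
          CONNECT BY LEVEL <= DATE '2020-03-16' - DATE '2020-03-13'
      )
      SELECT COUNT(*)         AS total_days,       -- 3
             COUNT(*) / 7 * 5 AS approx_weekdays,  -- about 2.14
             -- d - TRUNC(d, 'IW') is 0 for Monday through 6 for Sunday
             COUNT(CASE WHEN d - TRUNC(d, 'IW') < 5 THEN 1 END) AS exact_weekdays  -- 1
      FROM   days;
      ```

      The approximation reports roughly two working days for a span that contains a single one.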

      with 
      statuses (sid, sdescr) as (
          select 1, 'Active' from dual union all
          select 2, 'Out of the Office' from dual union all
          select 3, 'Other' from dual 
      ),
      ScheduleHistory(RecordID, PersonID, PreviousStatusID,  NextStatusID , DateChanged) as (
         select 1, 111, 1, 2, date '2020-03-11' from dual union all
         select 2, 111, 2, 1, date '2020-03-13' from dual union all
         select 3, 111, 1, 3, date '2020-04-01' from dual union all
         select 4, 111, 3, 1, date '2020-04-07' from dual union all
         select 5, 111, 1, 2, date '2020-06-03' from dual union all
         select 6, 111, 2, 1, date '2020-06-05' from dual union all
         select 7, 111, 1, 2, date '2020-09-14' from dual union all
         select 8, 111, 2, 1, date '2020-09-17' from dual union all
          SELECT  9, 222, 1, 2, date '2019-12-31' from dual UNION ALL
          SELECT 10, 222, 2, 2, date '2020-12-01' from dual UNION ALL
          SELECT 11, 222, 2, 2, date '2021-01-02' from dual 
      ), 
      year_span (n) as (
         select 1 from dual union all
         select 2 from dual 
      ),
      H1 AS (
          SELECT ROW_NUMBER() OVER (PARTITION BY PersonID ORDER BY RecordID) PID, H.*
          FROM ScheduleHistory H
      ),
      H2 as (
          SELECT 
              H1.*, H2.DateChanged DateChanged2, 
              EXTRACT(YEAR FROM H2.DateChanged) - EXTRACT(YEAR FROM H1.DateChanged) + 1 Y,
              trunc(H2.DateChanged,'YEAR') Y2
          FROM H1 H1
          LEFT JOIN H1 H2 ON H1.PID = H2.PID-1 AND H1.PersonID = H2.PersonID
      ),
      H3 AS (
          SELECT Y, N, H2.PID, H2.RecordID, H2.PersonID, H2.NextStatusID,
          CASE WHEN Y=1 THEN H2.DateChanged ELSE CASE WHEN N=1 THEN H2.DateChanged ELSE Y2 END END D1,
          CASE WHEN Y=1 THEN H2.DateChanged2 ELSE CASE WHEN N=1 THEN Y2 ELSE H2.DateChanged2 END END D2 
          FROM H2
          JOIN year_span N ON N.N <=Y
      ),
      H AS (
          SELECT PersonID, NextStatusID, EXTRACT(year FROM d1) Y, d2-d1 D
          FROM H3
      )
      select PersonID, sdescr Status, Y, sum(d) d
      from H
      join statuses s on NextStatusID = s.sid
      group by PersonID, sdescr, Y
      order by PersonID, sdescr, Y
      

      Output

      PersonID    Status              Y       d
      111         Active              2020    177
      111         Other               2020    6
      111         Out of the Office   2020    7
      222         Out of the Office   2019    1
      222         Out of the Office   2020    366
      222         Out of the Office   2021    1
      

      Check the fiddle here

      【Comments】:
