【Question Title】: Cumulative summation over null values
【Posted】: 2014-12-11 18:22:15
【Question】:

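I am trying to compute a cumulative-sum column to find the number of employees currently working in each month, but for months in which nobody joined I get NULL instead of the previous month's count of current employees.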

The employees table:

id    date_started     date_terminated
1      01-Apr-14       NULL
2      21-Apr-14       NULL
3      11-Apr-14       NULL
4      01-Apr-14       NULL
5      01-Apr-14       NULL
6      05-Apr-14       NULL
7      01-Apr-14       NULL
8      01-Apr-14       NULL
9      01-Apr-14       NULL
10     29-Apr-14       NULL
11     21-Apr-14       NULL
12     01-Apr-14       NULL
13     01-Apr-14       NULL
14     01-Apr-14       NULL
15     05-Aug-14       NULL
16     01-Oct-14       NULL
17     13-Oct-14       NULL
18     22-Oct-14       NULL
19     25-Oct-14       NULL
20     29-Oct-14       NULL

The dates table: it has a date column containing every date from 2011-Jan-01 to the current date.

The result table I get from my query:

+--------------------------------------------------------------+
| date                  | employee_joined | present_employees  |
+--------------------------------------------------------------+
| 2014-01-01 00:00:00-7 |            NULL |              NULL  |
| 2014-02-01 00:00:00-7 |            NULL |              NULL  |
| 2014-03-01 00:00:00-7 |            NULL |              NULL  |
| 2014-04-01 00:00:00-7 |              14 |                14  |
| 2014-05-01 00:00:00-7 |            NULL |              NULL  |
| 2014-06-01 00:00:00-7 |            NULL |              NULL  |
| 2014-07-01 00:00:00-7 |            NULL |              NULL  |
| 2014-08-01 00:00:00-7 |               1 |                15  |
| 2014-09-01 00:00:00-7 |            NULL |              NULL  |
| 2014-10-01 00:00:00-7 |               5 |                20  |
+--------------------------------------------------------------+

The result table I am looking for:

+--------------------------------------------------------------+
| date                  | employee_joined | present_employees  |
+--------------------------------------------------------------+
| 2014-01-01 00:00:00-7 |            NULL |              NULL  |
| 2014-02-01 00:00:00-7 |            NULL |              NULL  |
| 2014-03-01 00:00:00-7 |            NULL |              NULL  |
| 2014-04-01 00:00:00-7 |              14 |                14  |
| 2014-05-01 00:00:00-7 |            NULL |                14  |
| 2014-06-01 00:00:00-7 |            NULL |                14  |
| 2014-07-01 00:00:00-7 |            NULL |                14  |
| 2014-08-01 00:00:00-7 |               1 |                15  |
| 2014-09-01 00:00:00-7 |            NULL |                15  |
| 2014-10-01 00:00:00-7 |               5 |                20  |
+--------------------------------------------------------------+

I have tried to get the data with the following query:

/*-----ONLY FOR PRESENT EMPLOYEES USING CUMULATIVE SUM--------*/
WITH fdates AS 
    (
        SELECT DATE_TRUNC('month', d.date) AS date
        FROM dates d
        WHERE d.date::DATE <= '10-01-2014' AND
        d.date::DATE >= '01-01-2014'
        group by DATE_TRUNC('month', d.date)
    ),  
employeeJoin AS
    (
        SELECT COALESCE( COUNT(e.id), 0 ) AS employee_joined, 
            DATE_TRUNC( 'month', e.date_started) AS date_started
        FROM employees e GROUP BY DATE_TRUNC( 'month', e.date_started)
    ),
employeeJoinRownum AS
    (   
        SELECT employee_joined, date_started, row_number() OVER (order by date_started) rownum
        FROM employeeJoin
    ) 
SELECT d.*, employee_joined AS employee_joined,
        (SELECT sum(employee_joined) FROM employeeJoinRownum eJ2 WHERE eJ2.rownum <= eJ1.rownum) AS Total_Joined_Employees
    FROM fdates d
    LEFT OUTER JOIN employeeJoinRownum eJ1 ON( eJ1.date_started = DATE_TRUNC('month', d.date) )
    ORDER BY d.date

【Comments】:

    Tags: php sql postgresql postgis postgresql-9.1


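    【Solution 1】:

    The following query counts the employees who joined and the employees who left on each date, then uses a window function to accumulate the difference into a running total.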

    SELECT
      dates.date,
      COUNT(DISTINCT ej.id) AS employee_joined,
      COUNT(DISTINCT el.id) AS employee_left,
      SUM(COUNT(DISTINCT ej.id) - COUNT(DISTINCT el.id)) OVER (ORDER BY dates.date) AS present_employees
    FROM dates
    LEFT JOIN employees ej ON ej.date_started = dates.date
    LEFT JOIN employees el ON el.date_terminated = dates.date
    GROUP BY
      dates.date;
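
The same count-then-accumulate idea can be tried at month granularity, which is what the question asks for. The following is only a minimal sketch, run through Python's built-in sqlite3 module rather than PostgreSQL: the months table, the sample rows, and the use of strftime to truncate to the month are all assumptions made for illustration (in PostgreSQL, DATE_TRUNC('month', ...) would play that role). It assumes SQLite 3.25 or later for window functions and ignores terminations.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL here.
# The months table and the sample rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE months (month TEXT PRIMARY KEY);
    CREATE TABLE employees (
        id INTEGER PRIMARY KEY,
        date_started TEXT,
        date_terminated TEXT
    );
    INSERT INTO months VALUES
        ('2014-03-01'), ('2014-04-01'), ('2014-05-01'), ('2014-06-01');
    INSERT INTO employees VALUES
        (1, '2014-04-01', NULL),
        (2, '2014-04-21', NULL),
        (3, '2014-06-05', NULL);
""")

monthly = conn.execute("""
    WITH joined AS (
        -- COUNT(e.id) is 0, not NULL, for months without a matching employee
        SELECT m.month, COUNT(e.id) AS employee_joined
        FROM months m
        LEFT JOIN employees e
               ON strftime('%Y-%m-01', e.date_started) = m.month
        GROUP BY m.month
    )
    SELECT month,
           employee_joined,
           -- running total over all months up to and including this one
           SUM(employee_joined) OVER (ORDER BY month) AS present_employees
    FROM joined
    ORDER BY month
""").fetchall()

for row in monthly:
    print(row)
```

Because the running total is computed over every month of the calendar table, a month with no joiners carries the previous total forward instead of producing NULL. Unlike the desired output in the question, empty months show 0 rather than NULL in employee_joined.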
    

    If you do not have a pre-populated dates table, you can use the set-returning function generate_series instead and left join against it.

    SELECT
      ...
    FROM
      GENERATE_SERIES('2014-01-01', '2014-01-10', '1 day'::interval) dates LEFT JOIN employees ej
    ON
      ...
    

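    【Comments】:

    • I already have a dates table, so I do not want to generate a series.
    • OK, then select from your dates table. Just remove the GENERATE_SERIES function from the query and it will work (assuming the table is named dates and the field containing the date is date).
    • I have just edited the answer accordingly, but unless there is a particular reason to rely on a pre-populated dates table, generate_series is the better and more flexible solution.
    • This approach fails if more than one person joins and more than one person leaves on the same day. Example here: look at 2000-04-01, where four people leave and two join.
    • Correct. The join produces a row on the welcome side for every record found on the bye side, so COUNT(ej.*) counts all the rows found (including those NULL values). For 2000-04-01 there are 2 people joining and 4 leaving, which gives the wrong result of 2 * 4 = 8. Using COUNT DISTINCT on the primary key solves the problem. I edited the answer accordingly; here is an example.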
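
The row multiplication described in the last two comments can be reproduced with a small sketch. It is an illustration only, run through Python's sqlite3 module rather than PostgreSQL, and the emps and dates tables with their six sample rows are assumptions modelled on the 2000-04-01 case (two people join, four leave).

```python
import sqlite3

# Assumption: hypothetical sample data, with SQLite standing in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dates (date TEXT PRIMARY KEY);
    CREATE TABLE emps (id INTEGER PRIMARY KEY, welcome TEXT, bye TEXT);
    INSERT INTO dates VALUES ('2000-04-01');
    -- ids 1-4 leave on 2000-04-01, ids 5-6 join on 2000-04-01
    INSERT INTO emps VALUES
        (1, '2000-01-01', '2000-04-01'),
        (2, '2000-01-01', '2000-04-01'),
        (3, '2000-01-01', '2000-04-01'),
        (4, '2000-01-01', '2000-04-01'),
        (5, '2000-04-01', NULL),
        (6, '2000-04-01', NULL);
""")

counts = conn.execute("""
    SELECT COUNT(ej.id)          AS joined_plain,
           COUNT(el.id)          AS left_plain,
           COUNT(DISTINCT ej.id) AS joined_distinct,
           COUNT(DISTINCT el.id) AS left_distinct
    FROM dates d
    LEFT JOIN emps ej ON ej.welcome = d.date
    LEFT JOIN emps el ON el.bye = d.date
    GROUP BY d.date
""").fetchone()

# The two joins pair each of the 2 joiners with each of the 4 leavers,
# so the plain counts both see 2 * 4 = 8 rows.
print(counts)
```

The plain counts both come out as 8, while the DISTINCT counts recover 2 and 4, which is why the answer counts distinct primary keys.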
    【Solution 2】:

    You can normalize the table by creating one row for each joining event and one row for each termination event:

    select  welcome as date
    ,       1 as size_change
    from    emps
    union all
    select  bye
    ,       -1
    from    emps
    where   bye is not null
    

    Now you can use a running sum to compute the current size:

    ; with  events as
            (
            select  welcome as date
            ,       1 as size_change
            from    emps
            union all
            select  bye
            ,       -1
            from    emps
            where   bye is not null
            )
    select  distinct to_char(date, 'YYYY-MM-DD') as date
    ,       sum(size_change) over (order by date) as family_size
    from    events
    order by
            date
    ;
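
To try the event-based running sum without a PostgreSQL server, here is a rough sketch using Python's sqlite3 module. The five emps rows are hypothetical sample data, and to_char is dropped because the dates are stored as ISO text in this sketch; the rest follows the query above.

```python
import sqlite3

# Assumption: hypothetical sample data, with SQLite standing in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emps (id INTEGER PRIMARY KEY, welcome TEXT, bye TEXT);
    INSERT INTO emps VALUES
        (1, '2000-01-01', '2000-03-01'),
        (2, '2000-01-01', NULL),
        (3, '2000-02-01', NULL),
        (4, '2000-04-01', NULL),
        (5, '2000-04-01', '2000-05-01');
""")

sizes = conn.execute("""
    WITH events AS (
        -- +1 for every joining event, -1 for every termination event
        SELECT welcome AS date, 1 AS size_change FROM emps
        UNION ALL
        SELECT bye, -1 FROM emps WHERE bye IS NOT NULL
    )
    -- rows sharing a date are peers in the default window frame, so they
    -- all get the same running total and DISTINCT collapses them
    SELECT DISTINCT date,
           SUM(size_change) OVER (ORDER BY date) AS family_size
    FROM events
    ORDER BY date
""").fetchall()

for row in sizes:
    print(row)
```

Only dates on which an event happened appear in the output. To get a row for every month, as in the question, the result would still need to be joined against a calendar table.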
    

    Example at SQL Fiddle.

    【Comments】:
