【Question title】: Aggregating statistics into JSON in PostgreSQL
【Posted】: 2015-06-16 22:56:26
【Question】:

I'm trying to compute overview statistics as JSON, but I'm having trouble wrangling them into a single query.

There are 2 tables:

appointments
- time timestamp
- patients int


assignments
- user_id int
- appointment_id int

I want to compute the number of patients per user, per hour, for the day. Ideally the result would look like this:

[ 
  {hour: "2015-07-01T08:00:00.000Z", assignments: [
    {user_id: 123, patients: 3}, 
    {user_id: 456, patients: 10}, 
    {user_id: 789, patients: 4},
  ]},
  {hour: "2015-07-01T09:00:00.000Z", assignments: [
    {user_id: 456, patients: 1},
    {user_id: 789, patients: 6}
  ]},
  {hour: "2015-07-01T10:00:00.000Z", assignments: []}
  ...
]

I'm fairly close:

with assignment_totals as (
    select user_id,sum(patients) as patients,date_trunc('hour',appointments.time) as hour
    from assignments
    inner join appointments on appointments.id = assignments.appointment_id
    group by date_trunc('hour',appointments.time),user_id
  ), hours as (
    select to_char(date_trunc('hour',time),'YYYY-MM-DD"T"HH24:00:00.000Z') as hour, array_to_json(array_agg(DISTINCT assignment_totals)) as assignments
    from appointments 
    left join assignment_totals on date_trunc('hour',appointments.time) = assignment_totals.hour
    where time >= '2015-07-01T07:00:00.000Z' and time < '2015-07-02T07:00:00.000Z' 
    group by date_trunc('hour',time)
    order by date_trunc('hour',time) 
  )
  select array_to_json(array_agg(hours)) as hours from hours;

Which outputs:

[ 
  {hour: "2015-07-01T08:00:00.000Z", assignments: [
    {user_id: 123, patients: 3, hour: "2015-07-01T08:00:00.000Z" }, 
    {user_id: 456, patients: 10, hour: "2015-07-01T08:00:00.000Z"}, 
    {user_id: 789, patients: 4, hour: "2015-07-01T08:00:00.000Z"},
  ]},
  {hour: "2015-07-01T09:00:00.000Z", assignments: [
    {user_id: 456, patients: 1, hour: "2015-07-01T09:00:00.000Z"},
    {user_id: 789, patients: 6, hour: "2015-07-01T09:00:00.000Z"}
  ]},
  {hour: "2015-07-01T10:00:00.000Z", assignments: [null]}
  ...
]

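While this works, there are 3 issues that may or may not be independent of each other:

  1. If there are no appointments for an hour, I would still like the hour to be included in the array (like 10am in the example), but with an empty "assignments" array. Right now it puts a null in there, and I can't figure out how to get rid of it while still keeping the hour.
  2. I have to include the hour alongside the user_id and patients in each assignment entry, because I need it to join the assignment_totals query to the hours query. But it's redundant, since it's already in the parent.
  3. I feel like this should be possible with 1 CTE and 1 query. Right now I'm using 2 CTEs, but I can't figure out how to condense it and still have it work.

I'd like to do something like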

  hours as (
    select to_char(date_trunc('hour',time),'YYYY-MM-DD"T"HH24:00:00.000Z') as hour, sum(appointments.patients) OVER(partition by assignments.user_id) as appointments
    from appointments 
    left join assignments on appointments.id = assignments.appointment_id
    where time >= '2015-07-01T07:00:00.000Z' and time < '2015-07-02T07:00:00.000Z'  
    group by date_trunc('hour',time)
    order by date_trunc('hour',time) 
  )
  select array_to_json(array_agg(hours)) as hours from hours

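But I can't get it to work without it giving me an "attribute must be in GROUP BY or aggregate function" error.

Does anyone know how to solve any of these? Thanks in advance!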

【Question comments】:

    Tags: json postgresql aggregate-functions common-table-expression


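    【Answer 1】:

    The main problem with your last query seems to be that it conflates window functions with aggregate functions. Window functions use the OVER syntax and do not themselves require a GROUP BY when there are other fields in the SELECT clause. Aggregate functions, on the other hand, need a GROUP BY when there are other (non-aggregated) fields in the SELECT clause. One practical consequence of this difference is that window functions are not automatically DISTINCT: they return one row per input row, so duplicates have to be removed explicitly.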
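
    The difference can be seen on a few literal rows. This is a minimal sketch: the `sample` rows and the integer `hour` column are made up for illustration and are not taken from the question's tables.

    ```sql
    -- Window function: all 3 input rows come back, each carrying its partition total,
    -- so the row for user 123 appears twice unless DISTINCT is added.
    SELECT user_id, hour, SUM(patients) OVER (PARTITION BY user_id, hour) AS patients
    FROM   (VALUES (123, 8, 3), (123, 8, 2), (456, 8, 10)) AS sample(user_id, hour, patients);

    -- Aggregate function: GROUP BY collapses the input to 2 rows, one per (user_id, hour).
    SELECT user_id, hour, SUM(patients) AS patients
    FROM   (VALUES (123, 8, 3), (123, 8, 2), (456, 8, 10)) AS sample(user_id, hour, patients)
    GROUP  BY user_id, hour;
    ```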

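    The NULL values coming out of the window function can be handled with a simple COALESCE, so that zero is used instead of null.

    So, to write the query using window functions, use something like the following: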

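    WITH hours AS
    (
        -- DISTINCT is needed because the window function keeps one row per joined appointment
        SELECT DISTINCT to_char(date_trunc('hour', ap.time), 'YYYY-MM-DD"T"HH24:00:00.000Z') AS hour,
               asgn.user_id,
               -- partition by the hour as well as the user, so the sum is per user, per hour
               COALESCE(SUM(ap.patients) OVER (PARTITION BY asgn.user_id, date_trunc('hour', ap.time)), 0) AS appointment_count
        FROM   appointments ap
        LEFT JOIN assignments asgn ON ap.id = asgn.appointment_id
        WHERE  ap.time >= '2015-07-01T07:00:00.000Z'
        AND    ap.time < '2015-07-02T07:00:00.000Z'
    )
    -- the ORDER BY goes inside the aggregate; an outer ORDER BY hour is rejected here
    SELECT array_to_json(array_agg(hours ORDER BY hours.hour)) AS hours
    FROM   hours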
    

    Using aggregate functions:

    WITH hours AS
    (
        -- HH24 gives the 24-hour clock; plain HH would format 13:00 as 01:00
        SELECT to_char(date_trunc('hour', ap.time), 'YYYY-MM-DD"T"HH24:00:00.000Z') AS hour,
               SUM(COALESCE(ap.patients, 0)) AS appointment_count,
               asgn.user_id
        FROM   appointments ap
        LEFT JOIN assignments asgn ON ap.id = asgn.appointment_id
        WHERE  ap.time >= '2015-07-01T07:00:00.000Z'
        AND    ap.time < '2015-07-02T07:00:00.000Z'
        GROUP BY asgn.user_id, to_char(date_trunc('hour', ap.time), 'YYYY-MM-DD"T"HH24:00:00.000Z')
    )
    -- the ORDER BY goes inside the aggregate; an outer ORDER BY hour is rejected here
    SELECT array_to_json(array_agg(hours ORDER BY hours.hour)) AS hours
    FROM   hours
    

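    My syntax may not be quite right, so please double-check before using this or a similar solution (and feel free to edit to correct any errors).

    【Comments】:

    • Thanks Andrew, I forgot I could coalesce them to 0. I'd still like to be able to filter so that there are none with patients...
    【Answer 2】: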

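    I was frustrated by this mostly because I hadn't looked at the Postgres 9.4 documentation, which has new functions for handling JSON.

    The solution I found builds on the original query, but then uses json_array_elements to explode the assignments array, filters it with where, and then builds it back up again. In essence it is the seemingly pointless: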

    json_agg(json_array_elements(json_agg(*)))
    

    But it has very little performance impact and gets me where I need to go. If you find a better solution, feel free to comment!
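
    The explode, filter, rebuild round trip can be tried in isolation on a literal array. This is only a sketch: the JSON literal and the `elem` alias are made up for illustration.

    ```sql
    -- json_array_elements turns the array into one row per element,
    -- WHERE drops the elements with patients = 0,
    -- and json_agg packs the surviving elements back into a JSON array.
    SELECT json_agg(elem) AS kept
    FROM   json_array_elements(
               '[{"user_id":123,"patients":3},{"user_id":null,"patients":0}]'::json
           ) AS elem
    WHERE  (elem->>'patients') <> '0';
    ```

    Only the element for user 123 survives the filter.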

      with assignment_totals as (
        select 
          date_trunc('hour',appointments.time) as hour, 
          user_id, 
          coalesce(sum(patients),0) as patients
        from appointments
        left outer join assignments on appointments.id = assignments.appointment_id
        where time >= '2015-07-01T07:00:00.000Z' and time < '2015-07-02T07:00:00.000Z' 
        group by date_trunc('hour',appointments.time),user_id
      ), hours as (
        select 
          to_char(assignment_totals.hour,'YYYY-MM-DD"T"HH24:00:00.000Z') as hour,
          (
            select coalesce(json_agg(json_build_object('user_id',(t->'user_id'),'patients',(t->'patients')) order by (t->>'user_id')),'[]'::json) 
            from json_array_elements(json_agg(assignment_totals)) t 
            where (t->>'patients') != '0'
          ) as patients
        from assignment_totals 
        group by assignment_totals.hour
        order by assignment_totals.hour
      )
      select array_to_json(array_agg(hours)) as hours from hours
    

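    Thanks to Andrew for pointing out that I could coalesce the nulls to 0. But I still wanted to filter out the entries with patients = 0. This solved all my problems: it lets me filter them with where, and it lets me leave the hour out of the entries by building new JSON objects with json_build_object.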
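
    A possible alternative that avoids the explode-and-rebuild step is sketched below. It is untested against real data and rests on several assumptions: Postgres 9.4 or later (for FILTER and json_build_object), the same two tables as in the question, and that every hour in the window should appear, which is why the hours come from generate_series rather than from the appointments table. The names `totals`, `h` and `t` are made up for this sketch.

    ```sql
    WITH totals AS (
        -- patients per user, per hour; the inner join skips unassigned appointments
        SELECT date_trunc('hour', ap.time) AS hour,
               asgn.user_id,
               sum(ap.patients) AS patients
        FROM   appointments ap
        JOIN   assignments asgn ON asgn.appointment_id = ap.id
        WHERE  ap.time >= '2015-07-01T07:00:00.000Z'
        AND    ap.time <  '2015-07-02T07:00:00.000Z'
        GROUP  BY 1, 2
    ), hours AS (
        SELECT to_char(h.hour, 'YYYY-MM-DD"T"HH24:00:00.000Z') AS hour,
               -- FILTER skips unmatched hours and zero totals; when nothing is left,
               -- json_agg returns NULL and COALESCE swaps in an empty array
               coalesce(
                   json_agg(json_build_object('user_id', t.user_id, 'patients', t.patients)
                            ORDER BY t.user_id)
                       FILTER (WHERE t.patients > 0),
                   '[]'::json
               ) AS assignments
        FROM   generate_series('2015-07-01T07:00:00'::timestamp,
                               '2015-07-02T06:00:00'::timestamp,
                               interval '1 hour') AS h(hour)
        LEFT   JOIN totals t ON t.hour = h.hour
        GROUP  BY h.hour
    )
    SELECT json_agg(hours ORDER BY hours.hour) AS hours
    FROM   hours;
    ```

    Because the nested objects are built from only user_id and patients, the hour stays in the parent and is not repeated in each entry.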

    【Comments】:
