Filling in the blanks with time series summary data

【Question title】: Filling in the blanks with time series summary data
【Posted】: 2013-07-28 09:53:44
【Question】:

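I'm trying to draw a simple (read: fast) sparkline for "data received from a sensor every n minutes".

The data is very simple: one or more readings for a given timestamp, identified by the sensor's MAC address: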

# SELECT mac, ants, read_at FROM normalized_readings LIMIT 10;
        mac        | ants |         read_at
-------------------+------+-------------------------
 f0:d1:a9:a0:fe:e7 |  -87 | 2013-07-14 09:25:15.215
 74:de:2b:fa:ca:cf |  -69 | 2013-07-14 09:25:14.81
 74:de:2b:fa:ca:cf |  -69 | 2013-07-14 09:25:14.81
 74:de:2b:fa:ca:cf |  -69 | 2013-07-14 09:25:15.247
 38:aa:3c:8f:a0:4f |  -85 | 2013-07-14 09:25:21.672
 38:aa:3c:8f:a0:4f |  -87 | 2013-07-14 09:25:21.695
 60:67:20:c8:bc:80 |  -83 | 2013-07-14 09:25:26.73
 60:67:20:c8:bc:80 |  -81 | 2013-07-14 09:25:26.737
 f0:d1:a9:a0:fe:e7 |  -83 | 2013-07-14 09:25:36.207
 f0:d1:a9:a0:fe:e7 |  -91 | 2013-07-14 09:26:07.77
(10 rows)

I'm trying to come up with something like this:

# SELECT 
    mac, date_trunc('minute', read_at) AS minute, COUNT(*) 
  FROM 
    normalized_readings
  GROUP BY mac, minute LIMIT 10;
        mac        |       minute        | count
-------------------+---------------------+-------
 00:08:ca:e6:a1:86 | 2013-07-14 16:22:00 |     6
 00:10:20:56:7c:e2 | 2013-07-27 05:29:00 |     1
 00:21:5c:1c:df:7d | 2013-07-14 09:44:00 |     1
 00:21:5c:1c:df:7d | 2013-07-14 09:46:00 |     1
 00:21:5c:1c:df:7d | 2013-07-14 09:48:00 |     1
 00:24:d7:b3:31:04 | 2013-07-15 06:51:00 |     1
 00:24:d7:b3:31:04 | 2013-07-15 06:53:00 |     3
 00:24:d7:b3:31:04 | 2013-07-15 06:59:00 |     3
 00:24:d7:b3:31:04 | 2013-07-15 07:02:00 |     3
 00:24:d7:b3:31:04 | 2013-07-15 07:06:00 |     3
(10 rows)

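But notice all the gaps. I'd like to be able to pull out a 0 for those periods, to show that the sensor recorded no data.

I'll probably only want to show the last 12/24 hours of data, so I suppose I could brute-force this by generating artificial dates going back 12/24 hours from NOW() and, for each step at the chosen resolution (probably 1 or 5 minutes), querying the readings table and summing the number of readings. That sounds incredibly inefficient, though.

Is there a way to do what I want without brute-forcing it? As far as I can see, by grouping on the selected minute I'm automatically coming at this from the wrong side?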

【Comments】:

Tags: sql postgresql group-by time-series

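【Solution 1】:

For this type of query, you need a driver table that generates all combinations of MACs and minutes. Postgres has a nice function, generate_series(), that can produce a counter for each minute.

So the idea is to start with all the MACs and generate a series of minutes for each one. Then use a left outer join from the driver table to get one row for each value, whether or not a reading exists for it.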

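-- t: readings counted per sensor and minute (the sample query from the question)
with t as (
      SELECT mac, date_trunc('minute', read_at) AS minute, COUNT(*) as cnt
      FROM normalized_readings
      GROUP BY mac, minute
      LIMIT 10
     )
-- driver: one row per mac for every minute between its first and last reading;
-- coalesce turns the NULLs left by the outer join into 0
select driver.mac, driver.minute, coalesce(t.cnt, 0) as cnt
from (select mac, minminute,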
             minminute + cast(cast(generate_series(0,
                                                   cast(extract(epoch from maxminute - minminute)/60 as int)
                                                  ) as character varying
                                  )||' minute' as interval
                             ) as minute
      from (select mac, min(minute) as minminute, max(minute) as maxminute
            from t
            group by mac
           ) macs
     ) driver left outer join
     t
     on t.mac = driver.mac and
        t.minute = driver.minute

The SQL Fiddle is here.
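
As a hedged alternative sketch, not part of the original answer: generate_series() also accepts timestamp bounds with an interval step, so the driver rows can be generated directly for a fixed window instead of from each sensor's first and last reading. The query below assumes the 12-hour window and 1-minute resolution mentioned in the question; the aliases m, g, and r are illustrative.

```sql
-- Sketch: per-sensor counts for every minute of the last 12 hours,
-- returning 0 for minutes in which a sensor recorded nothing.
SELECT m.mac,
       g.minute,
       count(r.mac) AS cnt          -- counts only matched readings, so gaps give 0
FROM (SELECT DISTINCT mac           -- sensors seen inside the window
      FROM normalized_readings
      WHERE read_at >= date_trunc('minute', now() - interval '12 hours')) AS m
CROSS JOIN generate_series(         -- one row per minute in the window
             date_trunc('minute', now() - interval '12 hours'),
             date_trunc('minute', now()),
             interval '1 minute'
           ) AS g(minute)
LEFT JOIN normalized_readings AS r
       ON r.mac = m.mac
      AND r.read_at >= g.minute
      AND r.read_at <  g.minute + interval '1 minute'
GROUP BY m.mac, g.minute
ORDER BY m.mac, g.minute;
```

For 5-minute buckets, both the series step and the upper bound in the join condition would change to interval '5 minutes'; the buckets then start at whatever minute the window begins on.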

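The only problem I can see is how you are getting the original data, that is, the definition of t. I followed the example in the question, but it doesn't really make sense: you have a limit with no order by, so which 10 rows come back is arbitrary. You should add an appropriate order by.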
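
A minimal sketch of what a deterministic version of that sample query could look like. Ordering by mac and minute is an assumption here; any ordering that matches the rows you actually want would do.

```sql
-- With ORDER BY, LIMIT 10 returns the same ten (mac, minute) groups on every run.
SELECT mac, date_trunc('minute', read_at) AS minute, count(*) AS cnt
FROM normalized_readings
GROUP BY mac, minute
ORDER BY mac, minute
LIMIT 10;
```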

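【Comments】:

Thanks Gordon, they should be ordered by mac, read_at. I threw in an arbitrary LIMIT 10 so as not to kill SO with half a million entries, but you're right. LIMIT without ORDER BY doesn't make sense, really.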