【Question title】: Finding ranges of values (5 points) across ranges of time (1 hr)
【Posted】: 2016-11-22 18:51:39
【Question description】:

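I am monitoring the effectiveness of a temperature controller in an IoT application. I am trying to find "points of interest" in a time series. These are similar to local minima or maxima, but they also include the points on the curve where a trend begins. It is not just the min and max: I want the points from which the temperature moves across a 5-point range within one hour.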

Source

| inMins | unixTime         | temp |  
|--------|------------------|------|   
| 0      | 1479042000000000 | 10.0 |  
| 5      | 1479042300000000 | 11.0 |  
| 10     | 1479042600000000 | 12.0 |  
| 15     | 1479042900000000 | 13.0 |  
| 20     | 1479043200000000 | 14.0 |  
| 25     | 1479043500000000 | 15.0 |  
| 30     | 1479043800000000 | 14.0 |  
| 35     | 1479044100000000 | 13.0 |  
| 40     | 1479044400000000 | 12.0 |  
| 45     | 1479044700000000 | 11.0 |  
| 50     | 1479045000000000 | 10.0 |  
| 55     | 1479045300000000 | 9.0  |  
| 60     | 1479045600000000 | 8.0  |  
| 65     | 1479045900000000 | 9.0  |  
| 70     | 1479046200000000 | 10.0 |  
| 75     | 1479046500000000 | 11.0 |  
| 80     | 1479046800000000 | 12.0 |  
| 85     | 1479047100000000 | 13.0 |  
| 90     | 1479047400000000 | 14.0 |  

Desired output

| inMins | unixTime         | temp | coldOrHot |  
|--------|------------------|------|-----------|  
| 0      | 1479042000000000 | 10.0 | 1         |  
| 25     | 1479043500000000 | 15.0 | 2         |  
| 30     | 1479043800000000 | 14.0 | 2         |
| 35     | 1479044100000000 | 13.0 | 2         |  
| 60     | 1479045600000000 | 8.0  | 1         |  
| 65     | 1479045900000000 | 9.0  | 1         |  

My current results, which have some problems

| inMins | unixTime         | temp | coldOrHot |  
|--------|------------------|------|-----------|  
| 25     | 1479043500000000 | 15.0 | 2         |  
| 30     | 1479043800000000 | 14.0 | 2         |  
| 60     | 1479045600000000 | 8.0  | 1         |  
| 65     | 1479045900000000 | 9.0  | 1         |  
| 70     | 1479046200000000 | 10.0 | 1         |  
| 75     | 1479046500000000 | 11.0 | 1         |  
| 80     | 1479046800000000 | 12.0 | 1         |  
| 85     | 1479047100000000 | 13.0 | 1         |  
| 90     | 1479047400000000 | 14.0 | 1         |  

SQL

Select 
  inMins,
  unixTime,
  temp,
  coldOrHot
from 
(Select
  inMins,
  unixTime,
  temp,
  -- 1 means Cold, 2 means Hot, 0 is noise
  if(temp=theLowInWindowDesc,1,
  if(temp=theHighInWindowDesc,2,0)) as coldOrHot,
  theHighInWindowDesc,
  theLowInWindowDesc
FROM
  (SELECT
  inMins,
  unixTime,
  temp,
  theHighInWindowDesc,
  theLowInWindowDesc
  FROM
    (Select
        inMins,
        unixTime,
        temp,
        MAX(temp) OVER(ORDER BY
          unixTime desc RANGE BETWEEN 60 * 60 * 1000000 PRECEDING
          AND CURRENT ROW) AS theHighInWindowDesc,
        MIN(temp) OVER(ORDER BY
         unixTime desc RANGE BETWEEN 60 * 60 * 1000000 PRECEDING
         AND CURRENT ROW) AS theLowInWindowDesc
        FROM
        [esheetzbq:findingLocalExtrema.timeSeriesForKevin]
        ORDER BY
        inMins asc
     )
  )
)
where coldOrHot=1 or coldOrHot=2 

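Problems

  1. I am not picking up "cold" at minute 0, when the temperature is 10 and rises 5 points over the next 25 minutes.
  2. I am not getting a "hot" value at minute 35.
  3. The results from minute 70 to minute 90 do not respect my 5-point range criterion. They appear because my current logic is based on extremes rather than ranges. In the final hour of the data set, the SQL window function `OVER` pulls in rows that span less than an hour. That is expected behavior, and I am not sure which logic is best for excluding records that raise a warning but never see a 5-point range.
  4. Will this scale? I will be running this logic on a record set of roughly 34M rows.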
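
To make problems 1 to 3 concrete, here is a minimal Python sketch of what the query's two windows appear to compute. It assumes the 19 sample rows from the source table and uses `inMins` in place of `unixTime`. The function name `classify_by_extremes` is hypothetical and exists only for illustration. Because the window is ordered by `unixTime desc`, the "preceding" hour is really the hour that follows each row.

```python
# Sample readings from the source table: (inMins, temp).
ROWS = [(0, 10.0), (5, 11.0), (10, 12.0), (15, 13.0), (20, 14.0), (25, 15.0),
        (30, 14.0), (35, 13.0), (40, 12.0), (45, 11.0), (50, 10.0), (55, 9.0),
        (60, 8.0), (65, 9.0), (70, 10.0), (75, 11.0), (80, 12.0), (85, 13.0),
        (90, 14.0)]

WINDOW_MINS = 60  # same span as 60 * 60 * 1000000 microseconds


def classify_by_extremes(rows, window=WINDOW_MINS):
    """Flag a row when it equals the min (1, cold) or the max (2, hot)
    of the readings from its own time up to `window` minutes later."""
    flagged = []
    for mins, temp in rows:
        ahead = [t for m, t in rows if mins <= m <= mins + window]
        if temp == min(ahead):      # checked first, like the outer IF
            flagged.append((mins, temp, 1))
        elif temp == max(ahead):
            flagged.append((mins, temp, 2))
    return flagged


current = classify_by_extremes(ROWS)
print([mins for mins, _, _ in current])
```

On the sample data this flags minutes 25, 30 and 60 through 90, the same rows as the "current results" table. Minute 0 is missed because 10.0 is neither the lowest nor the highest reading in the hour that follows it, and minutes 70 to 90 are flagged only because the window shrinks as the data runs out, so each of them is the minimum of whatever rows remain.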

【Question comments】:

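  • I don't see what "5 points in an hour" has to do with this. It looks like you are just looking for monotonic points.
  • Can you explain why inMins=30 is in your output?
  • You should clarify your logic; otherwise it is left wide open for any of us who really want to help you with your challenge.
  • Sounds like my question could have been written better. Let me see if this helps clarify. inMins=30 qualifies because the temperature continued to drop another 5 points before reversing. So it is not just the minima and maxima within the time range, but the points from which the temperature moves across a 5-point range. From minute 25 to minute 50 the temperature dropped 5 points, and from minute 30 to minute 55 it also dropped 5 points. I have been asked to find not only the local minima and maxima but also the trend points that meet the range criterion.
  • So, based on what I have heard so far, I would expect to also see inMins=35, 40, etc., because they are within a 5-point trend too! But they are not in your desired output. That means there is still something in your head that your description of the logic does not capture.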

Tags: sql google-bigquery


【Solution 1】:

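Here we go. The following is BigQuery Standard SQL.
I made no attempt to improve or optimize the query. Instead, I deliberately left it broken down into subqueries exactly as I wrote them, so that the logic is easy to trace and therefore to understand.
I have included the data below for ease of testing. If you want to test against your real data, comment out the data section.

Have fun :o)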

#standardSQL
WITH `esheetzbq.findingLocalExtrema.timeSeriesForKevin` AS (
  SELECT 0 AS inMins, 1479042000000000 AS unixTime, 10.0 AS temp UNION ALL   SELECT 5 AS inMins, 1479042300000000 AS unixTime, 11.0 AS temp UNION ALL   SELECT 10 AS inMins, 1479042600000000 AS unixTime, 12.0 AS temp UNION ALL   SELECT 15 AS inMins, 1479042900000000 AS unixTime, 13.0 AS temp UNION ALL       SELECT 20 AS inMins, 1479043200000000 AS unixTime, 14.0 AS temp UNION ALL   SELECT 25 AS inMins, 1479043500000000 AS unixTime, 15.0 AS temp UNION ALL   SELECT 30 AS inMins, 1479043800000000 AS unixTime, 14.0 AS temp UNION ALL   SELECT 35 AS inMins, 1479044100000000 AS unixTime, 13.0 AS temp UNION ALL
  SELECT 40 AS inMins, 1479044400000000 AS unixTime, 12.0 AS temp UNION ALL   SELECT 45 AS inMins, 1479044700000000 AS unixTime, 11.0 AS temp UNION ALL   SELECT 50 AS inMins, 1479045000000000 AS unixTime, 10.0 AS temp UNION ALL   SELECT 55 AS inMins, 1479045300000000 AS unixTime, 9.0 AS temp UNION ALL       SELECT 60 AS inMins, 1479045600000000 AS unixTime, 8.0 AS temp UNION ALL   SELECT 65 AS inMins, 1479045900000000 AS unixTime, 9.0 AS temp UNION ALL   SELECT 70 AS inMins, 1479046200000000 AS unixTime, 10.0 AS temp UNION ALL   SELECT 75 AS inMins, 1479046500000000 AS unixTime, 11.0 AS temp UNION ALL       SELECT 80 AS inMins, 1479046800000000 AS unixTime, 12.0 AS temp UNION ALL   SELECT 85 AS inMins, 1479047100000000 AS unixTime, 13.0 AS temp UNION ALL   SELECT 90 AS inMins, 1479047400000000 AS unixTime, 14.0 AS temp UNION ALL        SELECT 95 AS inMins, 1479047700000000 AS unixTime, 15 AS temp UNION ALL SELECT 100 AS inMins, 1479048000000000 AS unixTime, 16 AS temp UNION ALL  SELECT 105 AS inMins, 1479048300000000 AS unixTime, 17 AS temp UNION ALL SELECT 110 AS inMins, 1479048600000000 AS unixTime, 18 AS temp UNION ALL 
  SELECT 115 AS inMins, 1479048900000000 AS unixTime, 19 AS temp UNION ALL SELECT 120 AS inMins, 1479049200000000 AS unixTime, 20 AS temp UNION ALL      SELECT 125 AS inMins, 1479049500000000 AS unixTime, 21 AS temp UNION ALL SELECT 130 AS inMins, 1479049800000000 AS unixTime, 22 AS temp UNION ALL       SELECT 135 AS inMins, 1479050100000000 AS unixTime, 23 AS temp UNION ALL SELECT 140 AS inMins, 1479050400000000 AS unixTime, 24 AS temp UNION ALL      SELECT 145 AS inMins, 1479050700000000 AS unixTime, 25 AS temp UNION ALL SELECT 150 AS inMins, 1479051000000000 AS unixTime, 26 AS temp UNION ALL       SELECT 155 AS inMins, 1479051300000000 AS unixTime, 27 AS temp UNION ALL SELECT 160 AS inMins, 1479051600000000 AS unixTime, 28 AS temp UNION ALL      SELECT 165 AS inMins, 1479051900000000 AS unixTime, 29 AS temp UNION ALL SELECT 170 AS inMins, 1479052200000000 AS unixTime, 30 AS temp UNION ALL       SELECT 175 AS inMins, 1479052500000000 AS unixTime, 31 AS temp UNION ALL SELECT 180 AS inMins, 1479052800000000 AS unixTime, 32 AS temp UNION ALL      SELECT 185 AS inMins, 1479053100000000 AS unixTime, 33 AS temp UNION ALL SELECT 190 AS inMins, 1479053400000000 AS unixTime, 34 AS temp UNION ALL       SELECT 195 AS inMins, 1479053700000000 AS unixTime, 35 AS temp UNION ALL SELECT 200 AS inMins, 1479054000000000 AS unixTime, 36 AS temp UNION ALL      SELECT 205 AS inMins, 1479054300000000 AS unixTime, 37 AS temp UNION ALL SELECT 210 AS inMins, 1479054600000000 AS unixTime, 38 AS temp  
), y AS (
  SELECT inMins, unixTime, temp, delta,
    IFNULL(SUM(new_group_flag) OVER(ORDER BY unixTime ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) AS grp
  FROM (
    SELECT inMins, unixTime, temp,
      LEAD(temp) OVER(ORDER BY unixTime) - temp AS delta,
      CAST(SIGN(LEAD(temp) OVER(ORDER BY unixTime) - temp) != IFNULL(SIGN(temp - LAG(temp) OVER(ORDER BY unixTime)), SIGN(LEAD(temp) OVER(ORDER BY unixTime) - temp)) AS INT64) AS new_group_flag
    FROM `esheetzbq.findingLocalExtrema.timeSeriesForKevin`
  )
), yy AS (
  SELECT inMins, unixTime, temp, delta, grp FROM y UNION ALL
  SELECT inMins, unixTime, temp, delta, grp + 1 AS grp
  FROM (
    SELECT inMins, unixTime, temp, delta, grp, 
      unixTime - MAX(unixTime) OVER(PARTITION BY grp ORDER BY unixTime DESC) AS qq
    FROM y
  ) WHERE qq = 0
), v AS (
  SELECT inMins, unixTime, temp, delta,
     MIN(temp) OVER(PARTITION BY grp ORDER BY unixTime RANGE BETWEEN CURRENT ROW AND 3600000000 FOLLOWING) AS min_temp,
     MAX(temp) OVER(PARTITION BY grp ORDER BY unixTime RANGE BETWEEN CURRENT ROW AND 3600000000 FOLLOWING) AS max_temp
  FROM yy
)
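-- delta >= 0 marks a reading that is about to rise (1, cold); anything else is 2 (hot).
-- The original test, delta = 1, only matched steps of exactly 1.0.
SELECT inMins, unixTime, temp, IF(delta >= 0, 1, 2) AS coldOrHot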
FROM v
WHERE  ABS(max_temp - temp) >= 5 OR ABS(min_temp - temp) >= 5

If you go in this direction, see "Enabling Standard SQL" and "Migrating from legacy SQL" for more details, if needed.
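
For readers who want to trace the query outside BigQuery, the following is a rough Python sketch of the same steps, not a translation endorsed by the answer's author. It assumes the question's 19 sample rows (not the extended test data in the query), uses `inMins` in place of `unixTime`, and labels rows with `delta >= 0` as suggested in the comments below. The names `assign_groups` and `find_points` are hypothetical.

```python
# Readings from the question's source table: (inMins, temp).
ROWS = [(0, 10.0), (5, 11.0), (10, 12.0), (15, 13.0), (20, 14.0), (25, 15.0),
        (30, 14.0), (35, 13.0), (40, 12.0), (45, 11.0), (50, 10.0), (55, 9.0),
        (60, 8.0), (65, 9.0), (70, 10.0), (75, 11.0), (80, 12.0), (85, 13.0),
        (90, 14.0)]
WINDOW_MINS = 60   # stands in for 3600000000 microseconds
THRESHOLD = 5.0


def sign(x):
    return (x > 0) - (x < 0)


def assign_groups(rows):
    """Step y: the group id advances after every row where the direction
    of change reverses, so each group is one monotonic run."""
    temps = [t for _, t in rows]
    groups, grp = [], 0
    for i in range(len(temps)):
        groups.append(grp)
        if i + 1 < len(temps):
            nxt = sign(temps[i + 1] - temps[i])
            prev = sign(temps[i] - temps[i - 1]) if i > 0 else nxt
            if nxt != prev:
                grp += 1
    return groups


def find_points(rows, window=WINDOW_MINS, threshold=THRESHOLD):
    runs = {}
    for i, ((mins, temp), grp) in enumerate(zip(rows, assign_groups(rows))):
        delta = rows[i + 1][1] - temp if i + 1 < len(rows) else None
        runs.setdefault(grp, []).append((mins, temp, delta))
    # Step yy: copy the last row of each run into the next run (grp + 1).
    turning_points = {grp: run[-1] for grp, run in runs.items()}
    for grp, row in turning_points.items():
        runs.setdefault(grp + 1, []).insert(0, row)
    # Step v and the final filter: look one hour ahead inside the same run.
    selected = []
    for run in runs.values():
        for mins, temp, delta in run:
            ahead = [t for m, t, _ in run if mins <= m <= mins + window]
            if max(ahead) - temp >= threshold or temp - min(ahead) >= threshold:
                label = 1 if delta is not None and delta >= 0 else 2
                selected.append((mins, temp, label))
    return sorted(selected)


print(find_points(ROWS))
```

On the question's sample rows this selects minutes 0, 25, 30, 35, 60 and 65 with labels 1, 2, 2, 2, 1, 1, which matches the desired output table. Restricting the look-ahead to a single run is what drops minutes 70 to 90: from minute 70 onward the run has fewer than 5 points of rise left in the data.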

【Comments】:

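  • Fantastic! Works perfectly. I appreciate it. Thank you.
  • One observation: I experimented with changes smaller than 1, and I received a hot measurement where I expected cold. For example, if I change the value at minute 5 from 11 to 10.1, the solution yields coldOrHot = 2 at minute 0 instead of 1. Changing the final IF statement to evaluate delta >= 0 instead of = 1 seems to fix this. I admit I do not fully understand the delta logic, so I am not confident in my tweak. Any thoughts are appreciated. Thanks to your help, I am much further along.
  • That was my mistake. It should be delta >= 0. The "role" of delta is to show the direction of change (up or down). When writing the query I fixated on your specific example, where the change is exactly 1.0, which is obviously not the case for real data. So from what I see, your fix is correct.
  • Mikhail, would you mind elaborating on the usefulness of the incrementing groups, (grp) and (grp + 1)? I tested against 33k records in which I know the temperature moved more than 6 degrees within 30 minutes. I did not get the signal I expected, so I am trying to find the flaw in my logic. For grins, I ran select count(distinct grp) as NumOfGroups from y and found that we have more than 2,000 groups.
  • grp identifies sequences of points that share the same direction of change. grp + 1 is a trick: it takes the local extreme (min or max) from the previous group (grp) and adds it as the starting point of the next group, so local extremes are doubled. Hope this helps :o)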
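
One possible explanation for the missing signal and the 2,000+ groups, offered as an assumption rather than a diagnosis of the asker's data: every small reversal starts a new group, and the 5-point test only looks inside a single group. The Python sketch below ignores the one-hour window and uses two made-up series (`smooth` and `noisy`, both hypothetical) to show the effect. The function names are also hypothetical.

```python
def sign(x):
    return (x > 0) - (x < 0)


def assign_groups(temps):
    """grp as in the y subquery: the id advances after each row where the
    direction of change reverses."""
    groups, grp = [], 0
    for i in range(len(temps)):
        groups.append(grp)
        if i + 1 < len(temps):
            nxt = sign(temps[i + 1] - temps[i])
            prev = sign(temps[i] - temps[i - 1]) if i > 0 else nxt
            if nxt != prev:
                grp += 1
    return groups


def largest_move_within_a_run(temps):
    """Largest max - min inside any single run, where each run also starts
    with the turning point that ended the previous run (the grp + 1 trick)."""
    runs = {}
    for temp, grp in zip(temps, assign_groups(temps)):
        runs.setdefault(grp, []).append(temp)
    last_of_run = {grp: run[-1] for grp, run in runs.items()}
    for grp, temp in last_of_run.items():
        runs.setdefault(grp + 1, []).insert(0, temp)
    return max(max(run) - min(run) for run in runs.values())


# Hypothetical readings, 5 minutes apart; both rise 6 degrees overall.
smooth = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0]
noisy = [10.0, 12.0, 11.5, 13.5, 13.0, 15.0, 14.5, 16.0]

print(assign_groups(smooth), largest_move_within_a_run(smooth))
print(assign_groups(noisy), largest_move_within_a_run(noisy))
```

The smooth series stays in one group and contains a 6-point move, so it would pass the 5-point filter. The noisy series splits into seven groups, and no single group spans more than 2 points, so nothing would be selected even though the overall rise is the same. If real sensor data jitters like this, smoothing the readings or ignoring reversals below some tolerance before grouping may be worth trying.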