Combine LAST with DISTINCT or GROUP BY? Answers

【Question title】: Combine LAST with DISTINCT or GROUP BY?
【Posted】: 2019-09-25 21:15:48
【Question description】:

I have a simple time series like this:

time                    id         area
2019-09-25T17:21:00Z    1          us
2019-09-25T17:22:00Z    1          uk
2019-09-25T17:23:00Z    2          canada
2019-09-25T17:24:00Z    3          us
2019-09-25T17:25:00Z    1          canada

I want to count the areas of the last point for each distinct id, so the result should look like this:

area      count        
us        1
canada    2

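Since the last point for id 1 is canada, I want to ignore all earlier points for id 1.

How can I query only the most recent point for each distinct id? Is that possible?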
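
The desired result can be pinned down outside InfluxQL. Below is a minimal Python sketch, assuming the five sample points above are held in memory as tuples; the function name `count_last_area_per_id` is made up for illustration and is not part of any InfluxDB API.

```python
from collections import Counter

# Sample points copied from the question: (time, id, area).
points = [
    ("2019-09-25T17:21:00Z", 1, "us"),
    ("2019-09-25T17:22:00Z", 1, "uk"),
    ("2019-09-25T17:23:00Z", 2, "canada"),
    ("2019-09-25T17:24:00Z", 3, "us"),
    ("2019-09-25T17:25:00Z", 1, "canada"),
]


def count_last_area_per_id(rows):
    """Keep only the newest point per id, then count those points by area."""
    latest = {}  # id -> (time, area) of the newest point seen so far
    for time, point_id, area in rows:
        # ISO 8601 UTC timestamps in the same format sort correctly as strings.
        if point_id not in latest or time > latest[point_id][0]:
            latest[point_id] = (time, area)
    return Counter(area for _, area in latest.values())


counts = count_last_area_per_id(points)
print(dict(counts))  # {'canada': 2, 'us': 1}
```

The two steps (reduce to one point per id, then count by area) are what a nested query would need to reproduce on the server side.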

Edit: here is the actual data I'm working with.

name: click3
time                area   id       value
----                ----   --       -----
1569480689926885700 travel session1 1
1569480693527591500 travel session2 1
1569480699951799900 vtc    session3 1
1569480706416720700 health session1 1
1569480713265800900 claim  session4 1
1569480719882312600 health session3 1

area and id are indeed tags. When I do a simple select LAST(value) with GROUP BY, I get the following:

> select last(value) as value, area, id from click3 group by id
name: click3
tags: id=session1
time                value area   id
----                ----- ----   --
1569480706416720700 1     health session1

name: click3
tags: id=session2
time                value area   id
----                ----- ----   --
1569480693527591500 1     travel session2

name: click3
tags: id=session3
time                value area   id
----                ----- ----   --
1569480719882312600 1     health session3

name: click3
tags: id=session4
time                value area  id
----                ----- ----  --
1569480713265800900 1     claim session4

This is correct: the last point for each unique session id. When I select * from this query as a subquery, the result is

> select * from (select last(value) as value, area, id from click3 group by id)
name: click3
time                area   id       id_1     value
----                ----   --       ----     -----
1569480693527591500 travel session2 session2 1
1569480706416720700 health session1 session1 1
1569480713265800900 claim  session4 session4 1
1569480719882312600 health session3 session3 1

When I add an aggregate like COUNT(*) or SUM(value), I see the expected number 4:

> select count(*) from (select last(value) as value, area, id from click3 group by id)
name: click3
time count_value
---- -----------
0    4

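However, if I add GROUP BY area to this query, I expect to see a value of 1 for travel, 2 for health, and 1 for claim. For some reason it looks like it uses the full original set of data points rather than the reduced set from the subquery, so I end up with this: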

> select count(*) from (select last(value) as value, area from click3 group by id) group by area
name: click3
tags: area=claim
time count_value
---- -----------
0    1

name: click3
tags: area=health
time count_value
---- -----------
0    2

name: click3
tags: area=travel
time count_value
---- -----------
0    2

name: click3
tags: area=vtc
time count_value
---- -----------
0    1

I think I must be seriously misunderstanding how influxdb works. What am I missing?
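
One way to reason about the gap between the expected and the observed counts is to compare two groupings on the click3 data. The Python sketch below is only an illustration: it assumes, without confirmation from this thread, that the outer GROUP BY area ends up being applied inside the subquery as well, so that "last" is taken per (id, area) pair instead of per id. The helper `count_last_by_area` is a made-up name.

```python
from collections import Counter

# click3 points copied from the question: (time, area, id).
points = [
    (1569480689926885700, "travel", "session1"),
    (1569480693527591500, "travel", "session2"),
    (1569480699951799900, "vtc", "session3"),
    (1569480706416720700, "health", "session1"),
    (1569480713265800900, "claim", "session4"),
    (1569480719882312600, "health", "session3"),
]


def count_last_by_area(rows, key):
    """Keep the newest point per key(row), then count the survivors by area."""
    latest = {}
    for row in rows:
        k = key(row)
        if k not in latest or row[0] > latest[k][0]:
            latest[k] = row
    return Counter(area for _, area, _ in latest.values())


# Last point per id only: the reduced set the asker expects.
expected = count_last_by_area(points, key=lambda r: r[2])

# Last point per (id, area) pair: ASSUMPTION about what happens when the
# outer grouping is also applied to the subquery; not confirmed by the source.
observed = count_last_by_area(points, key=lambda r: (r[2], r[1]))

print(dict(expected))
print(dict(observed))
```

Under that assumption the second grouping yields travel 2, health 2, claim 1, vtc 1, which matches the output shown above, while the first yields the travel 1, health 2, claim 1 that was expected.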

【Question discussion】:

Tags: influxdb influxql

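【Solution 1】:

Assuming id and area are tags, something like this should work:

select count(*) from (select last(*) from your_measurement group by id) group by area

You can replace * with a single field. The nested query gets the last data point for each id, and the outer query counts per area based on those results. Depending on your specific use case, the query may differ slightly.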
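
If the nested query does not return the expected per-area counts (as in the question's edit), one fallback is to run only the inner query and do the final count on the client. The sketch below assumes the rows have already been fetched and converted to dictionaries; they are hard-coded here from the subquery output shown in the question, and `count_by_area` is a hypothetical helper, not a client-library function.

```python
from collections import Counter

# Rows as returned by the inner query in the question:
#   select last(value) as value, area, id from click3 group by id
# Hard-coded here; in practice they would come from an InfluxDB client.
inner_rows = [
    {"time": 1569480693527591500, "area": "travel", "id": "session2", "value": 1},
    {"time": 1569480706416720700, "area": "health", "id": "session1", "value": 1},
    {"time": 1569480713265800900, "area": "claim", "id": "session4", "value": 1},
    {"time": 1569480719882312600, "area": "health", "id": "session3", "value": 1},
]


def count_by_area(rows):
    """Count rows per area; each row is already the last point of one id."""
    return Counter(row["area"] for row in rows)


area_counts = count_by_area(inner_rows)
print(dict(area_counts))  # {'travel': 1, 'health': 2, 'claim': 1}
```

This moves the outer aggregation out of the database, which is reasonable when the number of distinct ids is small enough to transfer.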

【Discussion】: