Answers: SQL query to compute calculations from different rows of the same table in SQL Server

【Question title】: SQL query to compute calculations from different rows of the same table in SQL Server
【Posted】: 2017-12-08 11:58:14
【Question description】:

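I need a SQL query for the following, and I am new to SQL. The table below is only a sample of the kind of data I have. The real data set is very large, about 30 million rows, and I want to write a query that produces the output table shown further down.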

   Id        type        data          time
-----------------------------------------------------------
    1          30          3.9          15:50:10.660555
    1          30          4.0          15:50:10.660777
    1          70          11.5         15:50:10.797966
    1          30          4.1          15:50:10.834444
    1          70          12.6         15:50:10.853114
    1          70          16.7         15:50:10.955086
    1          30          5            15:50:10.99
    11         30          3.8          15:50:11.660555
    11         30          4.1          15:50:11.660777
    11         70          12.5         15:50:11.797966
    11         30          4.7          15:50:11.834444
    11         70          12.68        15:50:11.853114
    11         70          16.76        15:50:11.955086
    11         30          5.1          15:50:11.99

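I have a table like the one above. For each type 70 row, I need to do a calculation using the last known type 30 row. For example, for Id = 1, for the first type = 70 row at 15:50:10.797966, I need the type = 30 data at 15:50:10.660777 so that I can compute result = 11.5/4.0. Similarly, for type = 70 at 15:50:10.853114, I want the data of type = 30 at 15:50:10.834444, so my result = 12.6/4.1.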

I would like the output to look like this:

Id          type           result             time
------------------------------------------------------
1            70             11.5/4.0        15:50:10.797966
1            70             12.6/4.1        15:50:10.853114
1            70             16.7/4.1        15:50:10.955086
11           70             12.5/4.1        15:50:11.797966
11           70             12.68/4.7       15:50:11.853114
11           70             16.76/4.7       15:50:11.955086

I want to be able to execute these SQL queries from Python using pyodbc.

Any help would be much appreciated. Thanks in advance!

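【Question discussion】:

You have tagged both mysql and postgresql; which one are you actually using? They have different features, so the answer may differ depending on the database. Please also specify the database version you are using.
@harmic: Sorry, it is actually SQL Server 2017.
Is there no date component in time?
@vkp: I do have a date component, but it is not needed because every file id has the same date.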

Tags: python sql sql-server pandas

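【Solution 1】:

There is a way to do this using only window functions.

For each row, get the type and value of the previous row. Also enumerate the runs of 70s so that each run can be identified as a group (you can do this with a cumulative sum).

In the next step, use a partitioned max to pick up the type 30 value for each group, and finally do the calculation.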

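select t.*,
       data / data_30 as result
from (select t.*,
             -- only the first 70 of a group has the group's 30 as its previous row;
             -- the type = 70 test stops a 30 that follows another 30 from leaking the older value
             max(case when type = 70 and prev_type = 30 then prev_data end)
                 over (partition by id, grp) as data_30
      from (select t.*,
                   -- running count of 30s: every 30 opens a new group
                   sum(case when type <> 70 then 1 else 0 end) over (partition by id order by time) as grp,
                   lag(type) over (partition by id order by time) as prev_type,
                   lag(data) over (partition by id order by time) as prev_data
            from t
            where type in (30, 70)
           ) t
     ) t
where type = 70;  -- return only the type 70 rows, as in the requested output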

One interesting aspect of this: by restricting the types to only 30 and 70, we guarantee that each group of 70s is immediately preceded by a 30.
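To cross-check the query on a small extract, the same "last known type 30" rule can be sketched in plain Python. This is only an illustrative reference, not part of the answer: the function name `ratios_last_known_30` and the tuple layout `(id, type, data, time)` are assumptions, and times are compared as strings, which works here because they share the `HH:MM:SS.fraction` layout.

```python
from itertools import groupby
from operator import itemgetter

# Rows for Id = 1 from the question: (id, type, data, time)
SAMPLE = [
    (1, 30, 3.9, "15:50:10.660555"),
    (1, 30, 4.0, "15:50:10.660777"),
    (1, 70, 11.5, "15:50:10.797966"),
    (1, 30, 4.1, "15:50:10.834444"),
    (1, 70, 12.6, "15:50:10.853114"),
    (1, 70, 16.7, "15:50:10.955086"),
    (1, 30, 5.0, "15:50:10.99"),
]


def ratios_last_known_30(rows):
    """Return (id, time, data, data_30, result) for every type 70 row."""
    # Mirror "where type in (30, 70)" and "order by time" within each id.
    relevant = sorted(
        (r for r in rows if r[1] in (30, 70)),
        key=lambda r: (r[0], r[3]),
    )
    out = []
    for row_id, group in groupby(relevant, key=itemgetter(0)):
        last_30 = None  # reset for each id, like "partition by id"
        for _, typ, data, time in group:
            if typ == 30:
                last_30 = data  # remember the most recent type 30 value
            else:
                # No earlier type 30 row gives None, like NULL in SQL.
                result = None if last_30 is None else data / last_30
                out.append((row_id, time, data, last_30, result))
    return out


for row in ratios_last_known_30(SAMPLE):
    print(row)
```

For the sample rows this pairs 11.5 with 4.0, and both 12.6 and 16.7 with 4.1, which matches the output requested in the question.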

【Discussion】:

@Gingerbread . . . This should be more efficient than the answer that uses cross apply.

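【Solution 2】:

Assuming each id has at least one type=30 row before a type=70 row, you can do this with outer apply: for each type=70 row, get the max time of the type=30 rows before it, then join back on that time to fetch the value used in the division.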

SELECT x.id,
       x.type,
       x.time,
       x.data*1.0/t.data as result
FROM
  (SELECT t.*,t1.maxtime_before
   FROM t 
   OUTER APPLY
     (SELECT max(time) AS maxtime_before
      FROM t t1
      WHERE t1.id=t.id AND t1.type=30 AND t1.time<t.time) t1
   WHERE type = 70
  ) x
JOIN t ON t.id=x.id AND t.time=x.maxtime_before
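What the correlated subquery computes for a single type=70 row can be illustrated with a small Python sketch. It is only an analogy for the lookup, not how SQL Server executes it; the helper name `latest_30_before` is hypothetical, and the list holds the type=30 times for Id = 1 from the question.

```python
from bisect import bisect_left


def latest_30_before(times_30, t):
    """Return the greatest time in sorted times_30 strictly less than t, or None."""
    # bisect_left gives the index of the first element >= t,
    # so everything before that index is strictly earlier (t1.time < t.time).
    i = bisect_left(times_30, t)
    return times_30[i - 1] if i > 0 else None


# type=30 times for Id = 1, already sorted
times_30 = ["15:50:10.660555", "15:50:10.660777", "15:50:10.834444", "15:50:10.99"]

print(latest_30_before(times_30, "15:50:10.797966"))  # 15:50:10.660777
print(latest_30_before(times_30, "15:50:10.500000"))  # None
```

The `None` case corresponds to a type=70 row with no earlier type=30 row, where `max(time)` returns NULL and the inner join drops the row.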

If a type=70 row may have no type=30 row before it, you can show null in the result column for those rows:

WITH x AS
  (SELECT t.*,
          t1.maxtime_before
   FROM t
   OUTER APPLY
     (SELECT max(time) AS maxtime_before
      FROM t t1
      WHERE t1.id=t.id AND t1.type=30 AND t1.time<t.time) t1
   WHERE type = 70
  )
SELECT x.id,
       x.type,
       x.time,
       x.data*1.0/t.data as result
FROM t
JOIN x ON t.id=x.id AND t.time=x.maxtime_before
UNION ALL
SELECT id,
       type,
       time,
       NULL
FROM x
WHERE maxtime_before IS NULL


Another approach uses the max window function to keep a running maximum of the type=30 row time per id.

WITH x AS
  (SELECT t.*,
          MAX(CASE WHEN type=30 THEN time END) OVER(PARTITION BY id ORDER BY time) AS running_max
   FROM t
  )
SELECT x.id,
       x.type,
       x.time,
       x.data*1.0/t.data as result
FROM x
JOIN t ON t.id=x.id AND t.time=x.running_max
WHERE x.type=70
UNION ALL
SELECT id,
       type,
       time,
       NULL
FROM x 
WHERE running_max IS NULL AND type = 70
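Since the question asks about running the query from Python, here is a minimal sketch of the matched branch of the running-max query executed through a DB-API cursor. Several things here are assumptions: the helper `stream_rows` and the `demo` function are hypothetical, and an in-memory `sqlite3` database (SQLite 3.25 or later, for window functions) stands in for SQL Server so the example runs without a server. With pyodbc, the connection returned by `pyodbc.connect(...)` would be passed instead; its cursors also provide `execute` and `fetchmany`.

```python
import sqlite3

# Sample rows from the question: (id, type, data, time)
SAMPLE = [
    (1, 30, 3.9, "15:50:10.660555"), (1, 30, 4.0, "15:50:10.660777"),
    (1, 70, 11.5, "15:50:10.797966"), (1, 30, 4.1, "15:50:10.834444"),
    (1, 70, 12.6, "15:50:10.853114"), (1, 70, 16.7, "15:50:10.955086"),
    (1, 30, 5.0, "15:50:10.99"),
    (11, 30, 3.8, "15:50:11.660555"), (11, 30, 4.1, "15:50:11.660777"),
    (11, 70, 12.5, "15:50:11.797966"), (11, 30, 4.7, "15:50:11.834444"),
    (11, 70, 12.68, "15:50:11.853114"), (11, 70, 16.76, "15:50:11.955086"),
    (11, 30, 5.1, "15:50:11.99"),
]

RUNNING_MAX_SQL = """
WITH x AS (
    SELECT t.*,
           MAX(CASE WHEN type = 30 THEN time END)
               OVER (PARTITION BY id ORDER BY time) AS running_max
    FROM t
)
SELECT x.id, x.type, x.time, x.data * 1.0 / t.data AS result
FROM x
JOIN t ON t.id = x.id AND t.time = x.running_max
WHERE x.type = 70
ORDER BY x.id, x.time
"""


def stream_rows(conn, sql, batch_size=10_000):
    """Yield result rows batch by batch instead of loading them all at once."""
    cur = conn.cursor()
    cur.execute(sql)
    while True:
        batch = cur.fetchmany(batch_size)
        if not batch:
            break
        yield from batch


def demo():
    # sqlite3 stand-in; times are stored as text and compared as strings.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, type INTEGER, data REAL, time TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", SAMPLE)
    return list(stream_rows(conn, RUNNING_MAX_SQL))


for row in demo():
    print(row)
```

Fetching in batches with `fetchmany` keeps memory bounded, which matters when the result comes from a table of about 30 million rows.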

【Discussion】: