【Question title】: Returning latest record for each column
【Posted】: 2017-09-07 16:55:59
【Question】:

Sample table

+--------+------------+------------------+-------------+-------------+
| FileID |    Date    |     Activity     | Assigned_By | Responsible |
+--------+------------+------------------+-------------+-------------+
|    123 | 2016/01/01 | Work in progress | Foo1        | Bob         | 
|    234 | 2016/01/01 | Work in progress | Foo2        | Smith       | 
|    123 | 2016/01/02 | Escalated        | NULL        | NULL        | 
|    123 | 2016/01/03 | Need reassign    | NULL        | NULL        | 
|    123 | 2016/01/03 | Reassigned       | Foo2        | John        | 
|    234 | 2016/01/03 | Completed        | NULL        | NULL        |
|    123 | 2016/01/04 | Completed        | NULL        | NULL        |
+--------+------------+------------------+-------------+-------------+

My query:

SELECT FileID,
       Date,
       Activity,
       Assigned_By,
       Responsible
FROM (
      SELECT fooTable.*, ROW_NUMBER() OVER (PARTITION BY FileID ORDER BY Date DESC) AS Separator
      FROM fooTable
     ) fooTable
INNER JOIN randomTable ON fooTable.FileID = randomTable.ID
WHERE fooTable.Separator = 1;

Returns:

+--------+------------+-----------+-------------+-------------+
| FileID |    Date    | Activity  | Assigned_By | Responsible |
+--------+------------+-----------+-------------+-------------+
|    234 | 2016/01/03 | Completed | NULL        | NULL        |
|    123 | 2016/01/04 | Completed | NULL        | NULL        |
+--------+------------+-----------+-------------+-------------+

Desired result: one row per unique FileID with its latest Date, where each column holds the most recent (non-NULL) value recorded for that FileID:

+--------+------------+-----------+-------------+-------------+
| FileID |    Date    | Activity  | Assigned_By | Responsible |
+--------+------------+-----------+-------------+-------------+
|    234 | 2016/01/03 | Completed | Foo2        | Smith       |
|    123 | 2016/01/04 | Completed | Foo2        | John        |
+--------+------------+-----------+-------------+-------------+

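I sort of understand why the query doesn't work: it only returns the latest row (the one ROW_NUMBER numbers 1), so for each unique FileID I get the first ROW in descending date order, NULLs and all. But I don't know how to fix it.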
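To see this behaviour concretely, here is a minimal sketch that loads the sample table into an in-memory SQLite database through Python's standard sqlite3 module and runs the asker's query. SQLite is an assumption (the question targets SQL Server) and needs version 3.25 or later for window functions; randomTable is left out, assuming the join only filters which FileIDs appear.

```python
import sqlite3

# Sample rows from the question. Dates are 'yyyy/mm/dd' text, so string
# ordering matches date ordering. randomTable is omitted (assumption).
SAMPLE = [
    (123, "2016/01/01", "Work in progress", "Foo1", "Bob"),
    (234, "2016/01/01", "Work in progress", "Foo2", "Smith"),
    (123, "2016/01/02", "Escalated", None, None),
    (123, "2016/01/03", "Need reassign", None, None),
    (123, "2016/01/03", "Reassigned", "Foo2", "John"),
    (234, "2016/01/03", "Completed", None, None),
    (123, "2016/01/04", "Completed", None, None),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fooTable (FileID INTEGER, Date TEXT, Activity TEXT, "
             "Assigned_By TEXT, Responsible TEXT)")
conn.executemany("INSERT INTO fooTable VALUES (?, ?, ?, ?, ?)", SAMPLE)

# ROW_NUMBER() ranks whole rows, so Separator = 1 keeps the entire latest row,
# including whatever NULLs that row happens to contain.
latest_rows = conn.execute("""
    SELECT FileID, Date, Activity, Assigned_By, Responsible
    FROM (SELECT fooTable.*,
                 ROW_NUMBER() OVER (PARTITION BY FileID ORDER BY Date DESC) AS Separator
          FROM fooTable) f
    WHERE f.Separator = 1
    ORDER BY FileID
""").fetchall()

for row in latest_rows:
    print(row)
```

Both rows come back with NULL (None) in Assigned_By and Responsible, which is exactly the "Returns" table above: the latest row simply has no values in those columns.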

Edit: Another thing I realized is that MAX() won't work for Assigned_By and Responsible (I think), because it would return the alphabetically greater name...
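The suspicion about MAX() is right: on a text column it returns the alphabetically greatest non-NULL value, not the most recent one. A small sketch in SQLite via Python's sqlite3 (an assumption; the question is about SQL Server) using a hypothetical history in which FileID 123 is reassigned from John to Bob. These rows are made up for illustration and are not the question's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fooTable (FileID INTEGER, Date TEXT, Responsible TEXT)")
# Hypothetical rows: John first, then a NULL, then Bob as the latest assignee.
conn.executemany("INSERT INTO fooTable VALUES (?, ?, ?)", [
    (123, "2016/01/01", "John"),
    (123, "2016/01/02", None),
    (123, "2016/01/03", "Bob"),
])

# MAX() ignores NULLs and compares the strings themselves, so 'John' beats 'Bob'.
max_name = conn.execute(
    "SELECT MAX(Responsible) FROM fooTable WHERE FileID = 123"
).fetchone()[0]

# What is actually wanted: the value on the latest row where it is not NULL.
latest_name = conn.execute("""
    SELECT Responsible FROM fooTable
    WHERE FileID = 123 AND Responsible IS NOT NULL
    ORDER BY Date DESC
    LIMIT 1
""").fetchone()[0]

print(max_name, latest_name)
```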

【Question discussion】:

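  • You need to query the table once more for each nullable column. Run your current query for FileID, Date, Activity (assuming Activity is not nullable), then LEFT JOIN one more query for Assigned_By and another for Responsible. Each additional query adds a "WHERE {column} IS NOT NULL" clause and keeps only the latest remaining row per FileID.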
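A rough sketch of the approach this comment describes, run in SQLite via Python's sqlite3 with randomTable left out (both assumptions; the thread targets SQL Server). The CTE names latest, assigned and responsible are hypothetical: each nullable column gets its own ranked query restricted to non-NULL rows, and those are LEFT JOINed onto the latest row per FileID.

```python
import sqlite3

# Sample rows from the question; randomTable is omitted (assumption).
SAMPLE = [
    (123, "2016/01/01", "Work in progress", "Foo1", "Bob"),
    (234, "2016/01/01", "Work in progress", "Foo2", "Smith"),
    (123, "2016/01/02", "Escalated", None, None),
    (123, "2016/01/03", "Need reassign", None, None),
    (123, "2016/01/03", "Reassigned", "Foo2", "John"),
    (234, "2016/01/03", "Completed", None, None),
    (123, "2016/01/04", "Completed", None, None),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fooTable (FileID INTEGER, Date TEXT, Activity TEXT, "
             "Assigned_By TEXT, Responsible TEXT)")
conn.executemany("INSERT INTO fooTable VALUES (?, ?, ?, ?, ?)", SAMPLE)

result = conn.execute("""
    WITH latest AS (        -- latest row per FileID (Activity assumed non-NULL)
        SELECT FileID, Date, Activity,
               ROW_NUMBER() OVER (PARTITION BY FileID ORDER BY Date DESC) AS rn
        FROM fooTable
    ),
    assigned AS (           -- latest non-NULL Assigned_By per FileID
        SELECT FileID, Assigned_By,
               ROW_NUMBER() OVER (PARTITION BY FileID ORDER BY Date DESC) AS rn
        FROM fooTable
        WHERE Assigned_By IS NOT NULL
    ),
    responsible AS (        -- latest non-NULL Responsible per FileID
        SELECT FileID, Responsible,
               ROW_NUMBER() OVER (PARTITION BY FileID ORDER BY Date DESC) AS rn
        FROM fooTable
        WHERE Responsible IS NOT NULL
    )
    SELECT l.FileID, l.Date, l.Activity, a.Assigned_By, r.Responsible
    FROM latest AS l
    LEFT JOIN assigned AS a ON a.FileID = l.FileID AND a.rn = 1
    LEFT JOIN responsible AS r ON r.FileID = l.FileID AND r.rn = 1
    WHERE l.rn = 1
    ORDER BY l.FileID
""").fetchall()

for row in result:
    print(row)
```

The LEFT JOINs matter: a FileID that never had a non-NULL Assigned_By still appears, just with NULL in that column.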

Tags: sql sql-server


【Solution 1】:

You can do what you want using conditional aggregation:

WITH t AS (
      SELECT FileID, Date, Activity, Assigned_By, Responsible
      FROM fooTable INNER JOIN 
           randomTable
           ON fooTable.FileID = randomTable.ID
     )
SELECT FileID, MAX(Date) as date,
       MAX(CASE WHEN seqnum = 1 THEN Activity END) as Activity,
       MAX(CASE WHEN seqnum_assigned = 1 THEN Assigned_By END) as Assigned_By,
       MAX(CASE WHEN seqnum_responsible = 1 THEN Responsible END) as Responsible
FROM (SELECT t.*,
             -- 1 = latest row overall (used for Activity)
             ROW_NUMBER() OVER (PARTITION BY FileID ORDER BY Date DESC) AS seqnum,
             -- 1 = latest row whose Assigned_By is not NULL
             ROW_NUMBER() OVER (PARTITION BY FileID
                                ORDER BY (CASE WHEN Assigned_By IS NOT NULL THEN 1 ELSE 2 END), Date DESC
                               ) AS seqnum_assigned,
             -- 1 = latest row whose Responsible is not NULL
             ROW_NUMBER() OVER (PARTITION BY FileID
                                ORDER BY (CASE WHEN Responsible IS NOT NULL THEN 1 ELSE 2 END), Date DESC
                               ) AS seqnum_responsible
      FROM t
     ) t
GROUP BY FileID;
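One way to check the conditional-aggregation idea against the sample data is the sketch below, run in SQLite via Python's sqlite3 with randomTable omitted (both assumptions). It ranks rows separately for Assigned_By and for Responsible, so each column picks its own latest non-NULL value even when the two columns are not NULL on the same rows. The names seqnum_assigned, seqnum_responsible and the latest_* output aliases are illustrative.

```python
import sqlite3

# Sample rows from the question; randomTable is omitted (assumption).
SAMPLE = [
    (123, "2016/01/01", "Work in progress", "Foo1", "Bob"),
    (234, "2016/01/01", "Work in progress", "Foo2", "Smith"),
    (123, "2016/01/02", "Escalated", None, None),
    (123, "2016/01/03", "Need reassign", None, None),
    (123, "2016/01/03", "Reassigned", "Foo2", "John"),
    (234, "2016/01/03", "Completed", None, None),
    (123, "2016/01/04", "Completed", None, None),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fooTable (FileID INTEGER, Date TEXT, Activity TEXT, "
             "Assigned_By TEXT, Responsible TEXT)")
conn.executemany("INSERT INTO fooTable VALUES (?, ?, ?, ?, ?)", SAMPLE)

# Exactly one row per FileID has each seqnum* = 1; the CASE turns every other
# row into NULL, so MAX() just picks out that single value.
result = conn.execute("""
    SELECT FileID,
           MAX(Date) AS latest_date,
           MAX(CASE WHEN seqnum = 1 THEN Activity END) AS latest_activity,
           MAX(CASE WHEN seqnum_assigned = 1 THEN Assigned_By END) AS latest_assigned_by,
           MAX(CASE WHEN seqnum_responsible = 1 THEN Responsible END) AS latest_responsible
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (PARTITION BY FileID ORDER BY Date DESC) AS seqnum,
                 ROW_NUMBER() OVER (PARTITION BY FileID
                                    ORDER BY CASE WHEN Assigned_By IS NOT NULL THEN 1 ELSE 2 END,
                                             Date DESC) AS seqnum_assigned,
                 ROW_NUMBER() OVER (PARTITION BY FileID
                                    ORDER BY CASE WHEN Responsible IS NOT NULL THEN 1 ELSE 2 END,
                                             Date DESC) AS seqnum_responsible
          FROM fooTable AS t) AS ranked
    GROUP BY FileID
    ORDER BY FileID
""").fetchall()

for row in result:
    print(row)
```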

【Discussion】:

  • Mr. Linoff, if manipulating data were world hunger, you would solve it. This works perfectly, thank you very much.
【Solution 2】:

You can use joins, or you can use FIRST_VALUE, like this:

SELECT DISTINCT
  FileID,
  FIRST_VALUE(Date) OVER (PARTITION BY FileID ORDER BY Date DESC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS Date,
  FIRST_VALUE(Activity) OVER (PARTITION BY FileID ORDER BY CASE WHEN Activity IS NULL THEN 0 ELSE 1 END DESC, Date DESC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS Activity,
  FIRST_VALUE(Assigned_By) OVER (PARTITION BY FileID ORDER BY CASE WHEN Assigned_By IS NULL THEN 0 ELSE 1 END DESC, Date DESC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS Assigned_By,
  FIRST_VALUE(Responsible) OVER (PARTITION BY FileID ORDER BY CASE WHEN Responsible IS NULL THEN 0 ELSE 1 END DESC, Date DESC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS Responsible
FROM fooTable
INNER JOIN randomTable ON fooTable.FileID = randomTable.ID;
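A similar check for the FIRST_VALUE approach, again in SQLite via Python's sqlite3 with randomTable omitted (both assumptions). Because window functions return a value for every input row, the sketch wraps the query in a subquery and applies DISTINCT to get one row per FileID. The output aliases are renamed (latest_date and so on) only to keep them distinct from the source column names.

```python
import sqlite3

# Sample rows from the question; randomTable is omitted (assumption).
SAMPLE = [
    (123, "2016/01/01", "Work in progress", "Foo1", "Bob"),
    (234, "2016/01/01", "Work in progress", "Foo2", "Smith"),
    (123, "2016/01/02", "Escalated", None, None),
    (123, "2016/01/03", "Need reassign", None, None),
    (123, "2016/01/03", "Reassigned", "Foo2", "John"),
    (234, "2016/01/03", "Completed", None, None),
    (123, "2016/01/04", "Completed", None, None),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fooTable (FileID INTEGER, Date TEXT, Activity TEXT, "
             "Assigned_By TEXT, Responsible TEXT)")
conn.executemany("INSERT INTO fooTable VALUES (?, ?, ?, ?, ?)", SAMPLE)

# Each FIRST_VALUE sorts non-NULL values first (1 DESC), then newest date first,
# and the full-partition frame makes every row see the same first value.
FRAME = "ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING"
NON_NULL_FIRST = "ORDER BY CASE WHEN {col} IS NULL THEN 0 ELSE 1 END DESC, Date DESC"

columns = ",\n".join(
    f"FIRST_VALUE({col}) OVER (PARTITION BY FileID "
    f"{NON_NULL_FIRST.format(col=col)} {FRAME}) AS latest_{col.lower()}"
    for col in ("Activity", "Assigned_By", "Responsible")
)

result = conn.execute(f"""
    SELECT DISTINCT * FROM (
        SELECT FileID,
               FIRST_VALUE(Date) OVER (PARTITION BY FileID ORDER BY Date DESC {FRAME})
                   AS latest_date,
               {columns}
        FROM fooTable
    ) AS per_row
    ORDER BY FileID
""").fetchall()

for row in result:
    print(row)
```

Building the three column expressions from one template keeps them identical apart from the column name, which addresses the "extra columns make it messy" concern raised in the comments without changing the SQL that runs.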

【Discussion】:

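  • How does this prevent NULL values? It seems equivalent to the OP's code.
  • @GordonLinoff -- in DB2 the default for FIRST_VALUE is to not include nulls -- I didn't test this in SQL Server, but I made that assumption.
  • @GordonLinoff -- yes, SQL Server doesn't let you ignore nulls the way DB2 and Oracle do -- I had to add another ORDER BY key to make it work, as in my edit -- thanks for catching the problem.
  • @Hogan This is a good answer after the update. The only issue is that the extra columns make the query a bit messy, so Gordon's conditional aggregation is cleaner. Thank you very much anyway.
  • @Simon -- You're welcome. It's a useful answer for seeing how to take some columns first by date and other columns first by a different ordering. Depending on many other factors, it may be faster than the other solutions.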
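The effect of the extra ORDER BY key discussed in these comments can be seen by computing FIRST_VALUE(Responsible) both ways for FileID 123. This is a sketch in SQLite through Python's sqlite3 (an assumption, since the thread is about SQL Server) using only the Date and Responsible values of FileID 123 from the sample table; plain_first and non_null_first are illustrative names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fooTable (FileID INTEGER, Date TEXT, Responsible TEXT)")
# FileID 123's Date/Responsible values from the question's sample table.
conn.executemany("INSERT INTO fooTable VALUES (?, ?, ?)", [
    (123, "2016/01/01", "Bob"),
    (123, "2016/01/02", None),
    (123, "2016/01/03", None),
    (123, "2016/01/03", "John"),
    (123, "2016/01/04", None),
])

# The default frame starts at the partition's first row, so FIRST_VALUE returns
# the first row in sort order for every row; DISTINCT collapses the duplicates.
result = conn.execute("""
    SELECT DISTINCT * FROM (
        SELECT FileID,
               -- newest row wins even though its Responsible is NULL
               FIRST_VALUE(Responsible) OVER (
                   PARTITION BY FileID ORDER BY Date DESC) AS plain_first,
               -- non-NULL values sort ahead of NULLs, then newest first
               FIRST_VALUE(Responsible) OVER (
                   PARTITION BY FileID
                   ORDER BY CASE WHEN Responsible IS NULL THEN 0 ELSE 1 END DESC,
                            Date DESC) AS non_null_first
        FROM fooTable
    ) AS per_row
""").fetchall()

print(result)
```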