【Question title】:How to find the value with date closest to another date
【Posted】:2020-02-04 20:39:35
【Question】:

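I work in healthcare and need to build a report that shows patients' lab values at different time points. The time points are:

Pre-transplant:

1 year = 365 days +/- 30 days

3 months = 90 days +/- 14 days

1 month = 30 days +/- 7 days

Post-transplant:

1 day = 24 hours +/- 12 hours

1 week = 7 days +/- 1 day

1 month = 30 days +/- 7 days

3 months = 90 days +/- 14 days

6 months = 180 days +/- 30 days

1 year = 365 days +/- 30 days

My data model has many tables (results of SQL Server queries), but the main lab table looks like this: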

+----------+------------+-----------------+------------+-----------+
| Order ID | Episode ID | Transplant Date | Lab Date   | Lab Value |
+----------+------------+-----------------+------------+-----------+
| 111      | 222        | 5/2/2018        | 1/22/2018  | 23        |
| 112      | 222        | 5/2/2018        | 1/27/2018  | 15        |
| 113      | 222        | 5/2/2018        | 5/3/2018   | 14        |
| 114      | 222        | 5/2/2018        | 10/19/2018 | 12        |
| 115      | 223        | 1/23/2019       | 1/24/2019  | 20        |
| 116      | 223        | 1/23/2019       | 1/25/2019  | 25        |
| 117      | 223        | 1/23/2019       | 1/31/2019  | 29        |
| 118      | 223        | 1/23/2019       | 4/23/2019  | 30        |
| 119      | 223        | 1/23/2019       | 3/1/2019   | 35        |
| 120      | 224        | 7/19/2019       | 7/19/2018  | 5         |
| 121      | 224        | 7/19/2019       | 7/24/2018  | 13        |
+----------+------------+-----------------+------------+-----------+

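Order ID is the unique identifier of a lab, Episode ID is the unique identifier of a patient, and we are looking for labs relative to the Transplant Date.

There is also a patient data table that looks like this: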

+------------+----------------+-----------------+
| Episode ID | Patient Name   | Transplant Date |
+------------+----------------+-----------------+
| 222        | Alphers, Ralph | 5/2/2018        |
| 223        | Bethe, Hans    | 1/23/2019       |
| 224        | Gammow, George | 7/19/2019       |
+------------+----------------+-----------------+

The resulting data should look like this:

+------------+------------+--------------+-------------+------------+-------------+--------------+---------------+-------------+
| Episode ID | 1 year pre | 3 months pre | 1 month pre | 1 day post | 1 week post | 1 month post | 6 months post | 1 year post |
+------------+------------+--------------+-------------+------------+-------------+--------------+---------------+-------------+
| 222        |            | 15           |             | 14         |             |              | 12            |             |
| 223        |            |              |             | 20         | 29          | 35           |               |             |
| 224        | 5          |              |             |            |             |              |               |             |
+------------+------------+--------------+-------------+------------+-------------+--------------+---------------+-------------+

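Considering processing time (user experience) and development complexity, is there a best way to do this?

Right now, this is how I am doing it.

First, I use Power Query (M) to create the time points (e.g. Table.AddColumn(#"Changed Type", "Minutes to One Year Before Transplant", each Number.Abs(Duration.TotalMinutes(([Lab Date] - DateTime.From(Date.AddYears([Transplant Date], -1))))))). Then I use DAX to find the number of minutes for the record closest to the correct target date: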

Labs shortest minutes to one year before transplant = 
VAR EpisodeID = Patients[Episode ID]
VAR TargetDate = DATEADD(Patients[Transplant Date], 1, MONTH)
VAR WindowDays = 30
RETURN
CALCULATE(
    MIN(Labs[Minutes to One Month After Transplant]),
    FILTER(Labs, Labs[Episode ID] = EpisodeID),
    FILTER(Labs, Labs[Lab Date] >= DATEADD(TargetDate, -WindowDays, DAY)),
    FILTER(Labs, Labs[Lab Date] <= DATEADD(TargetDate, WindowDays, DAY))
)

Then I use that number of minutes as an identifier to get the Order ID:

Lab Order ID closest to one year before transplant = 
VAR EpisodeID = Patients[Episode ID]
VAR TargetDate = DATEADD(Patients[Transplant Date], 1, MONTH)
VAR WindowDays = 30
VAR DaysFrom = Patients[Labs shortest minutes to one year before transplant]
RETURN
CALCULATE(
    MIN(Labs[Order ID]),
    FILTER(Labs, Labs[Episode ID] = EpisodeID),
    FILTER(Labs, Labs[Lab Date] >= DATEADD(TargetDate, -WindowDays, DAY)),
    FILTER(Labs, Labs[Lab Date] <= DATEADD(TargetDate, WindowDays, DAY))
)

Finally, I can use the Order ID to get whatever I want from that lab, such as the value:

Lab Value closest to one year before transplant = 
VAR EpisodeID = Patients[Episode ID]
VAR OrderID = Patients[Lab Order ID closest to one year before transplant]
RETURN
CALCULATE(
    MIN(Labs[Value]),
    FILTER(Labs, Labs[Episode ID] = EpisodeID),
    FILTER(Labs, Labs[Order ID] = OrderID)
)

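And I need to do this for 3 different labs, which means repeating this process 30 times. The resulting report also takes a while to calculate. I could push a bunch of the work back to SQL Server, but maybe that is not the best idea?

【Comments】:

  • When there are multiple matching results, which one do you want?
  • @GordonLinoff The combination of Lab Date and Lab Value should be unique for each Episode ID, but if it is not, just take the first one.

Tags: sql powerbi dax powerquery m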


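【Solution 1】:

I am adding a different answer because of the responses to my previous answer about the data.

I would make a table of the buckets that the dates fall into. That way, if someone requests a different bucket, it is simple to add.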

CREATE TABLE [dbo].[table_Buckets](
    [Bucket] [varchar](50) NULL,
    [NumDaysLow] [int] NULL,
    [NumDaysHigh] [int] NULL
) ON [PRIMARY]

GO
SET ANSI_PADDING OFF
GO
INSERT [dbo].[table_Buckets] ([Bucket], [NumDaysLow], [NumDaysHigh]) VALUES (N'Pre-1Yr', -395, -335)
GO
INSERT [dbo].[table_Buckets] ([Bucket], [NumDaysLow], [NumDaysHigh]) VALUES (N'Pre-3Mth', -104, -76)
GO
INSERT [dbo].[table_Buckets] ([Bucket], [NumDaysLow], [NumDaysHigh]) VALUES (N'Pre-1Mth', -37, -21)
GO
INSERT [dbo].[table_Buckets] ([Bucket], [NumDaysLow], [NumDaysHigh]) VALUES (N'Post-1Day', 0, 2)
GO
INSERT [dbo].[table_Buckets] ([Bucket], [NumDaysLow], [NumDaysHigh]) VALUES (N'Post-1Wk', 6, 8)
GO
INSERT [dbo].[table_Buckets] ([Bucket], [NumDaysLow], [NumDaysHigh]) VALUES (N'Post-1Mth', 21, 37)
GO
INSERT [dbo].[table_Buckets] ([Bucket], [NumDaysLow], [NumDaysHigh]) VALUES (N'Post-3Mth', 76, 104)
GO
INSERT [dbo].[table_Buckets] ([Bucket], [NumDaysLow], [NumDaysHigh]) VALUES (N'Post-6Mth', 150, 210)
GO
INSERT [dbo].[table_Buckets] ([Bucket], [NumDaysLow], [NumDaysHigh]) VALUES (N'Post-1Yr', 335, 395)
GO

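Now you can run the following SQL query, which takes the data, assigns each lab to a bucket based on its date, takes the lowest value in each bucket, and then pivots the table into the view you want. You will have to design your data around this structure.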

select
   EpisodeID
  ,[Pre-1Yr]
  ,[Pre-3Mth]
  ,[Pre-1Mth]
  ,[Post-1Day]
  ,[Post-1Wk]
  ,[Post-1Mth]
  ,[Post-3Mth]
  ,[Post-6Mth]
  ,[Post-1Yr]

from 
(
  --this select statement takes the lowest value if there is more than one value per bucket
  select main.EpisodeID, main.Bucket, min(main.LabValue) as LabValue from
    (--this select statement assigns each lab row to a bucket
     select
        ml.EpisodeID
        , (select Bucket from
                table_Buckets
            where 
                    NumDaysLow  <= datediff(d,pd.TransplantDate, ml.LabDate)
                and NumDaysHigh >= datediff(d,pd.TransplantDate, ml.LabDate)
            ) AS Bucket
        , ml.LabValue as LabValue

    from 
        table_MainLab ML, 
        table_PatientData PD where ml.EpisodeID = pd.EpisodeID
    ) main
group by EpisodeID, Bucket) s


pivot
(avg(LabValue)
for [Bucket] in
  ([Pre-1Yr]
  ,[Pre-3Mth]
  ,[Pre-1Mth]
  ,[Post-1Day]
  ,[Post-1Wk]
  ,[Post-1Mth]
  ,[Post-3Mth]
  ,[Post-6Mth]
  ,[Post-1Yr])
 ) as pivottable

I am new to posting and have not figured out how to include the output in this post :( ... I will practice.
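
The query above keeps the lowest lab value in each bucket. If the lab closest to the time point is wanted instead, one possible approach is to rank the labs in each bucket by their distance from a target day. This is only a sketch: the Targets list and its TargetDays values are assumptions taken from the time points in the question, and OrderID is assumed to exist in table_MainLab so it can serve as a tie-breaker.

```sql
-- Sketch only: Targets/TargetDays are hypothetical and not part of table_Buckets.
with Targets as (
    select *
    from (values
        ('Pre-1Yr',  -365), ('Pre-3Mth',  -90), ('Pre-1Mth',  -30),
        ('Post-1Day',   1), ('Post-1Wk',    7), ('Post-1Mth',  30),
        ('Post-3Mth',  90), ('Post-6Mth', 180), ('Post-1Yr',  365)
    ) as v (Bucket, TargetDays)
),
Ranked as (
    select
        ml.EpisodeID,
        b.Bucket,
        ml.LabValue,
        -- rn = 1 marks the lab whose day offset is nearest the bucket's target day;
        -- ties fall back to the earliest LabDate, then the lowest OrderID (assumed column)
        row_number() over (
            partition by ml.EpisodeID, b.Bucket
            order by abs(datediff(day, pd.TransplantDate, ml.LabDate) - t.TargetDays),
                     ml.LabDate,
                     ml.OrderID
        ) as rn
    from table_MainLab ml
    inner join table_PatientData pd
        on pd.EpisodeID = ml.EpisodeID
    inner join table_Buckets b
        on datediff(day, pd.TransplantDate, ml.LabDate) between b.NumDaysLow and b.NumDaysHigh
    inner join Targets t
        on t.Bucket = b.Bucket
)
select EpisodeID, Bucket, LabValue
from Ranked
where rn = 1;
```

The final select returns the same three columns as the derived table s, so it could take that table's place in front of the pivot.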

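【Comments】:

  • L. - I am marking this as the answer because it does solve the immediate problem. I was kind of hoping for a magic DAX or M function, but this is probably the best approach. My dream is some kind of dashboard that lets the user enter a date point and a date window, with the displayed lab magically updating to the one closest to that time point within the window. That way, when the next client wants a time point different from the ones already hard-coded, I would not have to go in and modify the SQL. But that may be too much to ask for now. Thanks, Jim!
  • You can keep the same SQL and create a temporary table for each client, selecting from the temporary table instead of table_Buckets. That was the original intent of table_Buckets: if the ranges change, you only need to modify the data in that table.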
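
As a rough illustration of the temporary-table idea in the comment above, the bucket lookup could read from a per-client table. The table name #ClientBuckets and the two example rows are hypothetical; only the three column names come from table_Buckets.

```sql
-- Sketch only: #ClientBuckets and its rows are made-up examples.
create table #ClientBuckets (
    Bucket      varchar(50) not null,
    NumDaysLow  int         not null,
    NumDaysHigh int         not null
);

insert into #ClientBuckets (Bucket, NumDaysLow, NumDaysHigh)
values ('Post-2Wk',  12, 16),   -- e.g. 14 days +/- 2 days
       ('Post-2Mth', 53, 67);   -- e.g. 60 days +/- 7 days

-- Same bucket assignment as in the answer, but driven by the client's table
select
    ml.EpisodeID,
    b.Bucket,
    min(ml.LabValue) as LabValue
from table_MainLab ml
inner join table_PatientData pd
    on pd.EpisodeID = ml.EpisodeID
inner join #ClientBuckets b
    on datediff(day, pd.TransplantDate, ml.LabDate) between b.NumDaysLow and b.NumDaysHigh
group by ml.EpisodeID, b.Bucket;
```

Note that the IN list of a T-SQL PIVOT is fixed, so any new bucket names would still have to be listed there or the statement generated dynamically.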
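【Solution 2】:

The simplest way I can think of is to create a calculated column for each time period and then use those columns directly in whatever measures you want. For example, 1 year pre: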

1 Year Pre = IF('Table'[Lab Date]>='Table'[Transplant Date]-395 && 'Table'[Lab Date]<='Table'[Transplant Date]-335,'Table'[LabValue],BLANK())

3 months pre:

3 Months Pre = IF('Table'[Lab Date]>='Table'[Transplant Date]-104 && 'Table'[Lab Date]<='Table'[Transplant Date]-76,'Table'[LabValue],BLANK())

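Similarly, you can create calculated columns for the other time periods and use them to build the visuals you need. Hope this helps.

【Comments】:

  • This flags every lab row that falls within a time point's date window, but it does not flag which lab is closest to the exact time point, which is what I need. Also, I could do some of the up-front work by creating these columns in Power Query (M) or by hard-coding them in SQL.
【Solution 3】:

All of your code is M, so I am not sure why you tagged this with SQL. But here is a [possibly not the most elegant] SQL solution: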

create table labs (
    OrderID int not null,
    EpisodeID int not null,
    TransplantDate date not null,
    LabDate date not null,
    LabValue int not null)

insert labs
values 
(111, 222, cast('5/2/2018'  as date), cast('1/22/2018'  as date), 23),
(112, 222, cast('5/2/2018'  as date), cast('1/27/2018'  as date), 15),
(113, 222, cast('5/2/2018'  as date), cast('5/3/2018'   as date), 14),
(114, 222, cast('5/2/2018'  as date), cast('10/19/2018' as date), 12),
(115, 223, cast('1/23/2019' as date), cast('1/24/2019'  as date), 20),
(116, 223, cast('1/23/2019' as date), cast('1/25/2019'  as date), 25),
(117, 223, cast('1/23/2019' as date), cast('1/31/2019'  as date), 29),
(118, 223, cast('1/23/2019' as date), cast('4/23/2019'  as date), 30),
(119, 223, cast('1/23/2019' as date), cast('3/1/2019'   as date), 35),
(120, 224, cast('7/19/2019' as date), cast('7/19/2018'  as date),  5),
(121, 224, cast('7/19/2019' as date), cast('7/24/2018'  as date), 13)

create table patient (
    EpisodeID int not null,
    PatientName varchar(128) not null,
    TransplantDate date not null
)

insert patient
values
(222, 'Alphers, Ralph', cast('5/2/2018'  as date)),
(223, 'Bethe, Hans',    cast('1/23/2019' as date)),
(224, 'Gammow, George', cast('7/19/2019' as date))


select q.EpisodeID
, min(q.[1YrPre]  ) as '1YrPre'
, min(q.[3MoPre]  ) as '3MoPre'
, min(q.[1MoPre]  ) as '1MoPre'
, min(q.[1DayPost]) as '1DayPost'
, min(q.[1WkPost] ) as '1WkPost'
, min(q.[1MoPost] ) as '1MoPost'
, min(q.[3MoPost] ) as '3MoPost'
, min(q.[6MoPost] ) as '6MoPost'
, min(q.[1YrPost] ) as '1YrPost'

from (
    select r.OrderID
    , r.EpisodeID
    , case when r.[1YrPreCheck]   = m.[1YrPreCheck]   and m.[1YrPreCheck]   <= 30 then r.LabValue end as '1YrPre'
    , case when r.[3MoPreCheck]   = m.[3MoPreCheck]   and m.[3MoPreCheck]   <= 14 then r.LabValue end as '3MoPre'
    , case when r.[1MoPreCheck]   = m.[1MoPreCheck]   and m.[1MoPreCheck]   <=  7 then r.LabValue end as '1MoPre'
    , case when r.[1DayPostCheck] = m.[1DayPostCheck] and m.[1DayPostCheck] <=  1 then r.LabValue end as '1DayPost'
    , case when r.[1WkPostCheck]  = m.[1WkPostCheck]  and m.[1WkPostCheck]  <=  1 then r.LabValue end as '1WkPost'
    , case when r.[1MoPostCheck]  = m.[1MoPostCheck]  and m.[1MoPostCheck]  <=  7 then r.LabValue end as '1MoPost'
    , case when r.[3MoPostCheck]  = m.[3MoPostCheck]  and m.[3MoPostCheck]  <= 14 then r.LabValue end as '3MoPost'
    , case when r.[6MoPostCheck]  = m.[6MoPostCheck]  and m.[6MoPostCheck]  <= 30 then r.LabValue end as '6MoPost'
    , case when r.[1YrPostCheck]  = m.[1YrPostCheck]  and m.[1YrPostCheck]  <= 30 then r.LabValue end as '1YrPost'

    from (
        select p.EpisodeID
        , min(abs(datediff(day, l.LabDate, dateadd(year,  -1, p.TransplantDate)))) as '1YrPreCheck'
        , min(abs(datediff(day, l.LabDate, dateadd(month, -3, p.TransplantDate)))) as '3MoPreCheck'
        , min(abs(datediff(day, l.LabDate, dateadd(month, -1, p.TransplantDate)))) as '1MoPreCheck'
        , min(abs(datediff(day, l.LabDate, dateadd(day,    1, p.TransplantDate)))) as '1DayPostCheck'
        , min(abs(datediff(day, l.LabDate, dateadd(day,    7, p.TransplantDate)))) as '1WkPostCheck'
        , min(abs(datediff(day, l.LabDate, dateadd(month,  1, p.TransplantDate)))) as '1MoPostCheck'
        , min(abs(datediff(day, l.LabDate, dateadd(month,  3, p.TransplantDate)))) as '3MoPostCheck'
        , min(abs(datediff(day, l.LabDate, dateadd(month,  6, p.TransplantDate)))) as '6MoPostCheck'
        , min(abs(datediff(day, l.LabDate, dateadd(year,   1, p.TransplantDate)))) as '1YrPostCheck'

        from labs l
          inner join patient p on p.EpisodeID = l.EpisodeID

        group by p.EpisodeID
    ) m
      inner join (
        select l.OrderID
        , p.EpisodeID
        , l.LabValue
        , abs(datediff(day, l.LabDate, dateadd(year,  -1, p.TransplantDate))) as '1YrPreCheck'
        , abs(datediff(day, l.LabDate, dateadd(month, -3, p.TransplantDate))) as '3MoPreCheck'
        , abs(datediff(day, l.LabDate, dateadd(month, -1, p.TransplantDate))) as '1MoPreCheck'
        , abs(datediff(day, l.LabDate, dateadd(day,    1, p.TransplantDate))) as '1DayPostCheck'
        , abs(datediff(day, l.LabDate, dateadd(day,    7, p.TransplantDate))) as '1WkPostCheck'
        , abs(datediff(day, l.LabDate, dateadd(month,  1, p.TransplantDate))) as '1MoPostCheck'
        , abs(datediff(day, l.LabDate, dateadd(month,  3, p.TransplantDate))) as '3MoPostCheck'
        , abs(datediff(day, l.LabDate, dateadd(month,  6, p.TransplantDate))) as '6MoPostCheck'
        , abs(datediff(day, l.LabDate, dateadd(year,   1, p.TransplantDate))) as '1YrPostCheck'

        from labs l
      inner join patient p on p.EpisodeID = l.EpisodeID
    ) r on r.EpisodeID = m.EpisodeID
)q 

group by q.EpisodeID

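【Comments】:

【Solution 4】:

I would have put this under the comments, but I need more reputation points to comment. Maybe a moderator can move this for me.

To start,

1 - You need to decide what to do when a lab does not fall into any of the categories above. For example, what would you do if the lab date is at 6 months? Where would you want a 6-month lab to be reported? In the example above, you are losing some of the data for EpisodeID 222. In my experience, you should report it somewhere, even if it is a catch-all bucket that needs to be investigated.

2 - You need to decide what to do when you have 2 reports in the same period. With EpisodeID 222, you will see that you have 2 labs in the 90-day pre period. January 22 and January 27 would both fall in that period.

3 - You have similar data in both tables. TransplantDate should only be in your PatientTable.

You would be best off with a simple pivot (crosstab) query. If you can better define your data by answering 1 and 2 above, you will be much further along in getting this done.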
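
Points 1 and 2 can be checked against the data before building the pivot. The two queries below are a rough sketch that borrows the table_Buckets, table_MainLab and table_PatientData names from Solution 1; they are assumptions about the schema, not a description of the actual tables.

```sql
-- Point 1: labs that fall into no bucket (candidates for a catch-all bucket)
select
    ml.EpisodeID,
    ml.LabDate,
    ml.LabValue,
    datediff(day, pd.TransplantDate, ml.LabDate) as DaysFromTransplant
from table_MainLab ml
inner join table_PatientData pd
    on pd.EpisodeID = ml.EpisodeID
left join table_Buckets b
    on datediff(day, pd.TransplantDate, ml.LabDate) between b.NumDaysLow and b.NumDaysHigh
where b.Bucket is null
order by ml.EpisodeID, ml.LabDate;

-- Point 2: episode/bucket pairs that contain more than one lab
select
    ml.EpisodeID,
    b.Bucket,
    count(*) as LabCount
from table_MainLab ml
inner join table_PatientData pd
    on pd.EpisodeID = ml.EpisodeID
inner join table_Buckets b
    on datediff(day, pd.TransplantDate, ml.LabDate) between b.NumDaysLow and b.NumDaysHigh
group by ml.EpisodeID, b.Bucket
having count(*) > 1;
```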

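【Comments】:

  • Hi @Jim L. 1. Yes, data will fall through. I am adding another report tab that shows all labs for each patient. The requesters, though, want a report that shows these time points specifically. 2. They want the one lab closest to the time point. If there is some kind of tie, take the first one. The key is one lab value per time point. 3. Yes, that was to make it easier to create some kind of time point flag in the Labs table. But it is still very complicated! Thanks for clarifying the question.
  • I posted the new results as an answer.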