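【Question title】: Count specific values in rows
【Posted】: 2019-10-03 01:31:21
【Question description】:

I have the following table, and I am trying to count the number of POCs under each assessment for each client.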

clientId    ProcDate        ProcDesc    
7180        2018-06-13      Assessment
7180        2018-06-13      POC 20
7180        2018-06-13      POC 4b
7180        2018-06-20      POC 20
7180        2018-06-20      POC 4b
7180        2018-06-27      POC 20
7180        2018-06-27      POC 4b
7180        2018-07-04      Assessment
7180        2018-07-04      POC 20
7180        2018-07-04      POC 4b
7180        2018-07-11      POC 20
7180        2018-07-18      POC 20
7180        2018-07-18      POC 4b
7180        2018-09-05      Assessment
7180        2018-09-05      POC 20
7180        2018-09-12      POC 20
7180        2018-09-12      POC 4b
7180        2018-09-19      POC 20
7180        2018-09-19      POC 4b

2584        2018-10-03      Assessment
2584        2018-10-03      POC 20
2584        2018-10-03      POC 4b
2584        2018-11-04      Assessment
2584        2018-11-04      POC 20
2584        2018-11-04      POC 4b
2584        2018-11-11      POC 20
2584        2018-11-18      POC 20
2584        2018-11-18      POC 4b
7585        2018-11-04      Assessment
7585        2018-11-04      POC 20
7585        2018-11-04      POC 4b
7585        2018-11-11      POC 20
7585        2018-11-18      POC 20
7585        2018-11-18      POC 4b
6581        2018-11-04      CommAssessment
6581        2018-11-04      POC 20
6581        2018-11-04      POC 4b
6581        2018-11-11      POC 20

I would like to get the following result.

ClientId    AssessDate      Type            CountPOC
7180        2018-06-13      Assessment      6
7180        2018-07-04      Assessment      5
7180        2018-09-05      Assessment      5
2584        2018-10-03      Assessment      2
2584        2018-11-04      Assessment      5
7585        2018-11-04      Assessment      5
6581        2018-11-04      CommAssessment  3

I can't figure out how to count the POCs below each assessment.

SELECT ClientId, ProcDate, ProcDesc
FROM ProcJoins
WHERE ProcDesc IN ('Assessment','POC 20','POC 4b')
GROUP BY ClientId, ProcDate, ProcDesc
ORDER BY ProcDate

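【Question comments】:

    Tags: sql sql-server tsql sql-server-2008-r2


    【Solution 1】:

    Here are 2 options that give you what you need; a third, for older versions, follows them.

    The first assumes that every client always starts with an assessment.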

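    WITH CTE AS(
        SELECT *, 
            -- Number all rows; on a shared date the assessment row sorts before its POCs
            ROW_NUMBER() OVER( ORDER BY clientId, ProcDate,
                               CASE WHEN ProcDesc LIKE 'POC%' THEN 1 ELSE 0 END) rn
        FROM ProcJoins
    )
    SELECT clientId, 
           ProcDate,
           ProcDesc,
           -- Rows between this assessment and the next one (or the end of the table)
           LEAD(rn, 1, t.totalProc) OVER(ORDER BY clientId, rn) - rn - 1 AS CountPOC
    FROM CTE
    CROSS JOIN( SELECT COUNT(*) + 1 AS totalProc FROM CTE) t
    WHERE ProcDesc NOT LIKE 'POC%'  -- also covers 'CommAssessment'
    ORDER BY clientId DESC, ProcDate;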
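
    Because the first option depends on every client starting with an assessment, it may be worth checking that assumption before relying on the counts. The following is only a sketch: it assumes, as the third option does, that any description not starting with 'POC' is an assessment, and the table variable @ProcJoins with its few rows is a hypothetical stand-in for the real ProcJoins table.

    ```sql
    -- Hypothetical stand-in for ProcJoins; client 9999 is invented to show a violation
    DECLARE @ProcJoins TABLE (clientId int, ProcDate date, ProcDesc varchar(20));
    INSERT INTO @ProcJoins (clientId, ProcDate, ProcDesc) VALUES
        (2584, '20181003', 'Assessment'),
        (2584, '20181003', 'POC 20'),
        (9999, '20181001', 'POC 20'),      -- starts with a POC, no assessment yet
        (9999, '20181008', 'Assessment');

    -- Clients whose earliest date has no assessment row
    SELECT f.clientId, f.FirstProcDate
    FROM (
        SELECT clientId, MIN(ProcDate) AS FirstProcDate
        FROM @ProcJoins
        GROUP BY clientId
    ) f
    WHERE NOT EXISTS (
        SELECT 1
        FROM @ProcJoins a
        WHERE a.clientId = f.clientId
          AND a.ProcDate = f.FirstProcDate
          AND a.ProcDesc NOT LIKE 'POC%'
    );
    ```

    An empty result means the assumption holds for the data at hand. With the rows above, only the invented client 9999 is returned.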
    

    The second simply uses a range query.

    WITH CTE AS(
        SELECT *, 
             LEAD(ProcDate, 1, '99990101') OVER(PARTITION BY clientId ORDER BY ProcDate) EndDate
        FROM ProcJoins
        WHERE ProcDesc = 'Assessment'
    )
    SELECT c.clientId, 
           c.ProcDate,
           c.ProcDesc,
           COUNT(*)
    FROM CTE c
    JOIN ProcJoins p ON p.ProcDate >= c.ProcDate
                    AND p.ProcDate < c.EndDate
                    AND p.clientId = c.clientId
    WHERE p.ProcDesc <> 'Assessment'
    GROUP BY c.clientId, 
           c.ProcDate,
           c.ProcDesc
    ORDER BY clientId DESC, ProcDate;
    

    A third option, compatible with versions 2005, 2008 and 2008 R2:

    WITH CTE AS(
        SELECT *, 
            ROW_NUMBER() OVER( PARTITION BY clientId ORDER BY ProcDate) rn
        FROM ProcJoins
        WHERE ProcDesc NOT LIKE 'POC%'
    )
    SELECT c.clientId, 
           c.ProcDate,
           c.ProcDesc,
           COUNT(*)
    FROM CTE       c
    LEFT JOIN CTE  n  ON c.ClientId = n.clientId 
                     AND c.rn = n.rn-1
    JOIN ProcJoins p  ON c.ClientId = p.clientId 
                     AND c.ProcDate <= p.ProcDate 
                     AND ISNULL(n.ProcDate, '99991231') > p.ProcDate
    WHERE p.ProcDesc LIKE 'POC%'
    GROUP BY c.clientId, 
           c.ProcDate,
           c.ProcDesc
    ORDER BY c.ProcDate;
    

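    【Comments】:

    • Hi, I'm getting this error: The Parallel Data Warehouse (PDW) features are not enabled. I think the LEAD function isn't compatible; I'm using SQL SERVER 2008 R2.
    • You should let people know that the version you're using is close to end of support. These solutions only work on versions 2012+.
    • The third option is logically very similar to the second, but uses tools that are available in earlier versions.
    • Luis Cazares, thanks for your help. I'd like to calculate the duration (in weeks) between the first POC and the last POC in each assessment group, and show it in a column on each row. Is this possible? Thanks
    • You're already counting them, so you just need the difference between the max and min dates. Depending on the requirements, you may want to use DATEDIFF in weeks, or in days divided by 7.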
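
    The suggestion in the last comment could look roughly like the sketch below, which adds a duration column to the third option. The names @ProcJoins, AssessDate, CountPOC and DurationWeeks are illustrative assumptions, and the rows are a reduced subset of the question's data. It uses days divided by 7 (whole elapsed weeks); DATEDIFF(WEEK, ...) would instead count the week boundaries crossed, which can differ by one.

    ```sql
    -- Reduced subset of the question's rows for client 7180 (stand-in for ProcJoins)
    DECLARE @ProcJoins TABLE (clientId int, ProcDate date, ProcDesc varchar(20));
    INSERT INTO @ProcJoins (clientId, ProcDate, ProcDesc) VALUES
        (7180, '20180613', 'Assessment'),
        (7180, '20180613', 'POC 20'),
        (7180, '20180620', 'POC 20'),
        (7180, '20180627', 'POC 4b'),
        (7180, '20180704', 'Assessment'),
        (7180, '20180704', 'POC 20'),
        (7180, '20180718', 'POC 4b');

    WITH CTE AS (
        -- Assessment rows only, numbered per client in date order
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY clientId ORDER BY ProcDate) AS rn
        FROM @ProcJoins
        WHERE ProcDesc NOT LIKE 'POC%'
    )
    SELECT c.clientId,
           c.ProcDate AS AssessDate,
           c.ProcDesc AS [Type],
           COUNT(*)   AS CountPOC,
           -- Whole weeks between the first and last POC of the group
           DATEDIFF(DAY, MIN(p.ProcDate), MAX(p.ProcDate)) / 7 AS DurationWeeks
    FROM CTE c
    LEFT JOIN CTE n ON c.clientId = n.clientId
                   AND c.rn = n.rn - 1
    JOIN @ProcJoins p ON c.clientId = p.clientId
                     AND c.ProcDate <= p.ProcDate
                     AND ISNULL(n.ProcDate, '99991231') > p.ProcDate
    WHERE p.ProcDesc LIKE 'POC%'
    GROUP BY c.clientId, c.ProcDate, c.ProcDesc
    ORDER BY c.clientId, c.ProcDate;
    ```

    With these rows, the first and last POC of each assessment group are 14 days apart, so DurationWeeks is 2 for both groups.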
    【Solution 2】:

    Another possible approach is to define groups and then get the counts:

    Input:

    CREATE TABLE #Data (
        ClientId int, 
        ProcDate date,
        ProcDesc varchar(20)  -- wide enough for 'CommAssessment' (14 characters)
    )
    INSERT INTO #Data 
        (ClientId, ProcDate, ProcDesc)
    VALUES
        (7180, '20180613',  'Assessment'),
        (7180, '20180613',  'POC 20'),
        (7180, '20180613',  'POC 4b'),
        (7180, '20180620',  'POC 20'),
        (7180, '20180620',  'POC 4b'),
        (7180, '20180627',  'POC 20'),
        (7180, '20180627',  'POC 4b'),
        (7180, '20180704',  'Assessment'),
        (7180, '20180704',  'POC 20'),
        (7180, '20180704',  'POC 4b'),
        (7180, '20180711',  'POC 20'),
        (7180, '20180718',  'POC 20'),
        (7180, '20180718',  'POC 4b'),
        (7180, '20180905',  'Assessment'),
        (7180, '20180905',  'POC 20'),
        (7180, '20180912',  'POC 20'),
        (7180, '20180912',  'POC 4b'),
        (7180, '20180919',  'POC 20'),
        (7180, '20180919',  'POC 4b'),
        (2584, '20181003',  'Assessment'),
        (2584, '20181003',  'POC 20'),
        (2584, '20181003',  'POC 4b'),
        (2584, '20181104',  'Assessment'),
        (2584, '20181104',  'POC 20'),
        (2584, '20181104',  'POC 4b'),
        (2584, '20181111',  'POC 20'),
        (2584, '20181118',  'POC 20'),
        (2584, '20181118',  'POC 4b'),
        (7585, '20181104',  'Assessment'),
        (7585, '20181104',  'POC 20'),
        (7585, '20181104',  'POC 4b'),
        (7585, '20181111',  'POC 20'),
        (7585, '20181118',  'POC 20'),
        (7585, '20181118',  'POC 4b'),
        (6581, '20181104',  'CommAssessment'),
        (6581, '20181104',  'POC 20'),
        (6581, '20181104',  'POC 4b'),
        (6581, '20181111',  'POC 20')
    

    Statement (if SUM (...) OVER (ORDER BY ...) is supported):

    ;WITH GroupsCTE AS (
        SELECT 
            ClientId, ProcDate, ProcDesc,
            SUM(CASE WHEN ProcDesc IN ('Assessment', 'CommAssessment') THEN 1 ELSE 0 END) 
            OVER (ORDER BY ClientId, ProcDate, CASE WHEN ProcDesc IN ('Assessment', 'CommAssessment') THEN 0 ELSE 1 END) AS Groups
        FROM #Data
    ), CountCTE AS (
        SELECT
           ClientId, ProcDate, ProcDesc,
           COUNT(*) OVER (PARTITION BY Groups) AS [Count]
        FROM GroupsCTE      
    )
    SELECT 
        ClientId, ProcDate, ProcDesc, [Count] - 1
    FROM CountCTE
    WHERE ProcDesc IN ('Assessment', 'CommAssessment')
    ORDER BY ClientId, ProcDate
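
    To see what "defining groups" means here, it can help to look at the running sum on its own. The sketch below applies the same expression to the question's rows for client 2584 only; the table variable @Data is an assumption made to keep the example self-contained, and like the statement above it needs a version that supports ORDER BY inside OVER for aggregates (2012+).

    ```sql
    -- The question's rows for client 2584
    DECLARE @Data TABLE (ClientId int, ProcDate date, ProcDesc varchar(20));
    INSERT INTO @Data (ClientId, ProcDate, ProcDesc) VALUES
        (2584, '20181003', 'Assessment'),
        (2584, '20181003', 'POC 20'),
        (2584, '20181003', 'POC 4b'),
        (2584, '20181104', 'Assessment'),
        (2584, '20181104', 'POC 20'),
        (2584, '20181104', 'POC 4b'),
        (2584, '20181111', 'POC 20'),
        (2584, '20181118', 'POC 20'),
        (2584, '20181118', 'POC 4b');

    -- Each assessment row bumps the running sum, so its POCs inherit the same number
    SELECT ClientId, ProcDate, ProcDesc,
           SUM(CASE WHEN ProcDesc IN ('Assessment', 'CommAssessment') THEN 1 ELSE 0 END)
               OVER (ORDER BY ClientId, ProcDate,
                     CASE WHEN ProcDesc IN ('Assessment', 'CommAssessment') THEN 0 ELSE 1 END) AS Groups
    FROM @Data
    ORDER BY ClientId, ProcDate, Groups, ProcDesc;
    ```

    The first three rows get Groups = 1 and the remaining six get Groups = 2, so counting the rows per group and subtracting the assessment row itself yields 2 and 5, matching the expected result.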
    

    Statement (if SUM (...) OVER (ORDER BY ...) is not supported):

    ;WITH GroupsCTE AS (
        SELECT 
            d.ClientId, d.ProcDate, d.ProcDesc,
            c.[Group]
        FROM #Data d
        CROSS APPLY(
            SELECT
            SUM(CASE WHEN ProcDesc IN ('Assessment', 'CommAssessment') THEN 1 ELSE 0 END) AS [Group]
            FROM #Data
            WHERE (ClientId = d.ClientId) AND (ProcDate <= d.ProcDate)
        ) c
    )
    SELECT 
        g.ClientId, g.ProcDate, g.ProcDesc,
        c.[Count]
    FROM GroupsCTE g
    CROSS APPLY (
        SELECT ClientId, [Group], COUNT(*) - 1 AS [Count]
        FROM GroupsCTE
        WHERE (ClientId = g.ClientId) AND ([Group] = g.[Group])
        GROUP BY ClientId, [Group]
    ) c
    WHERE g.ProcDesc IN ('Assessment', 'CommAssessment')
    ORDER BY g.ClientId, g.ProcDate, CASE WHEN g.ProcDesc IN ('Assessment', 'CommAssessment') THEN 0 ELSE 1 END, g.ProcDesc
    

    Output:

    ClientId    ProcDate    ProcDesc        (No column name)
    2584        2018-10-03  Assessment      2
    2584        2018-11-04  Assessment      5
    6581        2018-11-04  CommAssessment  3
    7180        2018-06-13  Assessment      6
    7180        2018-07-04  Assessment      5
    7180        2018-09-05  Assessment      5
    7585        2018-11-04  Assessment      5
    

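    【Comments】:

    • Hi, I'm getting this error: The Parallel Data Warehouse (PDW) features are not enabled. I'm using SQL SERVER 2008 R2.
    • After reading up on Google, it seems that OVER ... ORDER BY is not supported for aggregates in SQL Server 2008 R2, which is why I get the error. Is there any other option? Thanks
    • @jk1844 I've updated the answer. Strange, but according to the documentation, SUM OVER and COUNT OVER should be supported starting with SQL Server 2008.
    • This works great, thank you very much. I also need to calculate the duration (in weeks) between the first POC and the last POC in each assessment, into a duration column.
    【Solution 3】:

    Sample data: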

    declare @Test table
        (ClientId int, ProcDate date,ProcDesc varchar(20))
    insert @Test values
    (7180,'2018-06-13','Assessment'),
    (7180,'2018-06-13','POC 20'),
    (7180,'2018-06-13','POC 4b'),
    (7180,'2018-06-20','POC 20'),
    (7180,'2018-06-20','POC 4b'),
    (7180,'2018-06-27','POC 20'),
    (7180,'2018-06-27','POC 4b'),
    (7180,'2018-07-04','Assessment'),
    (7180,'2018-07-04','POC 20'),
    (7180,'2018-07-04','POC 4b'),
    (7180,'2018-07-11','POC 20'),
    (7180,'2018-07-18','POC 20'),
    (7180,'2018-07-18','POC 4b'),
    (7180,'2018-09-05','Assessment'),
    (7180,'2018-09-05','POC 20'),
    (7180,'2018-09-12','POC 20'),
    (7180,'2018-09-12','POC 4b'),
    (7180,'2018-09-19','POC 20'),
    (7180,'2018-09-19','POC 4b'),
    (2584,'2018-10-03','Assessment'),
    (2584,'2018-10-03','POC 20'),
    (2584,'2018-10-03','POC 4b'),
    (2584,'2018-11-04','Assessment'),
    (2584,'2018-11-04','POC 20'),
    (2584,'2018-11-04','POC 4b'),
    (2584,'2018-11-11','POC 20'),
    (2584,'2018-11-18','POC 20'),
    (2584,'2018-11-18','POC 4b'),
    (7585,'2018-11-04','Assessment'),
    (7585,'2018-11-04','POC 20'),
    (7585,'2018-11-04','POC 4b'),
    (7585,'2018-11-11','POC 20'),
    (7585,'2018-11-18','POC 20'),
    (7585,'2018-11-18','POC 4b'),
    (1585,'2018-11-04','CommAssessment'),
    (1585,'2018-11-04','POC 20'),
    (1585,'2018-11-04','POC 4b'),
    (1585,'2018-11-11','POC 20'),
    (1585,'2018-11-18','POC 20'),
    (1585,'2018-11-18','POC 4b')
    

    Suggested "old school" solution:
    Edited: I had forgotten about "ClientId"
    Fiddle

    select ClientId, AssessDate = ProcDate, [Type] = ProcDesc, CountPOC = (
            select count(*)
            from @Test
            where ProcDesc like 'POC%' --- <> 'Assessment'
                and ClientId = t.ClientId
                and ProcDate >= t.ProcDate
                and ProcDate < isNull((
                        select top 1 ProcDate
                        from @Test
                        where ClientId = t.ClientId
                            and ProcDate > t.ProcDate
                            and ProcDesc not like 'POC%' -- = 'Assessment'
                        order by ProcDate
                        ), getdate())
            )
    from @Test t
    where t.ProcDesc not like 'POC%' -- = 'Assessment'
    

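    【Comments】:

    • Hi, there are other clients with the same assessment date, so the counts come out wrong. I've added extra rows to your insert statement to show the counts. Could you adjust your query to fix this?
    • Hi, I also have another ProcType, CommAssessment. Can that be taken into account as well? I've edited my original post. Thanks.
    • Now you can add new "ProcType" values, as long as the countable descriptions start with "POC".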