【Question title】: Count distinct values with OVER(PARTITION BY id)
【Posted】: 2026-01-07 04:05:02
【Question】:

Is it possible to count distinct values in combination with a window function such as OVER(PARTITION BY id)? My current query is:

SELECT congestion.date, congestion.week_nb, congestion.id_congestion,
   congestion.id_element,
ROW_NUMBER() OVER(
    PARTITION BY congestion.id_element
    ORDER BY congestion.date),
COUNT(DISTINCT congestion.week_nb) OVER(
    PARTITION BY congestion.id_element
) AS week_count
FROM congestion
WHERE congestion.date >= '2014.01.01'
AND congestion.date <= '2014.12.31'
ORDER BY id_element, date

However, when I try to execute the query, I get the following error:

"COUNT(DISTINCT": "DISTINCT is not implemented for window functions"

【Question comments】:

    Tags: postgresql window-functions


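    【Answer 1】:

    No. As the error message states, DISTINCT is not implemented for window functions. Applying the information from this link to your case, you could use something like the following: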

    WITH uniques AS (
     SELECT congestion.id_element, COUNT(DISTINCT congestion.week_nb) AS unique_references
     FROM congestion
    WHERE congestion.date >= '2014.01.01'
    AND congestion.date <= '2014.12.31'
     GROUP BY congestion.id_element
    )
    
    SELECT congestion.date, congestion.week_nb, congestion.id_congestion,
       congestion.id_element,
    ROW_NUMBER() OVER(
        PARTITION BY congestion.id_element
        ORDER BY congestion.date),
    uniques.unique_references AS week_count
    FROM congestion
    JOIN uniques USING (id_element)
    WHERE congestion.date >= '2014.01.01'
    AND congestion.date <= '2014.12.31'
    ORDER BY id_element, date
    

    Depending on the situation, you could also put the subquery directly into the SELECT list:

    SELECT congestion.date, congestion.week_nb, congestion.id_congestion,
       congestion.id_element,
    ROW_NUMBER() OVER(
        PARTITION BY congestion.id_element
        ORDER BY congestion.date),
    (SELECT COUNT(DISTINCT dist_con.week_nb)
        FROM congestion AS dist_con
        WHERE dist_con.date >= '2014.01.01'
        AND dist_con.date <= '2014.12.31'
        AND dist_con.id_element = congestion.id_element) AS week_count
    FROM congestion
    WHERE congestion.date >= '2014.01.01'
    AND congestion.date <= '2014.12.31'
    ORDER BY id_element, date
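
    To see the CTE-and-join variant on concrete rows, here is a minimal sketch. It makes several assumptions: it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (window functions need SQLite 3.25 or newer), the sample rows are hypothetical, and dates are stored as ISO-formatted text so that string comparison orders them correctly.

```python
import sqlite3

# Hypothetical sample rows: (date, week_nb, id_congestion, id_element).
rows = [
    ("2014-01-06", 2, 10, 1),
    ("2014-01-07", 2, 11, 1),
    ("2014-01-14", 3, 12, 1),
    ("2014-02-03", 6, 13, 2),
    ("2014-02-04", 6, 14, 2),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE congestion (date TEXT, week_nb INTEGER, "
    "id_congestion INTEGER, id_element INTEGER)"
)
conn.executemany("INSERT INTO congestion VALUES (?, ?, ?, ?)", rows)

# Distinct weeks are counted once per element in the CTE,
# then joined back onto every detail row.
query = """
WITH uniques AS (
    SELECT id_element, COUNT(DISTINCT week_nb) AS week_count
    FROM congestion
    WHERE date >= '2014-01-01' AND date <= '2014-12-31'
    GROUP BY id_element
)
SELECT c.date, c.id_element,
       ROW_NUMBER() OVER (PARTITION BY c.id_element ORDER BY c.date) AS rn,
       u.week_count
FROM congestion AS c
JOIN uniques AS u USING (id_element)
WHERE c.date >= '2014-01-01' AND c.date <= '2014-12-31'
ORDER BY c.id_element, c.date
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

    Every row of element 1 carries week_count 2 (weeks 2 and 3), and every row of element 2 carries 1, while the row number still restarts per element.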
    

    【Comments】:

      【Answer 2】:

      I find the simplest approach is to use a subquery/CTE and conditional aggregation:

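      SELECT c.date, c.week_nb, c.id_congestion, c.id_element,
             ROW_NUMBER() OVER (PARTITION BY c.id_element ORDER BY c.date),
             -- sum the first-occurrence flags to get the distinct week count per element
             SUM(CASE WHEN c.seqnum = 1 THEN 1 ELSE 0 END)
                 OVER (PARTITION BY c.id_element) AS week_count
      FROM (SELECT c.*,
                   -- seqnum = 1 marks the first row of each (id_element, week_nb) pair
                   ROW_NUMBER() OVER (PARTITION BY c.id_element, c.week_nb
                                      ORDER BY c.date) AS seqnum
            FROM congestion c
            -- filter before numbering so that seqnum = 1 always falls inside the date range
            WHERE c.date >= '2014.01.01' AND c.date <= '2014.12.31'
           ) c
      ORDER BY id_element, date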
      

      【Comments】:

        【Answer 3】:

        Make the partition set smaller, until the counted field contains no duplicates within a partition:

        SELECT congestion.date, congestion.week_nb, congestion.id_congestion,
           congestion.id_element,
        ROW_NUMBER() OVER(
            PARTITION BY congestion.id_element
            ORDER BY congestion.date),
        COUNT(congestion.week_nb) -- remove distinct 
        OVER(
            PARTITION BY congestion.id_element,
                         -- add further fields so the count restarts wherever a duplicate would occur
                         congestion.id_congestion
        ) AS week_count
        FROM congestion
        WHERE congestion.date >= '2014.01.01'
        AND congestion.date <= '2014.12.31'
        ORDER BY id_element, date
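
        Narrowing the partition changes what is counted: the result is the number of rows per (id_element, id_congestion) group, which is not necessarily the number of distinct weeks per element. The sketch below compares the two on hypothetical rows. It assumes Python's built-in sqlite3 module as a stand-in for PostgreSQL (SQLite 3.25 or newer for window functions), and the table is cut down to the three columns involved.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE congestion (week_nb INTEGER, id_congestion INTEGER, "
    "id_element INTEGER)"
)
# Hypothetical rows: element 1 has two congestion ids, element 2 has one.
conn.executemany(
    "INSERT INTO congestion VALUES (?, ?, ?)",
    [(2, 10, 1), (2, 10, 1), (3, 11, 1), (6, 12, 2)],
)

# Plain COUNT over the narrowed partition, as in the answer above.
narrowed = conn.execute("""
    SELECT id_element, id_congestion,
           COUNT(week_nb) OVER (PARTITION BY id_element, id_congestion) AS week_count
    FROM congestion
    ORDER BY id_element, id_congestion
""").fetchall()

# Reference value: distinct weeks per element via GROUP BY.
grouped = conn.execute("""
    SELECT id_element, COUNT(DISTINCT week_nb)
    FROM congestion
    GROUP BY id_element
    ORDER BY id_element
""").fetchall()

print(narrowed)
print(grouped)
```

        On this data the narrowed count for element 1 is 2 or 1 depending on the congestion id, whereas the grouped distinct count is 2 for the whole element, so it is worth checking the approach against a grouped COUNT(DISTINCT ...) on your own data before relying on it.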
        

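        【Comments】:

        • I'm not sure this answer is always generally applicable, but after some thought it fits my use case very well.
        【Answer 4】:

        Since this is the first result that pops up on Google, I'll add this reproducible example, similar to Gordon's answer:

        Let's start by creating a sample table: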

        WITH test as 
        (
        SELECT * 
        FROM (VALUES
        (1, 'A'),
        (1, 'A'),
        (2, 'B'),
        (2, 'B'),
        (2, 'D'),
        (3, 'C'),
        (3, 'C'),
        (3, 'C'),
        (3, 'E'),
        (3, 'F')) AS t (id_element, week_nb)
        )
        
        select * from test
        

        This yields:

        id_element week_nb
        1   A
        1   A
        2   B
        2   B
        2   D
        3   C
        3   C
        3   C
        3   E
        3   F
        

        Then do something like this:

        select 
          id_element,
          week_nb,
          sum(first_row_in_sequence) over (partition by id_element) as distinct_week_nb_count
        from 
        (
        select 
          id_element,
          week_nb,
          case when row_number() over (partition by id_element, week_nb) = 1 then 1 else 0 end as first_row_in_sequence
        from test
        ) as sub
        

        This yields:

        id_element week_nb distinct_week_nb_count
        1   A   1
        1   A   1
        2   B   2
        2   B   2
        2   D   2
        3   C   3
        3   C   3
        3   C   3
        3   E   3
        3   F   3
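
        The two queries above can be combined into one self-contained script. This is only a sketch under stated assumptions: it runs the SQL through Python's built-in sqlite3 module rather than PostgreSQL (SQLite 3.25 or newer is needed for window functions), and the CTE name flagged is introduced here purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

query = """
WITH test (id_element, week_nb) AS (
    VALUES (1, 'A'), (1, 'A'),
           (2, 'B'), (2, 'B'), (2, 'D'),
           (3, 'C'), (3, 'C'), (3, 'C'), (3, 'E'), (3, 'F')
),
flagged AS (
    -- Flag exactly one row per (id_element, week_nb) pair.
    SELECT id_element, week_nb,
           CASE WHEN ROW_NUMBER() OVER (PARTITION BY id_element, week_nb) = 1
                THEN 1 ELSE 0 END AS first_row_in_sequence
    FROM test
)
-- Summing the flags per element gives the number of distinct week_nb values.
SELECT id_element, week_nb,
       SUM(first_row_in_sequence) OVER (PARTITION BY id_element)
           AS distinct_week_nb_count
FROM flagged
ORDER BY id_element, week_nb
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

        The printed rows match the table above: a count of 1 for element 1, 2 for element 2, and 3 for element 3.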
        

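        【Comments】:

        • Thanks for the explanation, this worked well for me.
        【Answer 5】:

        If you are counting distinct numbers, you can get the same effect with other aggregate functions, like this (the uniq and sort functions on integer arrays come from the intarray extension):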

        select
            initial.id,
            initial.val,
            joined.id,
            array_length(uniq(sort(array_agg(joined.some_number) over (partition by initial.id))), 1) as distinct_count
        from
            (values (1,'a'), (2,'b'), (3,'c')) initial(id, val)
                left join (values (1, 1),
                                  (1, 1),
                                  (1, 3),
                                  (2, 2),
                                  (2, 2),
                                  (3, 3),
                                  (3, 3),
                                  (3, 3),
                                  (3, 4)) joined(id, some_number) on joined.id = initial.id
        ;
        
        
        id  val id  distinct_count
        1   a   1   2
        1   a   1   2
        1   a   1   2
        2   b   2   1
        2   b   2   1
        3   c   3   2
        3   c   3   2
        3   c   3   2
        3   c   3   2
        
        

        【Comments】: