【Question title】:Filling in missing values with a median in postgres
【Posted】:2020-02-13 19:16:06
【Question description】:

How can I replace avg with a median calculation here?

select *
, coalesce(val, avg(val) over (order by t rows between 3 preceding and 1 preceding)) as fixed
from (
    values
    (1, 10),
    (2, NULL),
    (3, 10),
    (4, 15),
    (5, 11),
    (6, NULL),
    (7, NULL),
    (8, NULL),
    (9, NULL)
) as test(t, val)
;

Is there a valid version of this?

percentile_cont(0.5) within group(order by val) over (order by t rows between 3 preceding and 1 preceding)

Tags: sql postgresql window-functions median


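【Solution 1】:

Unfortunately, percentile_cont() is an ordered-set aggregate function (the kind called with within group), and PostgreSQL does not allow those to be used with over, so there is no equivalent window function.

One workaround is to compute the aggregate in an inline (correlated) subquery.

If the ids are always incrementing (consecutive, with no gaps), you can do: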

select 
    t.*,
    coalesce(
        t.val, 
        (
            select percentile_cont(0.5) within group(order by t1.val)
            from test t1
            where t1.id between t.id - 3 and t.id - 1
        )
    ) fixed
from test t

Otherwise, you need an additional level of nesting:

select 
    t.*,
    coalesce(
        t.val, 
        (
            select percentile_cont(0.5) within group(order by t1.val)
            from (select val from test t1 where t1.id < t.id order by t1.id desc limit 3) t1
        )
    ) fixed
from test t

Demo on DB Fiddle - both queries yield:

 id | val | fixed
--: | --: | :----
  1 |  10 | 10
  2 |     | 10
  3 |  10 | 10
  4 |  15 | 15
  5 |  11 | 11
  6 |     | 11
  7 |     | 13
  8 |     | 11
  9 |     |
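
The queries in this answer read from a table named test with columns id and val, whereas the question builds its data inline with a values list and calls the first column t. To try the queries locally, a setup along these lines should work. The table definition is an assumption inferred from the column names used in the queries; the fiddle's actual schema is not shown here.

```sql
-- Assumed schema (not shown in the answer): test(id, val),
-- loaded with the sample rows from the question.
create table test (
    id  int primary key,
    val int
);

insert into test (id, val) values
    (1, 10),
    (2, null),
    (3, 10),
    (4, 15),
    (5, 11),
    (6, null),
    (7, null),
    (8, null),
    (9, null);
```

Note that percentile_cont() ignores NULLs in its sorted input, which is why id 7 gets 13 (the median of 15 and 11), and that it returns NULL when every value in the look-back range is NULL, as for id 9.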

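【Discussion】:

  • Looks good. Do you expect a difference in efficiency between the subquery and a home-made window function? If missing data is rare, the subquery is clearly the best option.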
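
On the home-made window function raised in the comment: PostgreSQL does let an ordinary user-defined aggregate be called with over, so one possible sketch is a custom aggregate that collects the frame's values into an array and takes the median in its final function. The names window_median and window_median_final are hypothetical, and the sample data is inlined through a CTE so the block stands alone. This is a sketch under those assumptions, not a tuned implementation.

```sql
-- Hypothetical final function: median of the collected values.
-- percentile_cont ignores NULL elements and yields NULL for an empty input.
create function window_median_final(double precision[])
returns double precision
language sql immutable
as $$
    select percentile_cont(0.5) within group (order by v)
    from unnest($1) as u(v)
$$;

-- Hypothetical ordinary aggregate: accumulate values into an array,
-- then hand the array to the final function above.
create aggregate window_median(double precision) (
    sfunc     = array_append,
    stype     = double precision[],
    finalfunc = window_median_final,
    initcond  = '{}'
);

-- Same frame as in the question: the three rows before the current one.
with test(id, val) as (
    values
    (1, 10), (2, null), (3, 10), (4, 15), (5, 11),
    (6, null), (7, null), (8, null), (9, null)
)
select
    t.*,
    coalesce(
        t.val,
        window_median(t.val) over (order by t.id rows between 3 preceding and 1 preceding)
    ) as fixed
from test t
order by t.id;
```

Because this aggregate has no inverse transition function, PostgreSQL most likely has to rebuild the array each time the frame start moves, so it should not be assumed to be faster than the subquery. It also evaluates the median for every row, not only for rows where val is NULL, which supports the point that the subquery is the better fit when missing values are rare.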