【Question title】: Hive compute median and average by groups
【Posted】: 2020-06-10 17:21:37
【Question description】:

I have a dataset of counts by state and county, and I want to compute the median and the average by state and by county. For example:

Have:

    ID  state  county  count
    1   MD     aa      2
    2   MD     aa      4
    3   VA     bb      1
    4   VA     bb      2
    5   VA     bb      4
    6   VA     cc      7
    7   VA     cc      8
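
To make the target concrete, here is a small Python emulation of the per-group median and average on the sample rows above. It is plain Python, not Hive. It assumes that "by state" means grouping on state alone and that "by county" means grouping on (state, county). The helper `group_stats` is hypothetical. For an even-sized group, `statistics.median` averages the two middle values. As far as I know this matches Hive's `percentile(..., 0.5)`, but treat that as an assumption.

```python
from collections import defaultdict
from statistics import mean, median

# Sample rows from the "Have" table: (id, state, county, count)
ROWS = [
    (1, "MD", "aa", 2),
    (2, "MD", "aa", 4),
    (3, "VA", "bb", 1),
    (4, "VA", "bb", 2),
    (5, "VA", "bb", 4),
    (6, "VA", "cc", 7),
    (7, "VA", "cc", 8),
]


def group_stats(rows, key):
    """Hypothetical helper: collapse rows to one (median, average) per group, like GROUP BY."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row[3])  # row[3] is the count column
    return {k: (median(v), mean(v)) for k, v in groups.items()}


by_state = group_stats(ROWS, key=lambda r: r[1])           # group by state
by_county = group_stats(ROWS, key=lambda r: (r[1], r[2]))  # group by state, county

for k, (med, avg) in {**by_state, **by_county}.items():
    print(k, "median:", med, "average:", round(avg, 2))
```

This collapses each group to a single row, which is what GROUP BY does. To keep every input row and attach the group values to it, you need window functions (`OVER (PARTITION BY ...)`) instead.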

Want:

What I have so far (it fails):

    Select id,  STATE,COUNTY,count,
    percentile(cast(count as BIGINT), 0.5) OVER() as overall_median, 
    round(avg(count),2) OVER() as overall_avg,

    percentile(cast(count as bigint),0.5) OVER(PARTITION BY id,STATE) as med_state,
    percentile(cast(count as bigint),0.5) as med_county,

    AVG(count) OVER (PARTITION BY id, STATE) as avg_state,
    AVG(count) AS avg_county,
    from have
    group by id, state, county

The error I get when not using GROUP BY:

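Error: Execution error: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException Failed to breakup Windowing invocations into Groups. At least 1 group must only depend on input columns. Also check for circular dependencies. Underlying error: org.apache.hadoop.hive.ql.parse.SemanticException: Line 1:457 Expression not in GROUP BY key 'id'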

The code without GROUP BY:

    Select id,  STATE,county,count,
    percentile(cast(count as BIGINT), 0.5) OVER() as overall_median, 
    round(avg(count),2) OVER() as overall_avg,

    percentile(cast(count as bigint),0.5) OVER(PARTITION BY id,STATE) as med_state,
    percentile(cast(count as bigint),0.5) OVER(PARTITION BY id,STATE,county) as med_county,


    AVG(count) OVER (PARTITION BY id, STATE) as avg_state,
    AVG(count) OVER (PARTITION BY id, STATE, county) as avg_county,
    from have
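
The queries above also have a problem beyond the syntax errors. In the sample data `id` is unique, so `PARTITION BY id, STATE` puts every row into a partition of its own. The quick Python check below is an emulation, not Hive, and `partition_medians` is a hypothetical helper. It shows that the per-partition median then simply reproduces each row's own count, while partitioning by state alone gives the state-level median.

```python
from collections import defaultdict
from statistics import median

# Sample rows from the question: (id, state, county, count)
ROWS = [
    (1, "MD", "aa", 2),
    (2, "MD", "aa", 4),
    (3, "VA", "bb", 1),
    (4, "VA", "bb", 2),
    (5, "VA", "bb", 4),
    (6, "VA", "cc", 7),
    (7, "VA", "cc", 8),
]


def partition_medians(rows, key):
    """Hypothetical helper: for each row, the median count of the partition it belongs to."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[key(row)].append(row[3])
    return [median(partitions[key(row)]) for row in rows]


by_id_state = partition_medians(ROWS, key=lambda r: (r[0], r[1]))  # like PARTITION BY id, STATE
by_state = partition_medians(ROWS, key=lambda r: r[1])             # like PARTITION BY STATE

print(by_id_state)  # → [2, 4, 1, 2, 4, 7, 8]  (each row's own count)
print(by_state)     # → [3.0, 3.0, 4, 4, 4, 4, 4]
```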

Thanks!

【Question discussion】:

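  • I guess you don't need the GROUP BY.
  • I tried removing it, but it still doesn't work.
  • Can you post the error message? As far as I can see, you have misplaced closing parentheses in a few places.
  • I just updated my post.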

Tags: hadoop hive hql hiveql cloudera


【Solution 1】:

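Fix: the OVER() clause must be attached to the aggregate, not to round(). Write round(avg(count) OVER(), 2) instead of round(avg(count),2) OVER(). In the original, avg(count) without OVER() is an ordinary aggregate, which turns the query into a grouped aggregation; that is why Hive complains that id is not in the GROUP BY key. Also remove the trailing comma before from.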

    select 
        id, STATE, county, count,
        percentile(cast(count as BIGINT), 0.5) OVER() as overall_median, 
        round(avg(count) OVER(), 2) as overall_avg,

        -- partition by state, not by the unique id (which would give every row its own partition)
        percentile(cast(count as bigint), 0.5) OVER(PARTITION BY STATE) as med_state,
        percentile(cast(count as bigint), 0.5) OVER(PARTITION BY STATE, county) as med_county,

        AVG(count) OVER (PARTITION BY STATE) as avg_state,
        AVG(count) OVER (PARTITION BY STATE, county) as avg_county
    from 
        have
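
The query above keeps all seven input rows and attaches the overall statistics to each of them, along with the statistics for the row's own partition. Below is a rough Python emulation of those window semantics. It is not Hive itself, the helper `window_stats` is hypothetical, and it assumes partitions by state and by (state, county), which is what the question asks for. The median is taken with `statistics.median`, on the assumption that this matches `percentile(..., 0.5)`.

```python
from collections import defaultdict
from statistics import mean, median

# Sample rows from the question: (id, state, county, count)
ROWS = [
    (1, "MD", "aa", 2),
    (2, "MD", "aa", 4),
    (3, "VA", "bb", 1),
    (4, "VA", "bb", 2),
    (5, "VA", "bb", 4),
    (6, "VA", "cc", 7),
    (7, "VA", "cc", 8),
]


def window_stats(rows, key):
    """Hypothetical helper: for every row, (median, average) of the partition it falls in.

    Mirrors AGG(...) OVER (PARTITION BY ...): rows are not collapsed; each row
    simply receives the statistics of its own partition.
    """
    partitions = defaultdict(list)
    for row in rows:
        partitions[key(row)].append(row[3])
    stats = {k: (median(v), mean(v)) for k, v in partitions.items()}
    return [stats[key(row)] for row in rows]


overall = window_stats(ROWS, key=lambda r: None)      # OVER ()
by_state = window_stats(ROWS, key=lambda r: r[1])     # OVER (PARTITION BY state)
by_county = window_stats(ROWS, key=lambda r: r[1:3])  # OVER (PARTITION BY state, county)

# Averages rounded to 2 places for readability (the Hive query only rounds overall_avg).
result = [
    {
        "id": row[0], "state": row[1], "county": row[2], "count": row[3],
        "overall_median": o[0], "overall_avg": round(o[1], 2),
        "med_state": s[0], "avg_state": round(s[1], 2),
        "med_county": c[0], "avg_county": round(c[1], 2),
    }
    for row, o, s, c in zip(ROWS, overall, by_state, by_county)
]

for r in result:
    print(r)
```

Every row in a partition receives the same values, and no rows are collapsed. That is the key difference from GROUP BY, and it is why the window version does not need a GROUP BY clause at all.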

Tip: don't use keywords (such as count) as column names; you will run into a lot of problems later.

【Discussion】:
