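[Posted]: 2020-06-10 17:21:37
[Question]:
I have a dataset of counts by state and county, and I want to calculate the median and the average per state and per county. For example:
Have: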
ID state county count
1 MD aa 2
2 MD aa 4
3 VA bb 1
4 VA bb 2
5 VA bb 4
6 VA cc 7
7 VA cc 8
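To reproduce the sample data above in Hive, a setup like the following could be used. It is a hypothetical sketch. The table name `have` comes from the question's queries, but the column types are assumptions (the question does not show the DDL), and `INSERT ... VALUES` needs Hive 0.14 or later.

```sql
-- Hypothetical table for the sample data; column types are assumed.
CREATE TABLE have (
    id     INT,
    state  STRING,
    county STRING,
    count  INT
);

-- INSERT ... VALUES is available from Hive 0.14 onward.
INSERT INTO TABLE have VALUES
    (1, 'MD', 'aa', 2),
    (2, 'MD', 'aa', 4),
    (3, 'VA', 'bb', 1),
    (4, 'VA', 'bb', 2),
    (5, 'VA', 'bb', 4),
    (6, 'VA', 'cc', 7),
    (7, 'VA', 'cc', 8);
```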
Want:
What I have so far (it throws an error):
Select id, STATE,COUNTY,count,
percentile(cast(count as BIGINT), 0.5) OVER() as overall_median,
round(avg(count),2) OVER() as overall_avg,
percentile(cast(count as bigint),0.5) OVER(PARTITION BY id,STATE) as med_state,
percentile(cast(count as bigint),0.5) as med_county,
AVG(count) OVER (PARTITION BY id, STATE) as avg_state,
AVG(count) AS avg_county,
from have
group by id, state, county
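A note on the GROUP BY attempt above: because id is unique per row, GROUP BY id, state, county puts every row in its own group. Any non-windowed aggregate, such as percentile(...) or AVG(count), therefore just returns that row's own count. In addition, count itself is not a grouping key. If a GROUP BY formulation is preferred, one option is to compute the per-county statistics in a grouped subquery and join them back to the detail rows. This aggregate-then-join approach is swapped in here and is not taken from the post. The sketch assumes the `have` table from the question.

```sql
-- Aggregate-then-join sketch: per-county statistics come from an ordinary
-- GROUP BY, then are joined back so id and count stay available per row.
SELECT h.id, h.state, h.county, h.count,
       c.med_county,
       c.avg_county
FROM have h
JOIN (
    SELECT state, county,
           percentile(CAST(count AS BIGINT), 0.5) AS med_county,
           avg(count)                             AS avg_county
    FROM have
    GROUP BY state, county
) c
  ON h.state = c.state AND h.county = c.county;
```

The same pattern with GROUP BY state would produce the per-state columns. The window-function version avoids these extra joins.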
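The error I get when not using GROUP BY:
Error: Execution error: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException Failed to breakup Windowing invocations into Groups. At least 1 group must only depend on input columns. Also check for circular dependencies. Underlying error: org.apache.hadoop.hive.ql.parse.SemanticException: Line 1:457 Expression not in GROUP BY key 'id'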
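The message most likely comes from `round(avg(count),2) OVER()` in the query without GROUP BY. There, the OVER () clause follows round(...), so the inner avg(count) is an ordinary aggregate rather than a window function. A single plain aggregate turns the whole SELECT into an aggregation query, and Hive then complains that non-aggregated columns such as id are not GROUP BY keys. A minimal sketch of the fix is to attach the window clause to the aggregate itself. It assumes the `have` table from the question.

```sql
-- As written in the question (OVER applies to round, not to avg):
--   round(avg(count),2) OVER() as overall_avg
-- Moving OVER () onto the aggregate keeps avg() a window function,
-- so no GROUP BY is required and every input row is kept.
SELECT id, state, county, count,
       round(avg(count) OVER (), 2) AS overall_avg
FROM have;
```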
The code without GROUP BY:
Select id, STATE,county,count,
percentile(cast(count as BIGINT), 0.5) OVER() as overall_median,
round(avg(count),2) OVER() as overall_avg,
percentile(cast(count as bigint),0.5) OVER(PARTITION BY id,STATE) as med_state,
percentile(cast(count as bigint),0.5) OVER(PARTITION BY id,STATE,county) as med_county,
AVG(count) OVER (PARTITION BY id, STATE) as avg_state,
AVG(count) OVER (PARTITION BY id, STATE, county) as avg_county,
from have
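The following sketch puts the fixes together into a full query. It assumes the `have` table from the question and a Hive version that accepts percentile() with an OVER clause, as the question's own code does. Beyond moving OVER () onto avg(), it makes three further changes:

- It partitions by state, and by state and county, instead of by id. Since id is unique, PARTITION BY id, ... makes every partition a single row.
- It drops the GROUP BY.
- It removes the trailing comma before FROM.

```sql
-- Window-only sketch: every statistic is computed with OVER (...),
-- so there is no GROUP BY and each input row is returned.
SELECT id, state, county, count,
       -- empty window spec: the whole table is one partition
       percentile(CAST(count AS BIGINT), 0.5) OVER ()                          AS overall_median,
       round(avg(count) OVER (), 2)                                            AS overall_avg,
       -- per-state and per-county partitions (no id, which is unique per row)
       percentile(CAST(count AS BIGINT), 0.5) OVER (PARTITION BY state)         AS med_state,
       percentile(CAST(count AS BIGINT), 0.5) OVER (PARTITION BY state, county) AS med_county,
       avg(count) OVER (PARTITION BY state)                                    AS avg_state,
       avg(count) OVER (PARTITION BY state, county)                            AS avg_county
FROM have;   -- no comma after the last select item
```

For the sample rows, expect the following:

- VA rows: med_state = 4.0 and avg_state = 4.4 (values 1, 2, 4, 7, 8).
- bb rows: med_county = 2.0 and avg_county ≈ 2.33.
- Every row: overall_median = 4.0 and overall_avg = 4.0 (seven values summing to 28).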
Thanks!
[Discussion]:
-
I guess you don't need GROUP BY.
-
I tried removing it, and it still doesn't work.
-
Can you post the error message? As far as I can tell, some of your closing parentheses are misplaced.
-
I just updated my post.
Tags: hadoop hive hql hiveql cloudera