【问题标题】:Apache Pig Quantile GroupingApache Pig 分位数分组
【发布时间】:2014-11-05 13:13:29
【问题描述】:

我正在努力寻找解决 Pig 分组问题的方法。目前我有一个看起来像的数据集;

Group | Height | Weight
  A   | 96.5   | 110.2
  B   | 88.2   | 122.5
  A   | 94.1   | 100.8
  B   | 84.1   | 115.6

我正在使用 DataFu 库中的 StreamingQuantile 方法来计算高度变量的分位数(第 25、第 50 ... 蚀刻)。目前它有效,但我还需要计算每个组的 AVG 权重 + 他们的分位数;所以它看起来像这样;

A | Quantile1 | 88.5 (height)  | 134.4 (avg weight)
A | Quantile2 | 125.3 (height) | 156.2 (avg weight)
etc.....
B | Quantile4 | 144.3 (height) | 134.2 (avg weight)

作为参考,这里是用于计算分位数的简单 Pig;

REGISTER /usr/lib/datafu-1.2.0.jar;
define Quantile datafu.pig.stats.StreamingQuantile('0.0','0.25','0.5','0.75','1.0');
A = load 'mydata';
Group_A = GROUP A BY $0;
Quant = FOREACH GROUP_A GENERATE group,Quantile(A.$1);

我是否也可以计算每个分位数和组的平均值 2 美元?

【问题讨论】:

    标签: hadoop apache-pig conditional-operator quantile


    【解决方案1】:

    我为列命名以避免混淆。

    REGISTER /usr/lib/datafu-1.2.0.jar;
    define Quantile datafu.pig.stats.StreamingQuantile('0.0','0.25','0.5','0.75','1.0');
    A = load 'mydata' using PigStorage('|') as (grouping:chararray, height:double, weight:double);
    Group_A = GROUP A BY grouping;
    Quant = FOREACH GROUP_A GENERATE group, Quantile(A.height) as q;
    joined = join Quant by group, A by grouping;
    grouped = group joined by (grouping, q)
    average = foreach grouped generate group, AVG(joined.weight);
    

    【讨论】:

      猜你喜欢
      • 1970-01-01
      • 1970-01-01
      • 1970-01-01
      • 1970-01-01
      • 1970-01-01
      • 1970-01-01
      • 1970-01-01
      • 1970-01-01
      • 1970-01-01
      相关资源
      最近更新 更多