【Question title】: Count/Sum in Apache Pig
【Posted】: 2014-09-18 22:42:50
【Question】:

I am a beginner with Apache Pig. I have a table with the following fields:

table - amount:long date:string country:string

Initially, my goal was to count the number of amount records per country, by month. For example, this is the final result I need:

(Exhibit A)
201201 USA 100
201201 UK 150
201305 ITALY 200
201305 USA 120
201305 UK 20
201403 ITALY 300

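The numbers 100, 150, 200, 300 and so on are the counts of amount records for each date and country. To get them, I wrote the following Pig script, which produces the expected result above.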

data = ORDER table BY date ASC;

data1 = GROUP data BY (date, country);

countof_amount = FOREACH data1 GENERATE
             FLATTEN(group) AS (date, country),
             COUNT(data) AS amount_count;

countof_amount1 = order countof_amount by date ASC;

Now I want the total of those counts for each date, across all countries. For example, from Exhibit A, I want the following result:

201201 250
201305 340
201403 300

How can I do this?

Thanks in advance!

【Comments】:

  • Removed the SQL tag, since this is about Pig.
  • Just group by date instead of (date, country).
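
The suggestion in the last comment can be sketched as follows. The total per date is the sum of the per-country counts, so grouping on date alone and counting should give the same figures in a single pass. This is a minimal, untested sketch: the file name 'input.txt', the space delimiter, and the alias names (records, by_date, count_per_date) are assumptions, not part of the question.

```pig
-- Assumption: space-delimited input with the three fields from the question.
records = LOAD 'input.txt' USING PigStorage(' ')
          AS (amount:long, date:chararray, country:chararray);

-- Group on date only, so each bag holds every record for that date.
by_date = GROUP records BY date;

-- COUNT skips tuples whose first field (here, amount) is null.
count_per_date = FOREACH by_date GENERATE
                     group AS date,
                     COUNT(records) AS total;

count_per_date_sorted = ORDER count_per_date BY date ASC;
DUMP count_per_date_sorted;
```

This only replaces the two-step approach when the per-country breakdown is not needed as an intermediate result.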

Tags: filter group-by apache-pig


【Solution 1】:

Add the last three lines and it will work. I tested it locally and it works fine.

-- Load the space-delimited input with an explicit schema.
table = LOAD 'input.txt' USING PigStorage(' ') AS (amount:long, date:chararray, country:chararray);
data = ORDER table BY date ASC;
data1 = GROUP data BY (date, country);
-- Count the amount records for each (date, country) pair.
countof_amount = FOREACH data1 GENERATE
            FLATTEN(group) AS (date, country),
            COUNT(data.amount) AS (amount_count);
countof_amount1 = ORDER countof_amount BY date ASC;

-- New lines: regroup by date alone and sum the per-country counts.
mycount = GROUP countof_amount1 BY date;
getFinalCount = FOREACH mycount GENERATE group AS date, SUM(countof_amount1.amount_count) AS total;
DUMP getFinalCount;
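
For reference, the same two-stage aggregation can be written without the intermediate ORDER steps. GROUP does not guarantee that it preserves the order of its input, so sorting before grouping should not affect the totals, and only the final relation needs sorting. A minimal sketch, assuming the same input file and schema as the answer above; the alias names (records, per_country, per_date) are illustrative and the script has not been run against the asker's data.

```pig
-- Assumption: same space-delimited input and schema as in the answer above.
records = LOAD 'input.txt' USING PigStorage(' ')
          AS (amount:long, date:chararray, country:chararray);

-- Stage 1: number of amount records per (date, country).
by_date_country = GROUP records BY (date, country);
per_country = FOREACH by_date_country GENERATE
                  FLATTEN(group) AS (date, country),
                  COUNT(records.amount) AS amount_count;

-- Stage 2: add up the per-country counts for each date.
by_date = GROUP per_country BY date;
per_date = FOREACH by_date GENERATE
               group AS date,
               SUM(per_country.amount_count) AS total;

-- Sort only the final output.
per_date_sorted = ORDER per_date BY date ASC;
DUMP per_date_sorted;
```

Keeping stage 1 as its own relation means the per-country counts from Exhibit A remain available alongside the per-date totals.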
