【Question title】: Pig Latin: distinguishing Map or Reduce queries
【Posted】: 2016-03-26 06:44:43
【Question】:

I have the following data sample:

AGE,EDU,SEX,SALARY
67,10th,Male,<=50K
17,10th,Female,<=50K
40,Assoc-voc,Male,>50K
35,Assoc-voc,Male,<=50K
57,Assoc-voc,Male,<=50K
49,Assoc-voc,Male,>50K
42,Bachelors,Male,>50K
30,Bachelors,Male,>50K
23,Bachelors,Female,<=50K

===============================================

I wrote the following Pig Latin/Hadoop script:

    sensitive = LOAD '/mdsba' USING PigStorage(',') AS (AGE, EDU, SEX, SALARY);
    -- Filter the data by salary
    Data_filter1 = FILTER sensitive BY (SALARY matches '<=50K');
    Data_filter2 = FILTER sensitive BY (SALARY matches '>50K');
    -- Group each filtered relation by (AGE, EDU, SEX) and emit the grouped bags
    B = FOREACH (GROUP Data_filter1 BY (AGE, EDU, SEX)) GENERATE Data_filter1;
    C = FOREACH (GROUP Data_filter2 BY (AGE, EDU, SEX)) GENERATE Data_filter2;
    DUMP B;
    DUMP C;

===============================================

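Is there a way to determine whether the queries B, C, Data_filter1, and Data_filter2 run in the Map phase or the Reduce phase? I ask because the following report is generated at the end of the job: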

Elapsed: 35sec  
Diagnostics: 
 Average Map Time: 12sec  
 Average Shuffle Time: 10sec  
 Average Merge Time: 0sec  
 Average Reduce Time: 2sec 

Many thanks.

【Comments】:

    Tags: hadoop mapreduce apache-pig


    【Solution 1】:

    Yes. When you launch the job, you will see a log line like this:

     org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: Alias1[73,14] C: Alias2[20, 9] R: Alias3[90, 78]
    

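    M stands for mapper, C for combiner, and R for reducer; each alias is listed under the phase in which it runs. The bracketed numbers appear to be the position (line and column) of the alias in the script. In general, though, a single query may run on both the mapper and the reducer, so the same alias can show up under more than one letter.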
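
    If you want to pull this information out of a job log automatically, the line above can be split by phase. The following is only a minimal sketch in Python: the helper name `parse_detailed_locations` is hypothetical, it assumes the log line has the `M: ... C: ... R: ...` layout shown above, and it treats the two bracketed numbers as line and column, which is an assumption.

```python
import re

# Matches one "Alias[73,14]" entry; whitespace after the comma is optional.
LOCATION_RE = re.compile(r"(\w+)\[(\d+),\s*(\d+)\]")


def parse_detailed_locations(log_line):
    """Hypothetical helper: map each phase letter to its (alias, line, column) entries.

    Assumes the "detailed locations: M: ... C: ... R: ..." layout from the answer.
    """
    # Keep only the text after the "detailed locations:" marker.
    _, _, tail = log_line.partition("detailed locations:")
    phases = {"M": [], "C": [], "R": []}
    # Split at each "M:", "C:" or "R:" marker, keeping the marker letters.
    parts = re.split(r"\b([MCR]):", tail)
    # parts alternates: [prefix, 'M', body, 'C', body, 'R', body]
    for phase, body in zip(parts[1::2], parts[2::2]):
        for alias, line, column in LOCATION_RE.findall(body):
            phases[phase].append((alias, int(line), int(column)))
    return phases


sample = (
    "org.apache.pig.backend.hadoop.executionengine.mapReduceLayer."
    "MapReduceLauncher - detailed locations: "
    "M: Alias1[73,14] C: Alias2[20, 9] R: Alias3[90, 78]"
)
print(parse_detailed_locations(sample))
```

    For the sample line from the answer, Alias1 ends up under M, Alias2 under C, and Alias3 under R. An alias that appears under both M and R is one whose work is split across the two phases.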

    【Comments】:
