【Question title】: How to get the Name and DeptId of the employees whose salary is greater than the average of their department?
【Posted】: 2018-04-16 21:47:49
【Question】:

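I'm new to Hadoop and Pig. Working from the problem statement, I got as far as the script below, but how do I compare each person's salary with the average salary of their department? Here is the script that computes the average salary per department: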

A = LOAD 'Assignment_1_Input.log' USING PigStorage('\t') as (id:int,name:chararray,age:int,salary:int,deptid:int);
B = GROUP A by deptid;
STORE B INTO 'Assign1GrpByNew';
C = FOREACH B GENERATE group as grpId,AVG(A.salary) as grpAvgSal;
DUMP C;

Input file:

15878   mohan   24      8000    1
19173   ramya   27      10000   1
9527    krishna 35      40000   2
9528    raj     36      60000   2
16884   ravi    50      70000   2

Expected output:

ramya   1
raj     2
ravi    2

Please help, thanks.

【Comments】:

    Tags: bigdata apache-pig hadoop2


    【Solution 1】:

    JOIN A and C on deptid = grpId, then FILTER for the rows where salary > grpAvgSal.

    A = LOAD 'Assignment_1_Input.log' USING PigStorage('\t') as (id:int,name:chararray,age:int,salary:int,deptid:int);
    B = GROUP A by deptid;
    STORE B INTO 'Assign1GrpByNew';
    C = FOREACH B GENERATE group as grpId,AVG(A.salary) as grpAvgSal;
    
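    -- Attach each department's average salary to every employee row
    D = JOIN A BY deptid,C BY grpId;
    -- Keep only employees earning more than their department's average
    E = FILTER D BY (A::salary > C::grpAvgSal);
    DUMP E;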
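
The join-and-filter approach leaves every joined column in the result, while the question asks only for the name and department ID. A minimal sketch of the full script with a final projection added, assuming the same input file and schema as in the question; the alias `F` is illustrative and not part of the original answer, and the intermediate `STORE` is left out because it is not needed for the result:

```pig
-- Load the tab-separated employee records (path and schema taken from the question)
A = LOAD 'Assignment_1_Input.log' USING PigStorage('\t') AS (id:int,name:chararray,age:int,salary:int,deptid:int);
-- Average salary per department
B = GROUP A BY deptid;
C = FOREACH B GENERATE group AS grpId, AVG(A.salary) AS grpAvgSal;
-- Attach the department average to each employee and keep the above-average ones
D = JOIN A BY deptid, C BY grpId;
E = FILTER D BY (A::salary > C::grpAvgSal);
-- F is a hypothetical alias: project only the two columns the question asks for
F = FOREACH E GENERATE A::name AS name, A::deptid AS deptid;
DUMP F;
```

For the sample input this should leave the rows for ramya, raj and ravi, although Pig does not guarantee their order.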
    

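    【Comments】:

      【Solution 2】:

      GROUP BY deptid, compute the department's average salary alongside each employee record, then select the employees whose salary is greater than that average.

      Snippet: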

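      inp_data = LOAD 'Assignment_1_Input.log' USING PigStorage('\t') as (id:int,name:chararray,age:int,salary:int,deptid:int);
      -- Group by department, then flatten back to one row per employee with the department average attached
      inp_data_fmt = FOREACH(GROUP inp_data BY deptid) GENERATE FLATTEN(inp_data), AVG(inp_data.salary) AS avg_salary;
      -- Keep only employees earning more than their department's average
      req_data = FILTER inp_data_fmt BY salary > avg_salary;
      DUMP req_data;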
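
The group-and-flatten approach also returns all employee columns plus the average, not just the two columns in the expected output. A minimal sketch of the same approach with a final projection, assuming the input file and schema from the question; the alias `name_dept` is illustrative and not part of the original answer:

```pig
-- Load the tab-separated employee records (path and schema taken from the question)
inp_data = LOAD 'Assignment_1_Input.log' USING PigStorage('\t') AS (id:int,name:chararray,age:int,salary:int,deptid:int);
-- One row per employee, with that employee's department average attached
inp_data_fmt = FOREACH (GROUP inp_data BY deptid) GENERATE FLATTEN(inp_data), AVG(inp_data.salary) AS avg_salary;
-- Keep only employees earning more than their department's average
req_data = FILTER inp_data_fmt BY salary > avg_salary;
-- name_dept is a hypothetical alias: project only name and department ID
name_dept = FOREACH req_data GENERATE name, deptid;
DUMP name_dept;
```

Because no second relation is joined in, the field names stay unambiguous after `FLATTEN`, so `name`, `salary` and `deptid` can be referenced without the `inp_data::` prefix.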
      

      【Comments】:
