【Question title】: How to flatten recursive hierarchy using Hive/Pig/MapReduce
【Posted】: 2016-09-15 20:10:49
【Question description】:

I have unbalanced tree data stored in a tabular format, for example:

parent,child
a,b
b,c
c,d
c,f
f,g

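The depth of the tree is unknown.

How can I flatten this hierarchy so that each row contains the complete path from a leaf node to the root node, like this: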

leaf node, root node, intermediate nodes
d,a,d:c:b
g,a,g:f:c:b
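
To pin down the desired output before moving to Hive or Pig, here is a minimal in-memory Python sketch. It assumes the data really is a tree (every child has exactly one parent and there are no cycles) and that the third column lists the leaf and each ancestor except the root, as in the `d,a,d:c:b` row. The function name `flatten` is only illustrative.

```python
# Edges from the question, as (parent, child) pairs.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("c", "f"), ("f", "g")]


def flatten(edges):
    """Return one (leaf, root, path) row per leaf node."""
    # Each child maps to its single parent (assumes a tree, no cycles).
    parent_of = {child: parent for parent, child in edges}
    parents = set(parent_of.values())
    # A leaf is a node that never appears in the parent column.
    leaves = sorted(c for c in parent_of if c not in parents)
    rows = []
    for leaf in leaves:
        chain = [leaf]
        # Climb until we reach a node with no parent: the root.
        while chain[-1] in parent_of:
            chain.append(parent_of[chain[-1]])
        root = chain[-1]
        rows.append((leaf, root, ":".join(chain[:-1])))
    return rows


for row in flatten(edges):
    print(",".join(row))
```

This only works when the whole edge list fits in memory, so it is a way to check results on a sample rather than a replacement for a distributed job.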

Any suggestions for solving this with Hive, Pig, or MapReduce? Thanks in advance.

【Comments】:

    Tags: hadoop mapreduce hive apache-pig


    【Solution 1】:

    I tried solving it with Pig; here is the sample code:

    Join macro:

    -- Join each partial path to the row that holds the parent of its current top node
    DEFINE join_hierarchy(leftA, source, result) RETURNS output {
        joined = JOIN $leftA BY parent LEFT OUTER, $source BY child;
        -- No match in $source: the top node has no parent, so the path is complete
        tmp_filtered = FILTER joined BY $source::parent IS NULL;
        part = FOREACH tmp_filtered GENERATE $leftA::child AS child, $leftA::path AS path;
        $result = UNION part, $result;
        -- Otherwise prepend the newly found ancestor and keep climbing
        part_remaining = FILTER joined BY $source::parent IS NOT NULL;
        $output = FOREACH part_remaining GENERATE $leftA::child AS child, $source::parent AS parent, CONCAT(CONCAT($source::parent, ':'), $leftA::path) AS path;
    };
    

    Load the dataset:

    -- The dataset's field delimiter is ','.
    source = LOAD '*****' USING PigStorage(',') AS (parent:chararray, child:chararray);
    -- Add a path column, seeded with the immediate parent
    leftA = FOREACH source GENERATE child, parent, CONCAT(parent, ':') AS path;
    
    -- Initially the result relation holds only a blank placeholder row.
    result = LIMIT leftA 1;
    result = FOREACH result GENERATE '' AS child, '' AS path;
    -- Flatten up to 4 levels. Add one join_hierarchy call per additional level of depth.
    
    leftA= join_hierarchy(leftA, source, result);
    leftA= join_hierarchy(leftA, source, result);
    leftA= join_hierarchy(leftA, source, result);
    leftA= join_hierarchy(leftA, source, result);
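
Since the question says the depth is unknown, a fixed number of calls may fall short. Pig Latin has no loop construct of its own, so the repetition would have to be driven from outside the script. The Python sketch below is not Pig; it only mimics the same join step in memory and repeats it until no unfinished rows remain. The names `flatten_by_joins`, `frontier` and `finished` are hypothetical, and the data is assumed to be acyclic (a cycle would make the loop run forever).

```python
# Edges from the question, as (parent, child) pairs.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("c", "f"), ("f", "g")]


def flatten_by_joins(edges):
    """Repeat the parent/child join until every path has reached a root."""
    # Lookup by child, playing the role of `source` in the join.
    parent_of = {child: parent for parent, child in edges}
    # Same seed as the Pig script: one row per edge, path starts at the parent.
    frontier = [(child, parent, parent + ":") for parent, child in edges]
    finished = []
    while frontier:
        next_frontier = []
        for child, top, path in frontier:
            # Left outer join: look up the parent of the current top node.
            ancestor = parent_of.get(top)
            if ancestor is None:
                # No match: `top` is a root, so this path is complete.
                finished.append((child, path))
            else:
                # Match: prepend the ancestor and keep climbing.
                next_frontier.append((child, ancestor, ancestor + ":" + path))
        frontier = next_frontier
    return sorted(finished)


rows = flatten_by_joins(edges)
for child, path in rows:
    print(child, path)
```

Like the Pig script, this yields a root-to-parent path for every node, with a trailing `:`, not only for the leaves. Restricting the output to leaf nodes would be an extra filtering step.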
    
