[Question title]: Apache Gora Reducer for multi-table output with HBase
[Posted]: 2019-10-15 07:28:06
[Question description]:

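I have a small amount of data in an HBase table that was crawled with Nutch. We use Apache Gora as the ORM. I have found many (MapReduce) examples for processing data in a single HBase table, but my problem is that I have to copy the data into multiple tables (in the reducer). Without Gora there are some guides, such as this question, but how do I do this in my case?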

[Question comments]:

    Tags: mapreduce hbase nutch gora


    [Solution 1]:

    I have never done what you are asking, but you may find the answer in the Gora Tutorial "Constructing the job" section. It contains an example of a reducer configuration:

    /* Mappers are initialized with GoraMapper.initMapperJob() or
     * GoraInputFormat.setInput() */
    GoraMapper.initMapperJob(job, inStore, TextLong.class, LongWritable.class
        , LogAnalyticsMapper.class, true);
    
    /* Reducers are initialized with GoraReducer#initReducerJob().
     * If the output is not to be persisted via Gora, any reducer
     * can be used instead. */
    GoraReducer.initReducerJob(job, outStore, LogAnalyticsReducer.class);
    

    Then, instead of calling GoraReducer.initReducerJob(), you can configure your own reducer, as described in the answer you linked (assuming it is a correct answer):

    GoraMapper.initMapperJob(job, inStore, TextLong.class, LongWritable.class
        , LogAnalyticsMapper.class, true);
    job.setOutputFormatClass(MultiTableOutputFormat.class);
    job.setReducerClass(MyReducer.class);
    job.setNumReduceTasks(2);
    TableMapReduceUtil.addDependencyJars(job);
    TableMapReduceUtil.addDependencyJars(job.getConfiguration());
    

    Keep in mind that in the previous example the mapper emits (TextLong, LongWritable) key-value pairs, so your reducer would look something like the one in the answer you linked:

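    import java.io.IOException;
    import org.apache.gora.tutorial.log.TextLong;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableReducer;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.log4j.Logger;

    // The third type parameter is the output key; MultiTableOutputFormat expects the table name there.
    public class MyReducer extends TableReducer<TextLong, LongWritable, ImmutableBytesWritable> {

        private static final Logger logger = Logger.getLogger( MyReducer.class );

        // Placeholder table, column family and qualifier names; replace them with your own.
        private static final ImmutableBytesWritable TABLE = new ImmutableBytesWritable( Bytes.toBytes( "tableName" ) );
        private static final byte[] FAMILY = Bytes.toBytes( "cf" );
        private static final byte[] QUALIFIER = Bytes.toBytes( "count" );

        @Override
        protected void reduce( TextLong key, Iterable<LongWritable> data, Context context ) throws IOException, InterruptedException {
            // Assumes the tutorial's TextLong (a KeyValueWritable<Text, LongWritable>).
            logger.info( "Working on ---> " + key.getKey() );
            // The mapper emits LongWritable values, not HBase Results, so aggregate them here.
            long sum = 0;
            for ( LongWritable value : data ) {
                sum += value.get();
            }
            // Row key built from both parts of the TextLong key (an illustrative choice).
            Put put = new Put( Bytes.toBytes( key.getKey().toString() + "_" + key.getValue().get() ) );
            // addColumn() is the HBase 1.0+ name; older clients use put.add( family, qualifier, value ).
            put.addColumn( FAMILY, QUALIFIER, Bytes.toBytes( sum ) );
            // The output key selects the destination table; write again with another key for another table.
            context.write( TABLE, put );
        }
    }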
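
    Since the question is about copying each record into several tables, here is a minimal plain-Java model of that fan-out step. It deliberately has no HBase or Gora dependency so it can be run on its own; the class name MultiTableFanOut, the table names tableA and tableB, and the writer callback are all hypothetical stand-ins. In a real reducer the callback would correspond to context.write(new ImmutableBytesWritable(Bytes.toBytes(table)), put), called once per destination table.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

/** Stand-alone model of fanning one reduced record out to several tables. */
final class MultiTableFanOut {

    /** Hypothetical destination tables; in HBase these would be real table names. */
    static final List<String> TABLES = List.of("tableA", "tableB");

    /**
     * Sums the values for one key, then emits the same (rowKey, sum) record
     * once per destination table through the writer callback.
     */
    static void reduce(String rowKey, Iterable<Long> values,
                       BiConsumer<String, Map.Entry<String, Long>> writer) {
        long sum = 0;
        for (long v : values) {
            sum += v;
        }
        for (String table : TABLES) {
            // Stand-in for context.write(tableNameKey, put) in the HBase reducer.
            writer.accept(table, Map.entry(rowKey, sum));
        }
    }

    public static void main(String[] args) {
        // Collect the writes in memory instead of sending them to HBase.
        Map<String, Map.Entry<String, Long>> writes = new LinkedHashMap<>();
        reduce("row1", List.of(1L, 2L, 3L), writes::put);
        writes.forEach((table, record) ->
                System.out.println(table + " <- " + record.getKey() + " = " + record.getValue()));
    }
}
```

    The point of the sketch is only that the destination table is chosen per write, not per job, which is what MultiTableOutputFormat relies on; whether this combines cleanly with a Gora mapper is untested here.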
    

    Again, I have never done this... so it may not work :\

    [Comments]:
