【Title】: Reducer gets stuck at 99% while inserting data into HBase through Hive
【Posted】: 2013-10-04 10:10:48
【Description】:

I have a 2-node HBase cluster running on Amazon EC2 (hadoop 1.0.1, hive-0.11.0, hbase-0.94.11, zookeeper-3.4.3), and EMR nodes created with AMI 2.4.1.

On the EMR instance I have an external table pointing to a location on S3. I have also created an HBase-backed Hive table, modelvar, stored in the HBase table modelvarlarge. Now I am trying to insert data from logdata into modelvar.

However, the reduce phase gets stuck at 99% and fails with the error below. FYI: through zkCli I am able to connect from EMR to the Ec2 ZooKeeper.

External table:

create external table logdata(date_local string, time_local string,s_computername string,
    c_ip string,s_ip string,s_port string,s_sitename string, referer string, localfile string, 
    TimeTakenMS string, status string, w3status string, sc_substatus string, uri string, qs string, 
    sc_bytes string, cs_bytes string, cs_username string, cs_User_Agent string, s_proxy string, c_protocol string, 
    cs_version string, cs_method string, cs_Cookie string, cs_Host string, w3wpbytes string, RequestsPerSecond string, 
    CPU_Utilization string, BeginRequest_UTC string, EndRequest_UTC string, time string, logdate string)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' location 's3://xxxxxxxxx';

HBase-Hive table:

    CREATE TABLE modelvar(cookie string, pageviews string, visit string) 
    STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    WITH SERDEPROPERTIES ("hbase.columns.mapping" = "m:pageviews,m:visit")
    TBLPROPERTIES ("hbase.table.name"="modelvarlarge");

Query:

    insert into table modelvar
    select x.cookie, hits, visit
    from (select cs_Cookie as Cookie, count(*) as hits from logdata
          where (uri like '%.aspx%' or uri like '%.html%')
          group by cs_Cookie) x
    join (select cs_Cookie as Cookie, count(distinct cs_Cookie) as visit
          from logdata
          group by cs_Cookie) y
    on x.cookie = y.cookie
    order by hits desc;

Error:

    java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":24655},"value":{"_col0":"-","_col1":24655,"_col2":17},"alias":0}
        at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:278)
        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:528)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:429)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)
        at org.apache.hadoop.mapred.Child.main(Child.java:249)
    Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":24655},"value":{"_col0":"-","_col1":24655,"_col2":17},"alias":0}
        at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:266)
        ... 7 more
    Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@10f00d3 closed
        at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:241)
        at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:539)
        at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:621)
        at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:502)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:832)
        at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
        at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:502)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:832)
        at org.apache.hadoop.hive.ql.exec.ExtractOperator.processOp(ExtractOperator.java:45)
        at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:502)
        at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:257)
        ... 7 more
    Caused by: java.io.IOException: org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@10f00d3 closed
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:794)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:782)
        at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:249)
        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:213)
        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:171)
        at org.apache.hadoop.hive.hbase.HiveHBaseTableOutputFormat.getHiveRecordWriter(HiveHBaseTableOutputFormat.java:82)
        at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getRecordWriter(HiveFileFormatUtils.java:250)
        at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:238)
        ... 17 more

【Comments】:

    Tags: hadoop amazon-ec2 hbase hive apache-zookeeper


    【Solution 1】:

    You need to define host-to-IP mappings across the whole EMR cluster. Say you are running a 3-node HBase cluster on Ec2 whose IPs are

     ip1, ip2, ip3
    

    On our Ec2 HBase cluster we set up aliases like this in the hosts file:

    ip1 master
    ip2 rgserver1
    ip3 rgserver3
    

    So in the hosts file of each EMR node you also need to define the same kind of mapping; otherwise the reducers cannot write data to the HBase cluster.
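    Each reducer opens its own HBase client connection and locates region servers by the hostnames registered in ZooKeeper, which is why unresolved names surface only at write time. The fix above can be sketched as a hosts-file update; the IPs below are placeholders reusing the aliases from this answer (substitute the private IPs and hostnames your Ec2 HBase nodes actually use). The sketch stages the change in a scratch file, which you would then install with sudo on each EMR node.

```shell
# Build the candidate hosts file in a scratch location first (dry run);
# install it on each EMR node afterwards with: sudo cp "$HOSTS_FILE" /etc/hosts
HOSTS_FILE=$(mktemp)
cat /etc/hosts > "$HOSTS_FILE" 2>/dev/null || true

# Placeholder IPs and aliases -- replace with the addresses and hostnames
# that the Ec2 HBase nodes registered in ZooKeeper.
cat >> "$HOSTS_FILE" <<'EOF'
10.0.0.1 master
10.0.0.2 rgserver1
10.0.0.3 rgserver3
EOF

# Show the entries the reducers will rely on for name resolution.
grep -E 'master|rgserver' "$HOSTS_FILE"
```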

    【Discussion】:
