Fail to load data from Hive to ElasticSearch: answers

【Question title】: Fail to load data from Hive to ElasticSearch
【Posted】: 2016-08-31 16:46:30
【Question description】:

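I am currently trying to load data from Hive into ElasticSearch. I am using Cloudera CDH 5.3. I have added the elasticsearch-hadoop Hive 2.0.2 jar to my Hive path. I have ElasticSearch 1.4.4 up and running on 10.44.162.169.

I now have a table named hive_cdr with the following columns: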

 traffic_type_id (bigint)
 appelant (int)
 called_number (int)
 call_duration (int)
 location_number (string)
 date_heure_appel (string)

I am trying to define an ES-backed table in Hive so that I can load some data into it. To do so, I ran this:

CREATE EXTERNAL TABLE es_hive_cdr (
traffic bigint ,
calling int ,
called int ,
duration int ,
location string ,
date string )
ROW FORMAT SERDE 'org.elasticsearch.hadoop.hive.EsSerDe'
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES (
'es.nodes'='10.44.162.169',
'es.resource'='indexCDR/typeCDR'
) ;

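However, I get an exception saying that EsStorage is not recognized.

I removed the EsStorage line and ran the statement again to try to figure out what was happening.

Now I try to load data from my hive_cdr table into my new table: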

insert into table es_hive_cdr2
select
traffic_type_id,
appelant,
called_number,
call_duration,
location_number,
date_heure_appel
from hive_cdr;

But it fails, and I get this error:

Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask

STAGE DEPENDENCIES:

  Stage-1 is a root stage
  Stage-7 depends on stages: Stage-1 , consists of Stage-4, Stage-3, Stage-5
  Stage-4
  Stage-0 depends on stages: Stage-4, Stage-3, Stage-6
  Stage-2 depends on stages: Stage-0
  Stage-3
  Stage-5
  Stage-6 depends on stages: Stage-5

STAGE PLANS:

  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: hive_cdr
            Statistics: Num rows: 267130 Data size: 58768736 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: traffic_type_id (type: bigint), appelant (type: int), called_number (type: int), call_duration (type: int), location_number (type: string), date_heure_appel (type: string)
              outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5
              Statistics: Num rows: 267130 Data size: 58768736 Basic stats: COMPLETE Column stats: NONE
              File Output Operator
                compressed: false
                Statistics: Num rows: 267130 Data size: 58768736 Basic stats: COMPLETE Column stats: NONE
                table:
                    input format: org.apache.hadoop.mapred.TextInputFormat
                    output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                    serde: org.elasticsearch.hadoop.hive.EsSerDe
                    name: default.es_hive_cdr2

  Stage: Stage-7
    Conditional Operator

  Stage: Stage-4
    Move Operator
      files:
          hdfs directory: true
          destination: hdfs://master:8020/user/hive/warehouse/es_hive_cdr2/.hive-staging_hive_2015-03-02_14-09-08_285_4734041865540737822-2/-ext-10000

  Stage: Stage-0
    Move Operator
      tables:
          replace: false
          table:
              input format: org.apache.hadoop.mapred.TextInputFormat
              output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
              serde: org.elasticsearch.hadoop.hive.EsSerDe
              name: default.es_hive_cdr2

  Stage: Stage-2
    Stats-Aggr Operator

  Stage: Stage-3
    Map Reduce
      Map Operator Tree:
          TableScan
            File Output Operator
              compressed: false
              table:
                  input format: org.apache.hadoop.mapred.TextInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                  serde: org.elasticsearch.hadoop.hive.EsSerDe
                  name: default.es_hive_cdr2

  Stage: Stage-5
    Map Reduce
      Map Operator Tree:
          TableScan
            File Output Operator
              compressed: false
              table:
                  input format: org.apache.hadoop.mapred.TextInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                  serde: org.elasticsearch.hadoop.hive.EsSerDe
                  name: default.es_hive_cdr2

  Stage: Stage-6
    Move Operator
      files:
          hdfs directory: true
          destination: hdfs://master:8020/user/hive/warehouse/es_hive_cdr2/.hive-staging_hive_2015-03-02_14-09-08_285_4734041865540737822-2/-ext-10000

I really need some help and guidance. Thank you!

【Comments】:

I have defined an external table on top of ES in my Hive in order to write to it: ADD JAR /usr/elasticsearch-hadoop-2.0.2/dist/elasticsearch-hadoop-hive-2.0.2.jar; CREATE EXTERNAL TABLE es_cdr (id bigint, calling int, called int, duration int, location string, date string) STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler' TBLPROPERTIES ('es.nodes'='10.44.162.169', 'es.resource' = 'indexOmar/typeOmar');
Now I want to load some data from a table I already created from a CSV file under the metastore tables: INSERT OVERWRITE TABLE es_cdr select NULL, h.appelant, h.called_number, h.call_duration, h.location_number, h.date_heure_appel from hive_cdr h; However, when doing so, I get this error: Problem communicating with the server: Job application_1425022073701_0029 introuvable.
I am using Cloudera Manager CDH 5.3, ElasticSearch 1.4.4, and the ES-Hadoop Hive 2.0.2 jar.
Now I get this error: Votre requête comporte les erreurs suivantes (your query contains the following errors): Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. PS: I did not change anything!
Now, if I try to do this from the Hive command line on CentOS, I get this error: RuntimeException MetaException(message:java.lang.ClassNotFoundException Class org.elasticsearch.hadoop.hive.EsSerDe not found). PS: I have already added the line ROW FORMAT SERDE 'org.elasticsearch.hadoop.hive.EsSerDe', and the error is still there.

Tags: hadoop elasticsearch hive cloudera

【Solution 1】:

Try specifying the table properties explicitly:

TBLPROPERTIES('es.resource' = 'myviews/myview', 'es.nodes' = 'hostname-of-es-cluster', 'es.port' = '9200', 'es.input.json' = 'false', 'es.write.operation' = 'index', 'es.index.auto.create' = 'yes','es.nodes.wan.only' = 'true');
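For context, here is a minimal sketch of how these properties might fit into a complete table definition for the asker's data. It is an illustration under stated assumptions, not a verified fix. The jar path is the one quoted in the comments. The index name is lowercased because Elasticsearch only accepts lowercase index names. The last column is renamed to call_date (a hypothetical name) to stay clear of the DATE type keyword. The ROW FORMAT SERDE clause is left out on the assumption that the storage handler supplies its own SerDe.

```sql
-- Register the connector jar in the current Hive session.
-- Assumption: path as quoted in the comments; adjust to your installation.
ADD JAR /usr/elasticsearch-hadoop-2.0.2/dist/elasticsearch-hadoop-hive-2.0.2.jar;

-- Confirm that the jar is registered for this session.
LIST JARS;

CREATE EXTERNAL TABLE es_hive_cdr (
  traffic   BIGINT,
  calling   INT,
  called    INT,
  duration  INT,
  location  STRING,
  call_date STRING  -- hypothetical rename of the "date" column
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES (
  'es.nodes'             = '10.44.162.169',
  'es.port'              = '9200',
  'es.resource'          = 'indexcdr/typeCDR',  -- index name lowercased (assumption)
  'es.index.auto.create' = 'yes'
);
```

ADD JAR only affects the session in which it runs, so a different client (for example a web query editor versus the command line) may need the jar registered separately.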

Also change the property in your elasticsearch.yml file to the one below:

network.host: _site_

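【Discussion】:

If you want to dump exactly the same set of rows from one table into another, make sure the Hive table and the Elasticsearch table have the same number of columns.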
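The column check described here could be sketched as follows, using the table and column names from the question. The target name es_hive_cdr is taken from the question's CREATE statement and is an assumption, since the question's INSERT refers to es_hive_cdr2. Hive matches the SELECT list to the target columns by position, so the comments show the intended pairing.

```sql
-- Compare the column lists of the source and target tables first.
DESCRIBE hive_cdr;
DESCRIBE es_hive_cdr;

-- Six source columns feed six target columns, paired by position.
INSERT OVERWRITE TABLE es_hive_cdr
SELECT
  traffic_type_id,   -- 1st target column: traffic (bigint)
  appelant,          -- 2nd target column: calling (int)
  called_number,     -- 3rd target column: called (int)
  call_duration,     -- 4th target column: duration (int)
  location_number,   -- 5th target column: location (string)
  date_heure_appel   -- 6th target column (string)
FROM hive_cdr;
```

If the two DESCRIBE outputs differ in column count or type, adjusting the SELECT list (or the target table definition) before running the INSERT is likely the simpler fix.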