【Question title】: Hive on Spark: reading a Parquet file
【Posted】: 2017-07-21 15:44:10
【Question description】:

I am trying to read a Parquet file into Hive on Spark.

From what I found, I should do something like this:

CREATE TABLE avro_test
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS AVRO
TBLPROPERTIES ('avro.schema.url'='/files/events/avro_events_scheme.avsc');

CREATE EXTERNAL TABLE parquet_test LIKE avro_test STORED AS PARQUET LOCATION '/files/events/parquet_events/';

where my Avro schema is:

{
 "type" : "parquet_file",
    "namespace" : "events",
    "name" : "events",
    "fields" : [
            { "name" : "category" , "type" : "string" },
            { "name" : "duration" , "type" : "long" },
            { "name" : "name" , "type" : "string" },
            { "name" : "user_id" , "type" : "string"},
            { "name" : "value" , "type" : "long" }
    ]
 }

As a result, I get this error:

org.apache.spark.sql.catalyst.parser.ParseException: 
Operation not allowed: ROW FORMAT SERDE is incompatible with format 'avro', 
which also specifies a serde(line 1, pos 0)

【Question discussion】:

    Tags: hadoop hive avro parquet spark-avro


    【Solution 1】:
    I think we have to add the InputFormat and OutputFormat classes explicitly.
    
    CREATE TABLE parquet_test
    ROW FORMAT SERDE
       'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
    STORED AS INPUTFORMAT  
      'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
    OUTPUTFORMAT
       'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
    TBLPROPERTIES (
      'avro.schema.url'='/hadoop/avro_events_scheme.avsc');
    
    I hope the above works.
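
    For comparison, the error message in the question points at the redundant clause: STORED AS AVRO already implies the Avro SerDe, so Spark's parser rejects an explicit ROW FORMAT SERDE next to it. The following is a minimal sketch that simply drops that clause, assuming the same table name and schema path as in the question. Whether the columns are then picked up from avro.schema.url depends on the Spark and Hive versions in use and is not confirmed anywhere in this thread.

    ```sql
    -- Sketch only: table name and schema path are taken from the question.
    -- STORED AS AVRO selects the Avro SerDe and container formats by itself,
    -- so no ROW FORMAT SERDE clause is given here.
    CREATE TABLE avro_test
    STORED AS AVRO
    TBLPROPERTIES ('avro.schema.url'='/files/events/avro_events_scheme.avsc');
    ```

    Note also that an Avro record schema is declared with "type" : "record". The value "parquet_file" shown in the question is not a type name defined by the Avro specification, so it may cause a separate failure once the DDL itself parses.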
    

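    【Discussion】:

    • Thanks! This helped, but now there is this line: CREATE EXTERNAL TABLE parquet_test LIKE avro_test STORED AS PARQUET LOCATION '/dir_to_file/file_name.parq/'; It returns the error: SQL Error: org.apache.spark.sql.catalyst.parser.ParseException: mismatched input 'LIKE' expecting {<EOF>, '(', 'SELECT', 'FROM', 'AS',... Could you help with that as well?
    • You can refer to this link: community.hortonworks.com/questions/5833/… Set PARQUET LOCATION '/dir_to_file' and leave out file_name.parq.
    • Indeed, I don't need the *.parq file name in the directory path, but it still returns the same error, saying that Hive fails on LIKE.
    • Is it possible that the query returns an error because the previous step failed?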
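
    One way to sidestep both the intermediate Avro table and the LIKE clause that the parser rejects is to declare the Parquet table's columns directly. The thread does not confirm this approach, so treat it as a minimal sketch. It assumes the column names from the Avro schema in the question, the usual mapping of Avro string to STRING and Avro long to BIGINT, and that the Parquet files really contain columns with these names and compatible types.

    ```sql
    -- Sketch only: columns are copied from the Avro schema in the question
    -- (string -> STRING, long -> BIGINT); adjust them to the actual Parquet files.
    CREATE EXTERNAL TABLE parquet_test (
      category STRING,
      duration BIGINT,
      name     STRING,
      user_id  STRING,
      value    BIGINT
    )
    STORED AS PARQUET
    -- LOCATION is the directory that holds the Parquet files, not a single file.
    LOCATION '/files/events/parquet_events/';
    ```

    Because the table is EXTERNAL, dropping it later removes only the metadata and leaves the files under the LOCATION directory in place.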