【Question Title】: How to get the schema of a Hive table created with TextInput/OutputFormat using Java
【Posted】: 2019-08-06 19:14:23
【Question】:

For Avro, ORC, or Parquet tables I can use the respective libraries to get the schema. But if the input/output format is text and the data is stored in CSV files, how can I get the schema programmatically?

Thanks,

【Comments】:

    Tags: java hive schema


    【Solution 1】:

    You can use the DESCRIBE statement to display metadata about a table, such as column names and their data types.

    DESCRIBE FORMATTED displays additional information in a format familiar to Apache Hive users.

    Example:

    I created a table as follows.

    CREATE TABLE IF NOT EXISTS Employee_Local (EmployeeId INT, Name STRING,
    Designation STRING, State STRING, Number STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;
    

    DESCRIBE statement

    You can use the abbreviation DESC for the DESCRIBE statement.

    hive> DESCRIBE Employee_Local;
    OK
    employeeid              int                                         
    name                    string                                      
    designation             string                                      
    state                   string                                      
    number                  string 
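To do the same thing from Java, as the question asks, you can run DESCRIBE over the HiveServer2 JDBC driver and read the rows back. Below is a minimal sketch: the class name, JDBC URL, and parsing helper are illustrative assumptions, not part of the original answer, and the helper works on the whitespace-padded sample rows shown above (over real JDBC, each row already arrives as separate columns, which is simpler).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: obtain a Hive table's schema from Java by running DESCRIBE
// and parsing the resulting rows into column-name -> data-type pairs.
public class DescribeSchema {

    // Parse "col_name  data_type  comment" rows as emitted by DESCRIBE,
    // stopping at the blank line / "#" section markers that follow the columns.
    static Map<String, String> parseDescribe(List<String> rows) {
        Map<String, String> schema = new LinkedHashMap<>();
        for (String row : rows) {
            String trimmed = row.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) break;
            String[] parts = trimmed.split("\\s+");
            if (parts.length >= 2) schema.put(parts[0], parts[1]);
        }
        return schema;
    }

    public static void main(String[] args) {
        // In a real setup the rows would come from something like
        // (placeholder host/port, requires a running HiveServer2):
        //   Connection conn = DriverManager
        //       .getConnection("jdbc:hive2://host:10000/default");
        //   ResultSet rs = conn.createStatement()
        //       .executeQuery("DESCRIBE Employee_Local");
        // Here we reuse the sample DESCRIBE output shown above.
        List<String> sample = List.of(
            "employeeid              int",
            "name                    string",
            "designation             string",
            "state                   string",
            "number                  string");
        System.out.println(parseDescribe(sample));
    }
}
```

The insertion-ordered `LinkedHashMap` preserves the column order Hive reports, which matters if you later map columns to CSV positions.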
    

    DESCRIBE FORMATTED statement

    hive> describe formatted Employee_Local;
    OK
    # col_name              data_type               comment             
    
    employeeid              int                                         
    name                    string                                      
    designation             string                                      
    state                   string                                      
    number                  string                                      
    
    # Detailed Table Information         
    Database:               default                  
    Owner:                  cloudera                 
    CreateTime:             Fri Mar 15 10:53:35 PDT 2019     
    LastAccessTime:         UNKNOWN                  
    Protect Mode:           None                     
    Retention:              0                        
    Location:               hdfs://quickstart.cloudera:8020/user/hive/warehouse/employee_test    
    Table Type:             MANAGED_TABLE            
    Table Parameters:        
        transient_lastDdlTime   1552672415          
    
    # Storage Information        
    SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe   
    InputFormat:            org.apache.hadoop.mapred.TextInputFormat     
    OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat   
    Compressed:             No                       
    Num Buckets:            -1                       
    Bucket Columns:         []                       
    Sort Columns:           []                       
    Storage Desc Params:         
        field.delim             ,                   
        serialization.format    ,                   
    Time taken: 0.544 seconds, Fetched: 31 row(s)
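The question also hinges on knowing that a table is text-backed in the first place, and the `InputFormat:` row of the DESCRIBE FORMATTED output above carries exactly that. Once the rows are fetched (e.g. over JDBC, as in Solution 1), they can be checked from Java. A small sketch, reusing the sample rows above; the class name and helper method are illustrative assumptions:

```java
import java.util.List;

// Sketch: detect whether a Hive table is plain-text-backed by scanning
// DESCRIBE FORMATTED rows for the InputFormat entry.
public class StorageFormatCheck {

    // Return the value of the "InputFormat:" row, or null if absent.
    static String inputFormat(List<String> describeFormattedRows) {
        for (String row : describeFormattedRows) {
            String trimmed = row.trim();
            if (trimmed.startsWith("InputFormat:")) {
                return trimmed.substring("InputFormat:".length()).trim();
            }
        }
        return null;
    }

    static boolean isTextTable(List<String> rows) {
        return "org.apache.hadoop.mapred.TextInputFormat".equals(inputFormat(rows));
    }

    public static void main(String[] args) {
        // Sample rows copied from the DESCRIBE FORMATTED output above.
        List<String> rows = List.of(
            "SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
            "InputFormat:            org.apache.hadoop.mapred.TextInputFormat",
            "OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat");
        System.out.println(isTextTable(rows)); // prints "true" for this sample
    }
}
```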
    

    You can even get the schema of a Hive table from the Spark shell, as shown below:

    scala> spark.sql("desc formatted test_loop").collect().foreach(println)
    [policyid,bigint,null]
    [statecode,string,null]
    [county,string,null]
    [eq_site_limit,bigint,null]
    [hu_site_limit,bigint,null]
    [fl_site_limit,bigint,null]
    [fr_site_limit,bigint,null]
    [tiv_2011,bigint,null]
    [tiv_2012,double,null]
    [eq_site_deductible,double,null]
    [hu_site_deductible,double,null]
    [fl_site_deductible,double,null]
    [fr_site_deductible,double,null]
    [point_latitude,double,null]
    [point_longitude,double,null]
    [line,string,null]
    [construction,string,null]
    [point_granularity,bigint,null]
    [,,]
    [# Detailed Table Information,,]
    [Database:,default,]
    [Owner:,mapr,]
    [Create Time:,Fri May 26 17:56:04 EDT 2017,]
    [Last Access Time:,Wed Dec 31 19:00:00 EST 1969,]
    [Location:,maprfs:/user/hv2/warehouse/test_loop,]
    [Table Type:,MANAGED,]
    [Table Parameters:,,]
    [  rawDataSize,254192494,]
    [  numFiles,1,]
    [  transient_lastDdlTime,1495845784,]
    [  totalSize,251167564,]
    [  numRows,3024360,]
    [,,]
    [# Storage Information,,]
    [SerDe Library:,org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe,]
    [InputFormat:,org.apache.hadoop.mapred.TextInputFormat,]
    [OutputFormat:,org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat,]
    [Compressed:,No,]
    [Storage Desc Parameters:,,]
    [  serialization.format,1,]
    

    【Comments】:

    • @Arjun Bora.. does my answer address your question?
    【Solution 2】:

    SHOW CREATE TABLE prints the complete DDL for the table, including its columns, row format, and storage clauses:

    SHOW CREATE TABLE <my_table>;

    【Comments】:
