【Question Title】: How to get the schema of a Hive table created with TextInput/OutputFormat using Java
【Posted】: 2019-08-06 19:14:23
【Question】:

For Avro, ORC, or Parquet tables I can use the respective libraries to get the schema. But if the input/output format is text and the data is stored in CSV files, how can I get the schema programmatically?

Thanks,

【Comments】:

    Tags: java hive schema


    【Solution 1】:

    You can use the DESCRIBE statement to display metadata about a table, such as column names and their data types.

    DESCRIBE FORMATTED displays additional information in a format familiar to Apache Hive users.

    Example:

    I created a table as follows.

    CREATE TABLE IF NOT EXISTS Employee_Local (EmployeeId INT, Name STRING,
    Designation STRING, State STRING, Number STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;
    

    DESCRIBE statement

    You can use the abbreviation DESC for the DESCRIBE statement.

    hive> DESCRIBE Employee_Local;
    OK
    employeeid              int                                         
    name                    string                                      
    designation             string                                      
    state                   string                                      
    number                  string 
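To do the same thing from Java, as the question asks, you can run DESCRIBE over the HiveServer2 JDBC driver and read the rows back. Below is a minimal sketch: the class name, JDBC URL, and parsing helper are illustrative assumptions, not part of the original answer, and the helper works on the whitespace-padded sample rows shown above (over real JDBC, each row already arrives as separate columns, which is simpler).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: obtain a Hive table's schema from Java by running DESCRIBE
// and parsing the resulting rows into column-name -> data-type pairs.
public class DescribeSchema {

    // Parse "col_name  data_type  comment" rows as emitted by DESCRIBE,
    // stopping at the blank line / "#" section markers that follow the columns.
    static Map<String, String> parseDescribe(List<String> rows) {
        Map<String, String> schema = new LinkedHashMap<>();
        for (String row : rows) {
            String trimmed = row.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) break;
            String[] parts = trimmed.split("\\s+");
            if (parts.length >= 2) schema.put(parts[0], parts[1]);
        }
        return schema;
    }

    public static void main(String[] args) {
        // In a real setup the rows would come from something like
        // (placeholder host/port, requires a running HiveServer2):
        //   Connection conn = DriverManager
        //       .getConnection("jdbc:hive2://host:10000/default");
        //   ResultSet rs = conn.createStatement()
        //       .executeQuery("DESCRIBE Employee_Local");
        // Here we reuse the sample DESCRIBE output shown above.
        List<String> sample = List.of(
            "employeeid              int",
            "name                    string",
            "designation             string",
            "state                   string",
            "number                  string");
        System.out.println(parseDescribe(sample));
    }
}
```

The insertion-ordered `LinkedHashMap` preserves the column order Hive reports, which matters if you later map columns to CSV positions.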
    

    DESCRIBE FORMATTED statement

    hive> describe formatted Employee_Local;
    OK
    # col_name              data_type               comment             
    
    employeeid              int                                         
    name                    string                                      
    designation             string                                      
    state                   string                                      
    number                  string                                      
    
    # Detailed Table Information         
    Database:               default                  
    Owner:                  cloudera                 
    CreateTime:             Fri Mar 15 10:53:35 PDT 2019     
    LastAccessTime:         UNKNOWN                  
    Protect Mode:           None                     
    Retention:              0                        
    Location:               hdfs://quickstart.cloudera:8020/user/hive/warehouse/employee_test    
    Table Type:             MANAGED_TABLE            
    Table Parameters:        
        transient_lastDdlTime   1552672415          
    
    # Storage Information        
    SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe   
    InputFormat:            org.apache.hadoop.mapred.TextInputFormat     
    OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat   
    Compressed:             No                       
    Num Buckets:            -1                       
    Bucket Columns:         []                       
    Sort Columns:           []                       
    Storage Desc Params:         
        field.delim             ,                   
        serialization.format    ,                   
    Time taken: 0.544 seconds, Fetched: 31 row(s)
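The question also hinges on knowing that a table is text-backed in the first place, and the `InputFormat:` row of the DESCRIBE FORMATTED output above carries exactly that. Once the rows are fetched (e.g. over JDBC, as in Solution 1), they can be checked from Java. A small sketch, reusing the sample rows above; the class name and helper method are illustrative assumptions:

```java
import java.util.List;

// Sketch: detect whether a Hive table is plain-text-backed by scanning
// DESCRIBE FORMATTED rows for the InputFormat entry.
public class StorageFormatCheck {

    // Return the value of the "InputFormat:" row, or null if absent.
    static String inputFormat(List<String> describeFormattedRows) {
        for (String row : describeFormattedRows) {
            String trimmed = row.trim();
            if (trimmed.startsWith("InputFormat:")) {
                return trimmed.substring("InputFormat:".length()).trim();
            }
        }
        return null;
    }

    static boolean isTextTable(List<String> rows) {
        return "org.apache.hadoop.mapred.TextInputFormat".equals(inputFormat(rows));
    }

    public static void main(String[] args) {
        // Sample rows copied from the DESCRIBE FORMATTED output above.
        List<String> rows = List.of(
            "SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
            "InputFormat:            org.apache.hadoop.mapred.TextInputFormat",
            "OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat");
        System.out.println(isTextTable(rows)); // prints "true" for this sample
    }
}
```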
    

    You can even get the schema of a Hive table from the Spark shell, as shown below:

    scala> spark.sql("desc formatted test_loop").collect().foreach(println)
    [policyid,bigint,null]
    [statecode,string,null]
    [county,string,null]
    [eq_site_limit,bigint,null]
    [hu_site_limit,bigint,null]
    [fl_site_limit,bigint,null]
    [fr_site_limit,bigint,null]
    [tiv_2011,bigint,null]
    [tiv_2012,double,null]
    [eq_site_deductible,double,null]
    [hu_site_deductible,double,null]
    [fl_site_deductible,double,null]
    [fr_site_deductible,double,null]
    [point_latitude,double,null]
    [point_longitude,double,null]
    [line,string,null]
    [construction,string,null]
    [point_granularity,bigint,null]
    [,,]
    [# Detailed Table Information,,]
    [Database:,default,]
    [Owner:,mapr,]
    [Create Time:,Fri May 26 17:56:04 EDT 2017,]
    [Last Access Time:,Wed Dec 31 19:00:00 EST 1969,]
    [Location:,maprfs:/user/hv2/warehouse/test_loop,]
    [Table Type:,MANAGED,]
    [Table Parameters:,,]
    [  rawDataSize,254192494,]
    [  numFiles,1,]
    [  transient_lastDdlTime,1495845784,]
    [  totalSize,251167564,]
    [  numRows,3024360,]
    [,,]
    [# Storage Information,,]
    [SerDe Library:,org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe,]
    [InputFormat:,org.apache.hadoop.mapred.TextInputFormat,]
    [OutputFormat:,org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat,]
    [Compressed:,No,]
    [Storage Desc Parameters:,,]
    [  serialization.format,1,]
    

    【Comments】:

    • @Arjun Bora.. does my answer address your question?
    【Solution 2】:

    SHOW CREATE TABLE prints the complete DDL for the table, including its columns, row format, and storage clauses:

    SHOW CREATE TABLE <my_table>;

    【Comments】:
