Hive Query throwing error input string: "__HIVE_D" is not an integer

【Question title】: Hive Query throwing error input string: "__HIVE_D" is not an integer
【Posted】: 2020-06-28 22:39:00
【Question description】:

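I have a Hive table that uses the AWS Glue metastore. The data resides on S3, and we partition by year, month, and a unique number.

I run queries using AWS EMR spark-sql.

Here is a sample table structure: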

Column    s3_url                  rec_dt        yr_number   mth_number   uniq_id
Type      String                  Date          Int         Int          String
Example   s3://path/example.txt   2020-03-16    2020        3            4195

Whenever I query this table with the command below, it runs fine:

select s3_url from table where (rec_dt in ('2020-03-16'))

However, when I add the following condition, it throws an error:

select s3_url from table where (rec_dt in ('2020-03-16')) and yr_number=2020;

Error:

Error in query: org.apache.hadoop.hive.metastore.api.InvalidObjectException:
For input string: "__HIVE_D" is not an integer.
(Service: AWSGlue; Status Code: 400; Error Code: InvalidInputException;
Request ID: 586ff8e1-8f67-4593-940d-9f992a073be3);

I also checked the table schema: the column is an int, and I am passing an int value.

【Question discussion】:

Tags: amazon-web-services apache-spark hadoop hive aws-glue

【Solution 1】:

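This looks like a clear sign of a mismatch between the Hive catalog and the Glue catalog structure. You said you checked the column in the Hive catalog.

Check whether the AWS Glue catalog also has the column with the specified data type.

Example of the UI for checking schema details...

Hope you are using the AWS Glue Data Catalog as the Hive metastore.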
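
The same check can be done without the console by reading the table definition from the Glue API. The following is only a minimal sketch: it assumes the response shape of boto3's Glue `get_table` call (`Table` / `PartitionKeys` entries with `Name` and `Type`), the helper names are hypothetical, and `my_db` / `my_table` are placeholders rather than names from the question.

```python
def partition_key_types(get_table_response):
    """Return {partition column name: declared type} from a Glue get_table response."""
    table = get_table_response["Table"]
    return {key["Name"]: key["Type"] for key in table.get("PartitionKeys", [])}


def non_int_keys(key_types, expected_int=("yr_number", "mth_number")):
    """List the columns in expected_int whose declared type is not an integer type."""
    int_types = {"int", "integer", "bigint", "smallint", "tinyint"}
    return [name for name in expected_int
            if key_types.get(name, "").lower() not in int_types]


# Hypothetical usage against a real catalog (needs boto3 and AWS credentials):
#   import boto3
#   response = boto3.client("glue").get_table(DatabaseName="my_db", Name="my_table")
#   print(non_int_keys(partition_key_types(response)))

# Offline example: a hand-written response of the same shape (made-up data),
# where mth_number was declared as string by mistake.
sample_response = {
    "Table": {
        "Name": "my_table",
        "PartitionKeys": [
            {"Name": "yr_number", "Type": "int"},
            {"Name": "mth_number", "Type": "string"},
            {"Name": "uniq_id", "Type": "string"},
        ],
    }
}

types = partition_key_types(sample_response)
mismatched = non_int_keys(types)
print(mismatched)
```

An empty list means the partition columns are declared as integers in Glue, in which case the declared schema is not the problem and the stored partition values are worth checking next.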

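【Discussion】:

Hi Ram, I checked the schema of this table in Glue and it has the correct int data type, the same as what Spark shows with describe table. I believe we are following the pattern of using the AWS Glue Data Catalog as the Hive metastore.
Did you check whether dynamicframe.printSchema() matches what you expect?
You can run Hive's MSCK REPAIR TABLE table_name; it might sync things up.
Thanks Ram. It looks like the problem was that we had bad data in this table: some of it was misaligned, so some records had a different type. After purging those records and rerunning my query, it worked.
If it was useful, please accept the answer as the owner.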
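
For anyone hitting the same error: the fix reported above was to purge misaligned records. One way to look for them, assuming the bad values ended up in the integer partition columns, is to scan the partition specs (for example the lines printed by `SHOW PARTITIONS`) for values that do not parse as integers. This is a rough sketch with hypothetical helper names and made-up sample specs. It also assumes the truncated `__HIVE_D` in the error refers to Hive's default partition name `__HIVE_DEFAULT_PARTITION__`, which the thread does not confirm.

```python
def parse_partition(spec):
    """Split a spec like 'a=1/b=2' into {'a': '1', 'b': '2'}."""
    return dict(part.split("=", 1) for part in spec.strip().split("/") if "=" in part)


def is_int(value):
    """True if the string parses as an integer."""
    try:
        int(value)
        return True
    except ValueError:
        return False


def bad_partitions(specs, int_columns=("yr_number", "mth_number")):
    """Return the specs where any of int_columns is missing or not an integer."""
    bad = []
    for spec in specs:
        values = parse_partition(spec)
        if any(not is_int(values.get(col, "")) for col in int_columns):
            bad.append(spec)
    return bad


# Made-up sample specs in the 'col=value/col=value' form; not data from the question.
specs = [
    "yr_number=2020/mth_number=3/uniq_id=4195",
    "yr_number=__HIVE_DEFAULT_PARTITION__/mth_number=3/uniq_id=17",
    "yr_number=2020/mth_number=s3:/uniq_id=88",
]

suspicious = bad_partitions(specs)
for spec in suspicious:
    print(spec)
```

Any spec printed here is a candidate for cleanup before filtering on `yr_number` again.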