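Inserting into Hive table - Non Partitioned table to Partitioned table - Cannot insert into target table because column number/types (answers)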

【Question title】: Inserting into Hive table - Non Partitioned table to Partitioned table - Cannot insert into target table because column number/types
【Posted】: 2016-06-19 03:47:21
【Question】:

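When I try to insert into a partitioned table, I get the following error: SemanticException [Error 10044]: Line 1:23 Cannot insert into target table because column number/types are different ''US'': Table insclause-0 has 2 columns, but query has 3 columns.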

My input data

1,aaa,US
2,bbb,US
3,ccc,IN
4,ddd,US
5,eee,IN
6,fff,IN
7,ggg,US

Created Hive table tx

create table tx (no int,name string,country string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

Created partitioned table t1, partitioned by country

create table t1 (no int,name string) PARTITIONED BY (country string)  ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

I tried the two inserts below, but both failed

INSERT OVERWRITE TABLE t1 PARTITION (country='US')
SELECT * from tx where country = 'US';

INSERT OVERWRITE TABLE t1 PARTITION (country='US')
SELECT no,name,country from tx where country = 'US';

Error: SemanticException [Error 10044]: Line 1:23 Cannot insert into target table because column number/types are different ''US'': Table insclause-0 has 2 columns, but query has 3 columns.

【Comments】:

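RTFM. Hive is not Oracle. In Hive, the partition "columns" are managed as metadata: they are not stored in the data files, but are used as sub-directory names instead. So your partitioned table has only 2 real columns, and your SELECT must supply exactly those 2 columns.
On the other hand, if you use dynamic partitioning, i.e. INSERT ... PARTITION (country) with no literal value, then the actual values for the partition "column" have to be supplied as an extra column in the SELECT, after the real columns.
Thanks a lot, Samson Scharfrichter. Yes, it worked... I have posted the correct queries...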

Tags: hadoop hive

【Solution 1】:

Thanks a lot to Samson Scharfrichter. With the partition column left out of the SELECT, these inserts work:

INSERT OVERWRITE TABLE t1 PARTITION (country='US')
SELECT no,name from tx where country = 'US';
INSERT INTO TABLE t1 PARTITION (country='IN')
SELECT no,name from tx where country = 'IN';

I checked the partitions

hive>  SHOW PARTITIONS t1;
OK
country=IN
country=US
Time taken: 0.291 seconds, Fetched: 2 row(s)
hive>
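
For comparison, the dynamic-partitioning variant mentioned in the comments could look roughly like the sketch below. It reuses the tx and t1 tables from the question. The two SET lines are an assumption about the session configuration, since defaults differ between Hive versions, and this query was not part of the original post.

```sql
-- Assumed session settings: enable dynamic partitioning and allow all
-- partition columns to be dynamic (strict mode requires at least one static).
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- No literal value in the PARTITION clause, so the partition value is taken
-- from the last column of the SELECT (country), after the two real columns.
INSERT OVERWRITE TABLE t1 PARTITION (country)
SELECT no, name, country FROM tx;
```

With this form a single statement should populate both country=US and country=IN, and the SELECT has 3 columns because the third one supplies the partition value rather than a data column.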

【Discussion】:

I see. So basically, don't use * and don't include the partition column itself in the SELECT. Thanks!