【Question title】: Loading data to hive static partition table using load command
【Posted】: 2016-09-23 18:01:30
【Question】:

Please bear with me if this is a very basic question:

test.txt

1 ravi 100 hyd
2 krishna 200 hyd
3 fff 300 sec

I created a table in hive, partitioned on city, and loaded the data as follows:

create external table temp(id int, name string, sal int) 
partitioned by(city string) 
location '/testing';

load data inpath '/test.txt' into table temp partition(city='hyd');

In HDFS, the structure is /testing/temp/city=hyd/test.txt

When I query the table with "select * from temp;"

Output:

temp.id temp.name temp.sal temp.city  
    1   ravi    100 hyd  
    2   krishna 200 hyd  
    3   fff     300 hyd

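My question is: why does the city name "sec" in the third row become "hyd" in the output?

Am I doing something wrong?

Thanks in advance!

【Question discussion】:

Tags: hadoop hive hiveql hadoop2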

【Solution 1】:

Your problem is here:

load data inpath '/test.txt' into table temp partition(city='hyd');

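Every row you load into this partition gets city = 'hyd'. With static partitioning, it is your responsibility to put the correct values into each partition.

Just remove the last line from the txt file, put it into test2.txt, and run: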

load data inpath '/test.txt' into table temp partition(city='hyd');
load data inpath '/test2.txt' into table temp partition(city='sec');

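Yes, it is not very convenient, but that is how static partitioning works.

【Discussion】:

Thanks ozwlz5rd. My requirement is this: suppose I have a big file, like the one above, and I want to do static partitioning on city. How do we proceed?
What you are asking for looks very close to dynamic partitioning. With it you will get what you want. If you have to use static partitioning, you can either process the file before adding it to the partitions, or create a temporary external table that lets you select the records to insert into the correct partition. "city" looks like a low-cardinality field, which dynamic partitioning handles well.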

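【Solution 2】:

I believe partitioning does not work properly with a load statement on a single file.
Instead, we need to write the data to a staging table in hive (stat_parti), and from there write to another, partitioned table (stat_test).

Example: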

create external table stat_test(id int, name string, sal int)
partitioned by(city string) 
row format delimited fields 
terminated by ' ' 
location '/user/test/stat_test';
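
The staging table stat_parti is referenced in this answer, but its definition is not shown. A minimal sketch, assuming it holds the raw file with city as an ordinary (non-partition) column, uses the same space delimiter as stat_test, and that a fresh copy of the file sits at the assumed path /test.txt:

```sql
-- Assumed staging table: city is a regular column read from the file itself
create table stat_parti(id int, name string, sal int, city string)
row format delimited fields
terminated by ' ';

-- Moves /test.txt (assumed path) into the staging table's directory
load data inpath '/test.txt' into table stat_parti;
```

With city stored as a plain column, each row keeps the city value from the file, so later inserts can filter or partition on it.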

Either static or dynamic partitioning can then be used.

1) Static partitioning

insert into table stat_test partition(city='hyd') select id,name,sal from stat_parti where city='hyd';  
insert into table stat_test partition(city='sec') select id,name,sal from stat_parti where city='sec';

2) Dynamic partitioning

Here we need to enable:

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

insert overwrite table stat_test partition(city) select id,name,sal,city from stat_parti;
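
To check the result of either approach, one option is to list the partitions and then query one of them. This is only a sketch; what it returns depends on the data actually present in stat_parti:

```sql
-- List the partitions Hive has registered for stat_test
show partitions stat_test;

-- Filtering on the partition column lets Hive read only the matching directory
select id, name, sal, city from stat_test where city = 'sec';
```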

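【Discussion】:

【Solution 3】:

The data file test.txt that you loaded sits at the HDFS path '/testing/temp/city=hyd/test.txt', so all of its rows go into the partition 'city=hyd'.

Hive uses the directory name to retrieve the partition value, so the city field comes from the directory name (hyd), not from the contents of the file.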
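
One way to see this for yourself, sketched here against the temp table from the question, is to ask Hive which file each row was read from. INPUT__FILE__NAME is a built-in Hive virtual column; the table and column names are taken from the question:

```sql
-- The partition value comes from the path of the file, not from the row data
select id, name, sal, city, INPUT__FILE__NAME from temp;

-- Shows the partitions registered for the table
show partitions temp;
```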

【Discussion】: