Importing Data into Hive2 with Kettle

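Preface

I originally loaded the data into Hive directly, but for reasons I could not pin down it was far too slow: about 200 rows per hour. Tuning Kettle made little difference, so I switched to writing the data to HDFS with Hadoop File Output and then loading it into the Hive table.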

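1. Drag Big Data/Hadoop File Output into the transformation

Create a new Hadoop cluster connection.

Download core-site.xml, hdfs-site.xml, yarn-site.xml, and mapred-site.xml from the cluster.

Use them to overwrite the four files of the same names in Kettle's plugins\pentaho-big-data-plugin\hadoop-configurations\hdp26 directory.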

2. Fill in the configuration

For the connection settings, only the Hadoop file system connection needs to be correct.

Then drag in the SQL step from the Scripting category.

On the File tab, enter the path and file name.

On the Content tab, choose the separator, whether to write a header row with the field names, and the compression format (orc, snappy).

In the SQL step, create a new database connection and enter the SQL statement.
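
The original does not show the SQL statement itself. As a rough sketch only, the load into Hive could look something like the following. The database and table name (demo_db.kettle_import), the columns, the comma separator, and the HDFS path are all hypothetical placeholders. They are not taken from the original setup and need to match whatever was entered on the File and Content tabs.

```sql
-- Hypothetical target table: the columns and the delimiter are assumptions
-- and must match the fields and separator written by Hadoop File Output.
CREATE TABLE IF NOT EXISTS demo_db.kettle_import (
  id   INT,
  name STRING
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Hypothetical HDFS path: use the path/file name from the File tab.
LOAD DATA INPATH '/tmp/kettle/kettle_import.txt'
INTO TABLE demo_db.kettle_import;
```

LOAD DATA INPATH moves the file into the table's storage location without converting it, so the table's storage format has to match what was written: a delimited text file needs a TEXTFILE table. If the header row is enabled on the Content tab, it would be loaded as an ordinary data row unless the table is told to skip it, for example with Hive's skip.header.line.count table property.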