1. Import MySQL data into HDFS with Sqoop
[root@srv01 ~]# sqoop import --connect jdbc:mysql://localhost:3306/test --username root --password root --table user --columns 'uid,uname' -m 1 --target-dir '/sqoop/user'   # -m sets the number of map tasks; --target-dir sets the HDFS output directory
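The value passed to --connect is a standard JDBC URL. When scripting several imports against different hosts or databases, it can help to build that URL from parts; the helper below is a hypothetical sketch for such scripts, not part of Sqoop:

```shell
# Hypothetical helper (not part of Sqoop): assembles the JDBC URL that
# sqoop import expects in --connect from host, port, and database name.
jdbc_url() {
  local host="$1" port="$2" db="$3"
  printf 'jdbc:mysql://%s:%s/%s\n' "$host" "$port" "$db"
}

jdbc_url localhost 3306 test   # prints jdbc:mysql://localhost:3306/test
```

An import script can then use `--connect "$(jdbc_url localhost 3306 test)"` and switch targets by changing one variable.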
2. Import MySQL data into Hive with Sqoop
[root@srv01 ~]# sqoop import --hive-import --connect jdbc:mysql://localhost:3306/test --username root --password root --table user --columns 'uid,uname' -m 1
3. Import MySQL data into Hive and specify the Hive table name
[root@srv01 ~]# sqoop import --hive-import --connect jdbc:mysql://localhost:3306/test --username root --password root --table user --columns 'uid,uname' -m 1 --hive-table user1   # if the table does not exist in Hive, it is created to hold the imported data
4. Import MySQL data into Hive with a WHERE condition
[root@srv01 ~]# sqoop import --hive-import --connect jdbc:mysql://localhost:3306/test --username root --password root --table user --columns 'uid,uname' -m 1 --hive-table user2 --where 'uid=10'
5. Import MySQL data into Hive using a free-form query
[root@srv01 ~]# sqoop import --hive-import --connect jdbc:mysql://localhost:3306/test --username root --password root -m 1 --hive-table user6 --query 'select * from user where uid<10 and $CONDITIONS' --target-dir /sqoop/user5   # 'and $CONDITIONS' is mandatory in the query; Sqoop errors out without it
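A common pitfall with --query is shell quoting: $CONDITIONS is a placeholder that Sqoop itself replaces with each split's range predicate, so it must reach Sqoop unexpanded. Single quotes preserve it; inside double quotes it has to be escaped. A quick local check of the two quoting styles:

```shell
# $CONDITIONS must survive shell expansion so Sqoop can substitute it.
q1='select * from user where uid < 10 and $CONDITIONS'     # single quotes: kept literal
q2="select * from user where uid < 10 and \$CONDITIONS"    # double quotes: must escape
echo "$q1"
echo "$q2"   # both print the literal $CONDITIONS
```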
6. Export Hive data to MySQL with Sqoop
[root@srv01 ~]# sqoop export --connect jdbc:mysql://localhost:3306/test --username root --password root -m 1 --table user5 --export-dir /sqoop/user5   # the two tables must have the same number of columns with compatible types
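sqoop export only moves rows; it does not create the target table. Before running the export, the table must already exist in MySQL with a matching column count and compatible types. A sketch of such a table, assuming the uid/uname columns used in the imports above (the exact types must match your Hive data):

```sql
-- Hypothetical target table for the export; column count and types
-- must match the exported Hive/HDFS data (uid, uname assumed here).
CREATE TABLE user5 (
  uid   INT,
  uname VARCHAR(100)
);
```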
sqoop2
sqoop-shell
Starting sqoop2-shell
jjzhu:bin didi$ sqoop2-shell
Setting conf dir: /opt/sqoop-1.99.7/bin/../conf
Sqoop home directory: /opt/sqoop-1.99.7
Sqoop Shell: Type 'help' or '\h' for help.

sqoop:000>
Configuring the sqoop server connection
sqoop:000> set server --host localhost --port 12000 --webapp sqoop
Server is set successfully
Checking that the server connection works
sqoop:000> show version --all
client version:
  Sqoop 1.99.7 source revision 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb
  Compiled by abefine on Tue Jul 19 16:08:27 PDT 2016
0 [main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
server version:
  Sqoop 1.99.7 source revision 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb
  Compiled by abefine on Tue Jul 19 16:08:27 PDT 2016
API versions:
  [v1]
sqoop:000>
Creating links
List the connectors available on the sqoop server
sqoop:000> show connector
+------------------------+---------+------------------------------------------------------------+----------------------+
| Name                   | Version | Class                                                      | Supported Directions |
+------------------------+---------+------------------------------------------------------------+----------------------+
| generic-jdbc-connector | 1.99.7  | org.apache.sqoop.connector.jdbc.GenericJdbcConnector       | FROM/TO              |
| kite-connector         | 1.99.7  | org.apache.sqoop.connector.kite.KiteConnector              | FROM/TO              |
| oracle-jdbc-connector  | 1.99.7  | org.apache.sqoop.connector.jdbc.oracle.OracleJdbcConnector | FROM/TO              |
| ftp-connector          | 1.99.7  | org.apache.sqoop.connector.ftp.FtpConnector                | TO                   |
| hdfs-connector         | 1.99.7  | org.apache.sqoop.connector.hdfs.HdfsConnector              | FROM/TO              |
| kafka-connector        | 1.99.7  | org.apache.sqoop.connector.kafka.KafkaConnector            | TO                   |
| sftp-connector         | 1.99.7  | org.apache.sqoop.connector.sftp.SftpConnector              | TO                   |
+------------------------+---------+------------------------------------------------------------+----------------------+
sqoop:000>
- generic-jdbc-connector: a connector built on Java JDBC; can serve as either the source or the target of a transfer
- hdfs-connector: a connector that uses HDFS as the source or the target
Create a generic-jdbc-connector link with the following command:
sqoop:002> create link -c generic-jdbc-connector
Creating link for connector with name generic-jdbc-connector
Please fill following values to create new link object
Name: mysql_weibouser_link

Database connection

Driver class: com.mysql.jdbc.Driver
Connection String: jdbc:mysql://127.0.0.1:3306/spider
Username: root
Password: ****
Fetch Size:
Connection Properties:
There are currently 0 values in the map:
entry# protocol=tcp
There are currently 1 values in the map:
protocol = tcp
entry#

SQL Dialect

Identifier enclose:   (note: do NOT just press Enter here - type a single space! Otherwise Sqoop wraps MySQL table names in "" when querying, and the queries fail)
New link was successfully created with validation status OK and name mysql_weibouser_link
Create the HDFS link
sqoop:002> create link -c hdfs-connector
Creating link for connector with name hdfs-connector
Please fill following values to create new link object
Name: hdfs_weibouser_link

HDFS cluster

URI: hdfs://localhost:9000
Conf directory: /opt/hadoop-2.7.3/etc/hadoop
Additional configs::
There are currently 0 values in the map:
entry#
New link was successfully created with validation status OK and name hdfs_weibouser_link
List the links
sqoop:002> show link
+----------------------+------------------------+---------+
| Name                 | Connector Name         | Enabled |
+----------------------+------------------------+---------+
| mysql_weibouser      | generic-jdbc-connector | true    |
| mysql_weibouser_link | generic-jdbc-connector | true    |
| hdfs_link            | hdfs-connector         | true    |
| hdfs_link2           | hdfs-connector         | true    |
| hdfs_weibouser_link  | hdfs-connector         | true    |
+----------------------+------------------------+---------+
Creating the job
sqoop:002> create job -f "mysql_weibouser_link" -t "hdfs_weibouser_link"
Creating job for links with from name mysql_weibouser_link and to name hdfs_weibouser_link
Please fill following values to create new job object
Name: job_weibouser

Database source

Schema name: spider
Table name: spiders_weibouser
SQL statement:
Column names:
There are currently 0 values in the list:
element#
Partition column:
Partition column nullable:
Boundary query:

Incremental read

Check column:
Last value:

Target configuration

Override null value:
Null value:
File format:
  0 : TEXT_FILE
  1 : SEQUENCE_FILE
  2 : PARQUET_FILE
Choose: 0
Compression codec:
  0 : NONE
  1 : DEFAULT
  2 : DEFLATE
  3 : GZIP
  4 : BZIP2
  5 : LZO
  6 : LZ4
  7 : SNAPPY
  8 : CUSTOM
Choose: 0
Custom codec:
Output directory: hdfs://localhost:9000/usr/jjzhu/spider/spiders_weibouser
Append mode:

Throttling resources

Extractors: 2
Loaders: 2

Classpath configuration

Extra mapper jars:
There are currently 0 values in the list:
element#
New job was successfully created with validation status OK and name job_weibouser
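Creating the job does not run it. In the Sqoop 1.99.7 shell a job is started and monitored by name; the session below is a sketch of those two steps (check `help job` for the exact options in your version):

```
sqoop:002> start job -n job_weibouser
sqoop:002> status job -n job_weibouser
```

The status output reports the submission state and, once MapReduce is running, per-task progress; the imported files land under the Output directory configured above.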