Sqoop is a tool designed for efficiently transferring data between RDBMS and HDFS, we can import data from mysql, oracle, and other data bases into HDFS very easily; meanwhile we can dump data into data base from HDFS. For detailed documentation, please refer to sqoop documentation.

Before using Sqoop, please follow steps to setup it correctly.

Sqoop - Import

the following command is used for import

sqoop import (generic-args) (import-args)

 

given a table named stock_info, and the schema is:

how to use Sqoop to import/ export data

Case 1: we can use below command to import stock_info data to hadoop hdfs file system:

sqoop import --connect jdbc:mysql://host:port/dbname --username loginuser --password loginuser --table stock_info --m 1

and the result looks like:

how to use Sqoop to import/ export data

we can verify result in hdfs by running command

hadoop fs -cat /emp/part-m-*

 

Case 2: sepcify the target directory in hdfs by running the following import command

sqoop import --connect jdbc:mysql://host:port/dbname --username loginuser --password loginuser --table stock_info --m 1 --target-dir /temp

then we can verify result by executing the same command as above

 

Case 3: imcremental import by specifying --incremental, --check-column and --append arguments. Note we should change 'last_chg_date' when applying other tables.

sqoop import --connect jdbc:mysql://host:port/dbname --username loginuser --password loginuser --table stock_info --m 1 --target-dir /temp --incremental lastmodified --check-column last_chg_date --append

 

Case 4: specify target file format as parquet format by adding argument '--as-parquetfile'

sqoop import --connect jdbc:mysql://host:port/dbname --username loginuser --password loginuser --table stock_info --m 1 --target-dir /temp --incremental lastmodified --check-column last_chg_date --append --as-parquetfile

 

Case 5: import all tables

sqoop import-all-tables --connect jdbc:mysql://host:port/dbname --username loginuser --password loginuser 

 

Sqoop - Export

export means to dump data from hdfs to mysql, oracle or other data bases, command syntax is like

sqoop export (generic-args) (export-args)

given there are many parquet files under stock_info folder which is imported by sqoop import command incrementally

how to use Sqoop to import/ export data

then we want to dump data back into mysql data base, using the following command

sqoop export --connent jdbc:mysql://host:port/dbname --username loginuser --password loginuser --table stock_info --export-dir /user/hlli/stock_info

finally verify data in mysql command line

select * from stock_info;

 

Incremental importing data

by using linux timer 'crontab' to schedule a job to execute importing periodically.

cd /var/spool/cron

touch hlli (please change hlli to your user name here)

vi hlli

*/5 * * * * /usr/lib/sqoop/bin/sqoop import --connect jdbc:mysql://host:port/dbname --username loginuser --password loginuser --table stock_info --m 1 --target-dir /temp --incremental lastmodified --check-column last_chg_date --append --as-parquetfile

if it works, you will receive email in '/var/spool/mail/hlli'; meanwhile we can verify data by running command

hadoop fs -ls /

 

Commonly used Sqoop commands

sqoop help import

sqoop help export

sqoop help job

sqoop help codegen

sqoop help eval

sqoop help list-tables

sqoop help list-databases

sqoop help import-all-tables

 

References:

  1. http://sqoop.apache.org/
  2. http://man.linuxde.net/crontab

 

转载于:https://www.cnblogs.com/allanli/p/how_to_use_sqoop.html

相关文章:

  • 2021-11-06
  • 2021-09-16
  • 2021-07-31
  • 2021-10-31
  • 2021-04-05
  • 2021-04-30
  • 2021-04-15
  • 2021-08-06
猜你喜欢
  • 2021-05-28
  • 2022-01-23
  • 2021-07-10
  • 2021-07-22
  • 2021-07-20
  • 2021-06-27
  • 2021-09-12
相关资源
相似解决方案