【Question Title】: sqoop incremental import append mode with timestamp column
【Posted】: 2018-05-04 13:48:29
【Question】:

Can someone help me understand the exact difference between the append and lastmodified modes in Sqoop incremental imports?

What is the need for lastmodified when the same thing can be done with append mode (using a timestamp as the --check-column)? It appears to work the same way, and it also imports both updated and inserted records.

【Comments】:

    Tags: increment sqoop


    【Solution 1】:
    1. Mode append: use it on a column whose last imported value you know, typically a monotonically increasing id.
    2. Mode lastmodified: use it on a timestamp column. It can be hard to remember the last-modified timestamp; if you do know it, the first approach works as well. The practical difference is that append only adds new rows, while lastmodified (combined with --merge-key) can also fold updates to existing rows back into the already-imported data.
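The selection logic behind the two modes can be sketched outside Sqoop: append roughly issues `WHERE check_col > last_value`, while lastmodified issues `WHERE check_col >= last_value`, so a timestamp lower bound also catches rows that were merely updated. A rough shell simulation over a made-up three-row table (this is an illustration of the filtering idea, not Sqoop itself):

```shell
#!/bin/sh
# Tiny stand-in for the source table: id,name,updated_at (data invented for illustration)
cat > /tmp/emp.csv <<'EOF'
1209,Ashish,2017-11-30
1210,Satish,2017-12-01
1211,Jag,2017-12-02
EOF

# append mode: only rows with id strictly greater than --last-value (1210 in the answer below)
awk -F, '$1 > 1210' /tmp/emp.csv > /tmp/append.out

# lastmodified mode: rows whose timestamp is >= the previous run's boundary,
# which picks up updates to old rows as well as brand-new ones
awk -F, '$3 >= "2017-12-01"' /tmp/emp.csv > /tmp/lastmod.out

cat /tmp/append.out    # only 1211 (the new row)
cat /tmp/lastmod.out   # 1210 and 1211 (updated + new)
```

With an id check-column, an update that does not change the id is invisible to append mode; that is the gap lastmodified closes.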

    Mode: append

    mysql> describe emp;
    +--------+--------------+------+-----+---------+-------+
    | Field  | Type         | Null | Key | Default | Extra |
    +--------+--------------+------+-----+---------+-------+
    | id     | int(11)      | YES  |     | NULL    |       |
    | name   | varchar(100) | YES  |     | NULL    |       |
    | deg    | varchar(100) | YES  |     | NULL    |       |
    | salary | int(11)      | YES  |     | NULL    |       |
    | dep    | varchar(10)  | YES  |     | NULL    |       |
    +--------+--------------+------+-----+---------+-------+
    5 rows in set (0.00 sec)
    

    Import the data with the following Sqoop command.

    sqoop import \
    --connect jdbc:mysql://quickstart:3306/retail_db \
    --username retail_dba \
    --password cloudera \
    -m 3 \
    --table emp \
    --split-by id \
    --columns id,name,deg \
    --warehouse-dir /user/sqoop/ \
    --delete-target-dir \
    --as-textfile
    

    HDFS output

    [cloudera@quickstart lib]$ hadoop fs -ls -R /user/sqoop/
    drwxr-xr-x   - cloudera supergroup          0 2017-12-02 13:14 /user/sqoop/emp
    -rw-r--r--   1 cloudera supergroup          0 2017-12-02 13:14 /user/sqoop/emp/_SUCCESS
    -rw-r--r--   1 cloudera supergroup         70 2017-12-02 13:14 /user/sqoop/emp/part-m-00000
    -rw-r--r--   1 cloudera supergroup         64 2017-12-02 13:14 /user/sqoop/emp/part-m-00001
    -rw-r--r--   1 cloudera supergroup         86 2017-12-02 13:14 /user/sqoop/emp/part-m-00002
    
    
    mysql> select * from emp;
    +------+---------+--------------+--------+------+
    | id   | name    | deg          | salary | dep  |
    +------+---------+--------------+--------+------+
    | 1201 | gopal   | manager      |  50000 | tp   |
    | 1202 | manisha | Proof reader |  50000 | TP   |
    | 1203 | php dev | TECH WRITER  |  50000 | AC   |
    | 1204 | Nilesh  | Domino dev   |  70000 | AF   |
    | 1205 | Vinayak | Java dev     |  50000 | IT   |
    | 1206 | Amish   | Cog dev      |  60000 | IT   |
    | 1207 | Jatin   | Oracel dev   |  40001 | IT   |
    | 1208 | Viren   | Java dev     |  70004 | IT   |
    | 1209 | Ashish  | Oracel dev   |  40001 | IT   |
    | 1210 | Satish  | Java dev     |  70004 | IT   |
    +------+---------+--------------+--------+------+
    10 rows in set (0.00 sec)
    

    Insert a new record into the table.

    mysql> insert into emp values(1211,'Jag', 'be', 20000, 'IT');
    Query OK, 1 row affected (0.03 sec)
    
    mysql> select * from emp;
    +------+---------+--------------+--------+------+
    | id   | name    | deg          | salary | dep  |
    +------+---------+--------------+--------+------+
    | 1201 | gopal   | manager      |  50000 | tp   |
    | 1202 | manisha | Proof reader |  50000 | TP   |
    | 1203 | php dev | TECH WRITER  |  50000 | AC   |
    | 1204 | Nilesh  | Domino dev   |  70000 | AF   |
    | 1205 | Vinayak | Java dev     |  50000 | IT   |
    | 1206 | Amish   | Cog dev      |  60000 | IT   |
    | 1207 | Jatin   | Oracel dev   |  40001 | IT   |
    | 1208 | Viren   | Java dev     |  70004 | IT   |
    | 1209 | Ashish  | Oracel dev   |  40001 | IT   |
    | 1210 | Satish  | Java dev     |  70004 | IT   |
    | 1211 | Jag     | be           |  20000 | IT   |
    +------+---------+--------------+--------+------+
    11 rows in set (0.00 sec)
    

    Incremental import command

    sqoop import \
    --connect jdbc:mysql://quickstart:3306/retail_db \
    --username retail_dba \
    --password cloudera \
    --table emp \
    --split-by id \
    --check-column id \
    --incremental append \
    --last-value 1210 \
    --warehouse-dir /user/sqoop/ \
    --as-textfile
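Each append run needs the new boundary passed as --last-value (a saved `sqoop job` can store it for you between runs). One way to recover it is to take the maximum of the check column from the files already exported. A shell sketch over local copies of the part files (in real use you would pipe `hadoop fs -cat /user/sqoop/emp/part-m-*` instead; paths and sample rows here are illustrative):

```shell
#!/bin/sh
# Simulated part files with the same layout as the HDFS output above
mkdir -p /tmp/emp_parts
printf '1209,Ashish,Oracel dev,40001,IT\n1210,Satish,Java dev,70004,IT\n' > /tmp/emp_parts/part-m-00002
printf '1211,Jag,be,20000,IT\n' > /tmp/emp_parts/part-m-00003

# Max of the id column = the --last-value to pass on the next incremental run
last_value=$(cat /tmp/emp_parts/part-m-* | awk -F, '($1+0) > (max+0) { max = $1 } END { print max }')
echo "next --last-value: $last_value"   # 1211
```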
    

    After the import

    [cloudera@quickstart lib]$ hadoop fs -ls -R /user/sqoop/emp
    -rw-r--r--   1 cloudera supergroup          0 2017-12-02 13:14 /user/sqoop/emp/_SUCCESS
    -rw-r--r--   1 cloudera supergroup         70 2017-12-02 13:14 /user/sqoop/emp/part-m-00000
    -rw-r--r--   1 cloudera supergroup         64 2017-12-02 13:14 /user/sqoop/emp/part-m-00001
    -rw-r--r--   1 cloudera supergroup         86 2017-12-02 13:14 /user/sqoop/emp/part-m-00002
    -rw-r--r--   1 cloudera cloudera           21 2017-12-02 13:48 /user/sqoop/emp/part-m-00003
    [cloudera@quickstart lib]$ hadoop fs -cat /user/sqoop/emp/part-m-00003
    1211,Jag,be,20000,IT
    

    Mode: lastmodified

    mysql> describe orders;
    +-------------------+-------------+------+-----+---------+----------------+
    | Field             | Type        | Null | Key | Default | Extra          |
    +-------------------+-------------+------+-----+---------+----------------+
    | order_id          | int(11)     | NO   | PRI | NULL    | auto_increment |
    | order_date        | datetime    | NO   |     | NULL    |                |
    | order_customer_id | int(11)     | NO   |     | NULL    |                |
    | order_status      | varchar(45) | NO   |     | NULL    |                |
    +-------------------+-------------+------+-----+---------+----------------+
    4 rows in set (0.00 sec)
    

    Import the orders table into HDFS

    sqoop import \
    --connect jdbc:mysql://quickstart:3306/retail_db \
    --username retail_dba \
    --password cloudera \
    --table orders \
    --split-by order_id  \
    --target-dir /user/sqoop/orders \
    --as-textfile
    

    After the import

    [cloudera@quickstart lib]$ hadoop fs -ls -R /user/sqoop/orders
    -rw-r--r--   1 cloudera supergroup          0 2017-12-02 16:01 /user/sqoop/orders/_SUCCESS
    -rw-r--r--   1 cloudera supergroup     741597 2017-12-02 16:01 /user/sqoop/orders/part-m-00000
    -rw-r--r--   1 cloudera supergroup     753022 2017-12-02 16:01 /user/sqoop/orders/part-m-00001
    -rw-r--r--   1 cloudera supergroup     752368 2017-12-02 16:01 /user/sqoop/orders/part-m-00002
    -rw-r--r--   1 cloudera supergroup     752940 2017-12-02 16:01 /user/sqoop/orders/part-m-00003
    

    Update an order row

    mysql> select * from orders where order_id=10;
    +----------+---------------------+-------------------+-----------------+
    | order_id | order_date          | order_customer_id | order_status    |
    +----------+---------------------+-------------------+-----------------+
    |       10 | 2013-07-25 00:00:00 |              5648 | PENDING_PAYMENT |
    +----------+---------------------+-------------------+-----------------+
    1 row in set (0.00 sec)
    
    mysql> update orders set order_status='CLOSED', order_date=now() where order_id=10;
    Query OK, 1 row affected (0.01 sec)
    Rows matched: 1  Changed: 1  Warnings: 0
    
    mysql> select * from orders where order_id=10;
    +----------+---------------------+-------------------+--------------+
    | order_id | order_date          | order_customer_id | order_status |
    +----------+---------------------+-------------------+--------------+
    |       10 | 2017-12-02 16:19:23 |              5648 | CLOSED       |
    +----------+---------------------+-------------------+--------------+
    1 row in set (0.00 sec)
    

    Import the changed data (in practice you would also pass --last-value with the timestamp of the previous run, so that only rows modified since then are fetched)

    sqoop import \
    --connect jdbc:mysql://quickstart:3306/retail_db \
    --username retail_dba \
    --password cloudera \
    --table orders \
    --split-by order_id  \
    --check-column order_date \
    --merge-key order_id \
    --incremental lastmodified \
    --target-dir /user/sqoop/orders1 \
    --as-textfile
    

    Output

    [cloudera@quickstart lib]$ hadoop fs -ls -R /user/sqoop/orders1
    -rw-r--r--   1 cloudera cloudera          0 2017-12-02 16:07 /user/sqoop/orders1/_SUCCESS
    -rw-r--r--   1 cloudera cloudera    2999918 2017-12-02 16:07 /user/sqoop/orders1/part-r-00000
    

    Note: if we use the same directory as before (orders), the old files are deleted and new part files are created by the merge job.
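The --merge-key step above can be pictured as "keep the newest record per key": the old export and the fresh delta are combined, and where the same order_id appears twice, the row with the later order_date wins. A shell sketch of that reconciliation (the rows reuse the order 10 example above; the rest is invented):

```shell
#!/bin/sh
# Old export plus the freshly imported delta for order_id 10
cat > /tmp/orders_all.csv <<'EOF'
9,2013-07-25 00:00:00,1234,COMPLETE
10,2013-07-25 00:00:00,5648,PENDING_PAYMENT
10,2017-12-02 16:19:23,5648,CLOSED
EOF

# Keep only the latest row per order_id -- the effect of --merge-key order_id
sort -t, -k1,1n -k2,2 /tmp/orders_all.csv |
awk -F, '{ latest[$1] = $0 } END { for (k in latest) print latest[k] }' |
sort -t, -k1,1n > /tmp/orders_merged.csv

cat /tmp/orders_merged.csv   # order 9 unchanged, order 10 now CLOSED
```

This is also why the merged output appears as reducer files (part-r-00000): the merge is a separate MapReduce job run after the incremental fetch.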

    【Discussion】:
