【发布时间】:2016-07-19 18:41:01
【问题描述】:
有以下场景:
每天需要将 SQL 表传输到 MySQL 数据库。 我尝试使用 CopyActivity 使用 Data Pipeline,但导出的 CSV 有空格而不是 \N 或 NULL,因此 MySQL 将这些字段导入为“”,这对我们的应用程序不利。
然后我尝试了一种稍微不同的方法。 通过CopyActivity将表导出到S3,ShellCommandActivity下载文件,执行下面的脚本,上传文件到s3:
#!/bin/bash
sed -i -e 's/^,/\\N,/' -e 's/,$/,\\N/' -e 's/,,/,\\N,/g' -e 's/,,/,\\N,/g' ${INPUT1_STAGING_DIR}/*.csv |cat ${INPUT1_STAGING_DIR}/*.csv > ${OUTPUT1_STAGING_DIR}/sqltable.csv
上面的脚本在我的测试 linux 实例上完美运行,但在临时 EC2 资源上执行时没有任何反应。我没有任何错误,只是输出 s3 数据节点上带有空格的相同无用 csv。
我不知道我做错了什么以及为什么脚本与我的测试 linux 实例上的工作方式不同。
管道日志:
18 Jul 2016 10:23:06,470 [INFO] (TaskRunnerService-resource:df-09799242T7UHHPMT072T_@ResourceId_x5OCd_2016-07-18T10:18:38-1) df-09799242T7UHHPMT072T amazonaws.datapipeline.taskrunner.TaskPoller: Executing: amazonaws.datapipeline.activity.ShellCommandActivity@515aa023
18 Jul 2016 10:23:06,648 [INFO] (TaskRunnerService-resource:df-09799242T7UHHPMT072T_@ResourceId_x5OCd_2016-07-18T10:18:38-1) df-09799242T7UHHPMT072T amazonaws.datapipeline.connector.staging.S3Helper: Begin Downloading files from S3 Path:s3://s3-bucket/mysqlexport/sqltable.csv to output/staging/df-09799242T7UHHPMT072T_input1_7c583c0755eb46f5b518feffa314fccd
18 Jul 2016 10:23:06,648 [INFO] (TaskRunnerService-resource:df-09799242T7UHHPMT072T_@ResourceId_x5OCd_2016-07-18T10:18:38-1) df-09799242T7UHHPMT072T amazonaws.datapipeline.connector.staging.S3Helper: Local File Relative compared to Input Root Path:s3://s3-bucket/mysqlexport/sqltable.csv is
18 Jul 2016 10:23:06,648 [INFO] (TaskRunnerService-resource:df-09799242T7UHHPMT072T_@ResourceId_x5OCd_2016-07-18T10:18:38-1) df-09799242T7UHHPMT072T amazonaws.datapipeline.connector.staging.S3Helper: Download just the root file to the local dir. Updated File Relative compared to Input Root Path:s3://s3-bucket/mysqlexport/sqltable.csv is sqltable.csv
18 Jul 2016 10:23:06,649 [INFO] (TaskRunnerService-resource:df-09799242T7UHHPMT072T_@ResourceId_x5OCd_2016-07-18T10:18:38-1) df-09799242T7UHHPMT072T amazonaws.datapipeline.connector.staging.S3Helper: Begin Downloading S3 file s3://s3-bucket/mysqlexport/sqltable.csv to /media/ephemeral0/mnt/taskRunner/output/staging/df-09799242T7UHHPMT072T_input1_7c583c0755eb46f5b518feffa314fccd/sqltable.csv
18 Jul 2016 10:23:06,824 [INFO] (TaskRunnerService-resource:df-09799242T7UHHPMT072T_@ResourceId_x5OCd_2016-07-18T10:18:38-1) df-09799242T7UHHPMT072T amazonaws.datapipeline.connector.staging.S3Helper: Completed Downloading files from S3 Path:s3://s3-bucket/mysqlexport/sqltable.csv to output/staging/df-09799242T7UHHPMT072T_input1_7c583c0755eb46f5b518feffa314fccd
18 Jul 2016 10:23:06,862 [INFO] (TaskRunnerService-resource:df-09799242T7UHHPMT072T_@ResourceId_x5OCd_2016-07-18T10:18:38-1) df-09799242T7UHHPMT072T amazonaws.datapipeline.objects.CommandRunner: Executing command: #!/bin/bash
sed -i -e 's/^,/\\N,/' -e 's/,$/,\\N/' -e 's/,,/,\\N,/g' -e 's/,,/,\\N,/g' ${INPUT1_STAGING_DIR}/sqltable.csv |cat ${INPUT1_STAGING_DIR}/sqltable.csv > ${OUTPUT1_STAGING_DIR}/sqltable.csv
18 Jul 2016 10:23:06,865 [INFO] (TaskRunnerService-resource:df-09799242T7UHHPMT072T_@ResourceId_x5OCd_2016-07-18T10:18:38-1) df-09799242T7UHHPMT072T amazonaws.datapipeline.objects.CommandRunner: configure ApplicationRunner with stdErr file: output/logs/df-09799242T7UHHPMT072T/ShellCommandActivityId_18OqM/@ShellCommandActivityId_18OqM_2016-07-18T10:18:38/@ShellCommandActivityId_18OqM_2016-07-18T10:18:38_Attempt=1/StdError and stdout file :output/logs/df-09799242T7UHHPMT072T/ShellCommandActivityId_18OqM/@ShellCommandActivityId_18OqM_2016-07-18T10:18:38/@ShellCommandActivityId_18OqM_2016-07-18T10:18:38_Attempt=1/StdOutput
18 Jul 2016 10:23:06,866 [INFO] (TaskRunnerService-resource:df-09799242T7UHHPMT072T_@ResourceId_x5OCd_2016-07-18T10:18:38-1) df-09799242T7UHHPMT072T amazonaws.datapipeline.objects.CommandRunner: Executing command: output/tmp/df-09799242T7UHHPMT072T-de05e7a112c440b4a42df69d554d8a9a/ShellCommandActivityId18OqM20160718T101838Attempt1_command.sh with env variables :{INPUT1_STAGING_DIR=/media/ephemeral0/mnt/taskRunner/output/staging/df-09799242T7UHHPMT072T_input1_7c583c0755eb46f5b518feffa314fccd, OUTPUT1_STAGING_DIR=/media/ephemeral0/mnt/taskRunner/output/staging/df-09799242T7UHHPMT072T_output1_7c8b2db30c16473f844db5eb21cb000e} with argument : null
18 Jul 2016 10:23:06,952 [INFO] (TaskRunnerService-resource:df-09799242T7UHHPMT072T_@ResourceId_x5OCd_2016-07-18T10:18:38-1) df-09799242T7UHHPMT072T amazonaws.datapipeline.connector.staging.S3Helper: Begin Uploading local directory:output/staging/df-09799242T7UHHPMT072T_output1_7c8b2db30c16473f844db5eb21cb000e to S3 s3://s3-bucket/mysqlexport/
18 Jul 2016 10:23:06,977 [INFO] (TaskRunnerService-resource:df-09799242T7UHHPMT072T_@ResourceId_x5OCd_2016-07-18T10:18:38-1) df-09799242T7UHHPMT072T amazonaws.datapipeline.connector.staging.S3Helper: Begin Upload single file to S3:s3://s3-bucket/mysqlexport/sqltable.csv
18 Jul 2016 10:23:06,978 [INFO] (TaskRunnerService-resource:df-09799242T7UHHPMT072T_@ResourceId_x5OCd_2016-07-18T10:18:38-1) df-09799242T7UHHPMT072T amazonaws.datapipeline.connector.staging.S3Helper: Begin upload of file /media/ephemeral0/mnt/taskRunner/output/staging/df-09799242T7UHHPMT072T_output1_7c8b2db30c16473f844db5eb21cb000e/sqltable.csv to S3 paths3://s3-bucket/mysqlexport/sqltable.csv
18 Jul 2016 10:23:07,040 [INFO] (TaskRunnerService-resource:df-09799242T7UHHPMT072T_@ResourceId_x5OCd_2016-07-18T10:18:38-1) df-09799242T7UHHPMT072T amazonaws.datapipeline.connector.staging.S3Helper: Completed upload of file /media/ephemeral0/mnt/taskRunner/output/staging/df-09799242T7UHHPMT072T_output1_7c8b2db30c16473f844db5eb21cb000e/sqltable.csv to S3 paths3://s3-bucket/mysqlexport/sqltable.csv
18 Jul 2016 10:23:07,040 [INFO] (TaskRunnerService-resource:df-09799242T7UHHPMT072T_@ResourceId_x5OCd_2016-07-18T10:18:38-1) df-09799242T7UHHPMT072T amazonaws.datapipeline.connector.staging.S3Helper: Completed uploading of all files
18 Jul 2016 10:23:07,040 [INFO] (TaskRunnerService-resource:df-09799242T7UHHPMT072T_@ResourceId_x5OCd_2016-07-18T10:18:38-1) df-09799242T7UHHPMT072T amazonaws.datapipeline.connector.staging.S3Helper: Completed upload of local dir output/staging/df-09799242T7UHHPMT072T_output1_7c8b2db30c16473f844db5eb21cb000e to s3://s3-bucket/mysqlexport/
18 Jul 2016 10:23:07,040 [INFO] (TaskRunnerService-resource:df-09799242T7UHHPMT072T_@ResourceId_x5OCd_2016-07-18T10:18:38-1) df-09799242T7UHHPMT072T amazonaws.datapipeline.connector.staging.StageFromS3Connector: cleaning up directory /media/ephemeral0/mnt/taskRunner/output/staging/df-09799242T7UHHPMT072T_input1_7c583c0755eb46f5b518feffa314fccd
18 Jul 2016 10:23:07,050 [INFO] (TaskRunnerService-resource:df-09799242T7UHHPMT072T_@ResourceId_x5OCd_2016-07-18T10:18:38-1) df-09799242T7UHHPMT072T amazonaws.datapipeline.connector.staging.StageInS3Connector: cleaning up directory /media/ephemeral0/mnt/taskRunner/output/staging/df-09799242T7UHHPMT072T_output1_7c8b2db30c16473f844db5eb21cb000e
18 Jul 2016 10:23:07,051 [INFO] (TaskRunnerService-resource:df-09799242T7UHHPMT072T_@ResourceId_x5OCd_2016-07-18T10:18:38-1) df-09799242T7UHHPMT072T amazonaws.datapipeline.taskrunner.HeartBeatService: Finished waiting for heartbeat thread @DefaultShellCommandActivity1_2016-07-18T10:18:38_Attempt=1
18 Jul 2016 10:23:07,052 [INFO] (TaskRunnerService-resource:df-09799242T7UHHPMT072T_@ResourceId_x5OCd_2016-07-18T10:18:38-1) df-09799242T7UHHPMT072T amazonaws.datapipeline.taskrunner.TaskPoller: Work ShellCommandActivity took 0:0 to complete
【问题讨论】:
-
尝试以简单的步骤来做,从简单的开始,然后逐步建立,看看问题出在哪里。
-
似乎 sed 在暂存输入文件夹中没有看到下载的 csv 文件。我创建了一个简单的 shell 命令活动管道来下载导出的 csv 并针对它执行脚本。它退出时出现错误 sed: can't read /media/ephemeral0/mnt/taskRunner/output/staging/df-04031231KGO5SUFX4TZL_input1_8bd192f49d6d472f8cb47291eda9c970/sqltable.csv : No such file or directory 。然后我写了一个脚本来输出 ls -l ${INPUT1_STAGING_DIR} > ${OUTPUT1_STAGING_DIR}/test.txt 它显示总计 168 -rw-rw-r-- 1 ec2-user ec2-user 168448 Jul 18 13:42 sqltable .csv 我一无所知..
标签: mysql sql amazon-web-services amazon-s3 amazon-data-pipeline