【Title】: AWS Data Pipeline MySQL Nulls sed shell command activity MIGRAINE
【Posted】: 2016-07-19 18:41:01
【Question】:

I have the following scenario:

A SQL table needs to be transferred to a MySQL database every day. I tried using Data Pipeline with a CopyActivity, but the exported CSV contains blanks instead of \N or NULL, so MySQL imports those fields as "", which is bad for our application.

Then I tried a slightly different approach: export the table to S3 via a CopyActivity, have a ShellCommandActivity download the file, run the script below, and upload the result back to S3:

#!/bin/bash
sed -i -e 's/^,/\\N,/' -e 's/,$/,\\N/' -e 's/,,/,\\N,/g' -e 's/,,/,\\N,/g' ${INPUT1_STAGING_DIR}/*.csv |cat ${INPUT1_STAGING_DIR}/*.csv > ${OUTPUT1_STAGING_DIR}/sqltable.csv

The script above works perfectly on my test Linux instance, but nothing happens when it runs on the ephemeral EC2 resource. I get no errors, just the same useless CSV with blanks on the output S3 data node.

I have no idea what I am doing wrong, or why the script behaves differently than it does on my test Linux instance.
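One plausible explanation for the difference: the two sides of a pipeline start concurrently, and `sed -i` writes nothing to stdout, so piping it into `cat` does not sequence the two commands at all; `cat` may open the file before `sed` has rewritten it. Below is a minimal sketch (on made-up sample data, not the real export) of the intended transformation run sequentially instead:

```shell
# Sample row with leading, trailing, and embedded empty fields (assumption:
# the real export is comma-separated with no quoting).
printf ',a,,b,\n' > /tmp/in.csv

# Same four expressions as the script. The ',,' substitution is repeated
# because a global pass does not revisit overlapping matches (e.g. 'a,,,b'
# needs two passes to become 'a,\N,\N,b').
sed -e 's/^,/\\N,/' -e 's/,$/,\\N/' -e 's/,,/,\\N,/g' -e 's/,,/,\\N,/g' \
    /tmp/in.csv > /tmp/out.csv

cat /tmp/out.csv   # -> \N,a,\N,b,\N
```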

Pipeline logs:

18 Jul 2016 10:23:06,470 [INFO] (TaskRunnerService-resource:df-09799242T7UHHPMT072T_@ResourceId_x5OCd_2016-07-18T10:18:38-1) df-09799242T7UHHPMT072T amazonaws.datapipeline.taskrunner.TaskPoller: Executing: amazonaws.datapipeline.activity.ShellCommandActivity@515aa023
18 Jul 2016 10:23:06,648 [INFO] (TaskRunnerService-resource:df-09799242T7UHHPMT072T_@ResourceId_x5OCd_2016-07-18T10:18:38-1) df-09799242T7UHHPMT072T amazonaws.datapipeline.connector.staging.S3Helper: Begin Downloading files from S3 Path:s3://s3-bucket/mysqlexport/sqltable.csv to output/staging/df-09799242T7UHHPMT072T_input1_7c583c0755eb46f5b518feffa314fccd
18 Jul 2016 10:23:06,648 [INFO] (TaskRunnerService-resource:df-09799242T7UHHPMT072T_@ResourceId_x5OCd_2016-07-18T10:18:38-1) df-09799242T7UHHPMT072T amazonaws.datapipeline.connector.staging.S3Helper: Local File Relative compared to Input Root Path:s3://s3-bucket/mysqlexport/sqltable.csv is 
18 Jul 2016 10:23:06,648 [INFO] (TaskRunnerService-resource:df-09799242T7UHHPMT072T_@ResourceId_x5OCd_2016-07-18T10:18:38-1) df-09799242T7UHHPMT072T amazonaws.datapipeline.connector.staging.S3Helper: Download just the root file to the local dir. Updated File Relative compared to Input Root Path:s3://s3-bucket/mysqlexport/sqltable.csv is sqltable.csv
18 Jul 2016 10:23:06,649 [INFO] (TaskRunnerService-resource:df-09799242T7UHHPMT072T_@ResourceId_x5OCd_2016-07-18T10:18:38-1) df-09799242T7UHHPMT072T amazonaws.datapipeline.connector.staging.S3Helper: Begin Downloading S3 file s3://s3-bucket/mysqlexport/sqltable.csv to /media/ephemeral0/mnt/taskRunner/output/staging/df-09799242T7UHHPMT072T_input1_7c583c0755eb46f5b518feffa314fccd/sqltable.csv
18 Jul 2016 10:23:06,824 [INFO] (TaskRunnerService-resource:df-09799242T7UHHPMT072T_@ResourceId_x5OCd_2016-07-18T10:18:38-1) df-09799242T7UHHPMT072T amazonaws.datapipeline.connector.staging.S3Helper: Completed Downloading files from S3 Path:s3://s3-bucket/mysqlexport/sqltable.csv to output/staging/df-09799242T7UHHPMT072T_input1_7c583c0755eb46f5b518feffa314fccd
18 Jul 2016 10:23:06,862 [INFO] (TaskRunnerService-resource:df-09799242T7UHHPMT072T_@ResourceId_x5OCd_2016-07-18T10:18:38-1) df-09799242T7UHHPMT072T amazonaws.datapipeline.objects.CommandRunner: Executing command: #!/bin/bash
sed -i -e 's/^,/\\N,/' -e 's/,$/,\\N/' -e 's/,,/,\\N,/g' -e 's/,,/,\\N,/g' ${INPUT1_STAGING_DIR}/sqltable.csv |cat ${INPUT1_STAGING_DIR}/sqltable.csv > ${OUTPUT1_STAGING_DIR}/sqltable.csv
18 Jul 2016 10:23:06,865 [INFO] (TaskRunnerService-resource:df-09799242T7UHHPMT072T_@ResourceId_x5OCd_2016-07-18T10:18:38-1) df-09799242T7UHHPMT072T amazonaws.datapipeline.objects.CommandRunner: configure ApplicationRunner with stdErr file: output/logs/df-09799242T7UHHPMT072T/ShellCommandActivityId_18OqM/@ShellCommandActivityId_18OqM_2016-07-18T10:18:38/@ShellCommandActivityId_18OqM_2016-07-18T10:18:38_Attempt=1/StdError  and stdout file :output/logs/df-09799242T7UHHPMT072T/ShellCommandActivityId_18OqM/@ShellCommandActivityId_18OqM_2016-07-18T10:18:38/@ShellCommandActivityId_18OqM_2016-07-18T10:18:38_Attempt=1/StdOutput
18 Jul 2016 10:23:06,866 [INFO] (TaskRunnerService-resource:df-09799242T7UHHPMT072T_@ResourceId_x5OCd_2016-07-18T10:18:38-1) df-09799242T7UHHPMT072T amazonaws.datapipeline.objects.CommandRunner: Executing command: output/tmp/df-09799242T7UHHPMT072T-de05e7a112c440b4a42df69d554d8a9a/ShellCommandActivityId18OqM20160718T101838Attempt1_command.sh with env variables :{INPUT1_STAGING_DIR=/media/ephemeral0/mnt/taskRunner/output/staging/df-09799242T7UHHPMT072T_input1_7c583c0755eb46f5b518feffa314fccd, OUTPUT1_STAGING_DIR=/media/ephemeral0/mnt/taskRunner/output/staging/df-09799242T7UHHPMT072T_output1_7c8b2db30c16473f844db5eb21cb000e} with argument : null
18 Jul 2016 10:23:06,952 [INFO] (TaskRunnerService-resource:df-09799242T7UHHPMT072T_@ResourceId_x5OCd_2016-07-18T10:18:38-1) df-09799242T7UHHPMT072T amazonaws.datapipeline.connector.staging.S3Helper: Begin Uploading local directory:output/staging/df-09799242T7UHHPMT072T_output1_7c8b2db30c16473f844db5eb21cb000e to S3 s3://s3-bucket/mysqlexport/
18 Jul 2016 10:23:06,977 [INFO] (TaskRunnerService-resource:df-09799242T7UHHPMT072T_@ResourceId_x5OCd_2016-07-18T10:18:38-1) df-09799242T7UHHPMT072T amazonaws.datapipeline.connector.staging.S3Helper: Begin Upload single file to S3:s3://s3-bucket/mysqlexport/sqltable.csv
18 Jul 2016 10:23:06,978 [INFO] (TaskRunnerService-resource:df-09799242T7UHHPMT072T_@ResourceId_x5OCd_2016-07-18T10:18:38-1) df-09799242T7UHHPMT072T amazonaws.datapipeline.connector.staging.S3Helper: Begin upload of file /media/ephemeral0/mnt/taskRunner/output/staging/df-09799242T7UHHPMT072T_output1_7c8b2db30c16473f844db5eb21cb000e/sqltable.csv to  S3 paths3://s3-bucket/mysqlexport/sqltable.csv
18 Jul 2016 10:23:07,040 [INFO] (TaskRunnerService-resource:df-09799242T7UHHPMT072T_@ResourceId_x5OCd_2016-07-18T10:18:38-1) df-09799242T7UHHPMT072T amazonaws.datapipeline.connector.staging.S3Helper: Completed upload of file /media/ephemeral0/mnt/taskRunner/output/staging/df-09799242T7UHHPMT072T_output1_7c8b2db30c16473f844db5eb21cb000e/sqltable.csv to  S3 paths3://s3-bucket/mysqlexport/sqltable.csv
18 Jul 2016 10:23:07,040 [INFO] (TaskRunnerService-resource:df-09799242T7UHHPMT072T_@ResourceId_x5OCd_2016-07-18T10:18:38-1) df-09799242T7UHHPMT072T amazonaws.datapipeline.connector.staging.S3Helper: Completed uploading of all files
18 Jul 2016 10:23:07,040 [INFO] (TaskRunnerService-resource:df-09799242T7UHHPMT072T_@ResourceId_x5OCd_2016-07-18T10:18:38-1) df-09799242T7UHHPMT072T amazonaws.datapipeline.connector.staging.S3Helper: Completed upload of local dir output/staging/df-09799242T7UHHPMT072T_output1_7c8b2db30c16473f844db5eb21cb000e to s3://s3-bucket/mysqlexport/
18 Jul 2016 10:23:07,040 [INFO] (TaskRunnerService-resource:df-09799242T7UHHPMT072T_@ResourceId_x5OCd_2016-07-18T10:18:38-1) df-09799242T7UHHPMT072T amazonaws.datapipeline.connector.staging.StageFromS3Connector: cleaning up directory /media/ephemeral0/mnt/taskRunner/output/staging/df-09799242T7UHHPMT072T_input1_7c583c0755eb46f5b518feffa314fccd
18 Jul 2016 10:23:07,050 [INFO] (TaskRunnerService-resource:df-09799242T7UHHPMT072T_@ResourceId_x5OCd_2016-07-18T10:18:38-1) df-09799242T7UHHPMT072T amazonaws.datapipeline.connector.staging.StageInS3Connector: cleaning up directory /media/ephemeral0/mnt/taskRunner/output/staging/df-09799242T7UHHPMT072T_output1_7c8b2db30c16473f844db5eb21cb000e
18 Jul 2016 10:23:07,051 [INFO] (TaskRunnerService-resource:df-09799242T7UHHPMT072T_@ResourceId_x5OCd_2016-07-18T10:18:38-1) df-09799242T7UHHPMT072T amazonaws.datapipeline.taskrunner.HeartBeatService: Finished waiting for heartbeat thread @DefaultShellCommandActivity1_2016-07-18T10:18:38_Attempt=1
18 Jul 2016 10:23:07,052 [INFO] (TaskRunnerService-resource:df-09799242T7UHHPMT072T_@ResourceId_x5OCd_2016-07-18T10:18:38-1) df-09799242T7UHHPMT072T amazonaws.datapipeline.taskrunner.TaskPoller: Work ShellCommandActivity took 0:0 to complete

【Comments】:

  • Try doing it in simple steps: start simple and build up gradually to see where the problem lies.
  • It seems sed does not see the downloaded csv file in the staging input folder. I created a simple shell command activity pipeline to download the exported csv and run the script against it. It exits with the error sed: can't read /media/ephemeral0/mnt/taskRunner/output/staging/df-04031231KGO5SUFX4TZL_input1_8bd192f49d6d472f8cb47291eda9c970/sqltable.csv : No such file or directory. Then I wrote a script that outputs ls -l ${INPUT1_STAGING_DIR} > ${OUTPUT1_STAGING_DIR}/test.txt and it shows total 168 -rw-rw-r-- 1 ec2-user ec2-user 168448 Jul 18 13:42 sqltable.csv. I'm stumped..
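The diagnostic step from the comment above can be sketched as a standalone script. `INPUT1_STAGING_DIR` and `OUTPUT1_STAGING_DIR` are normally injected by Data Pipeline; the `/tmp` fallbacks and the placeholder file are assumptions so the sketch runs anywhere:

```shell
#!/bin/bash
# Debug sketch: write a listing of the staged input directory to the output
# staging dir, so it gets uploaded to S3 where you can inspect it.
INPUT1_STAGING_DIR=${INPUT1_STAGING_DIR:-/tmp/input1}
OUTPUT1_STAGING_DIR=${OUTPUT1_STAGING_DIR:-/tmp/output1}
mkdir -p "$INPUT1_STAGING_DIR" "$OUTPUT1_STAGING_DIR"
touch "$INPUT1_STAGING_DIR/sqltable.csv"   # stand-in for the staged export

ls -l "$INPUT1_STAGING_DIR" > "$OUTPUT1_STAGING_DIR/test.txt"
```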

Tags: mysql sql amazon-web-services amazon-s3 amazon-data-pipeline


【Solution 1】:

I'm not sure exactly what the problem was; either the bash script or the shell command activity didn't like the pipe. In any case, thanks first to TenG for reminding me to analyze each step of the process separately instead of looking at it as a whole.

Basically, instead of downloading a script, I entered the sed and copy commands directly in the command field:

sed -i -e 's/^,/\\\\N,/' -e 's/,,/,\\\\N,/g' -e 's/,$/,\\\\N/' ${INPUT1_STAGING_DIR}/sqltable.csv  
cp ${INPUT1_STAGING_DIR}/sqltable.csv ${OUTPUT1_STAGING_DIR}/sqltable.csv
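Note the quadruple backslashes: the pipeline definition appears to consume one level of escaping before the command reaches the shell (an inference from the working command, consistent with JSON string escaping, not documented behavior), so `\\\\N` arrives at sed as `\\N` and is emitted as the literal `\N` that MySQL expects. Run directly in a shell on made-up sample data, the equivalent is:

```shell
# Local equivalent of the fixed command; only the shell's own quoting
# applies here, so two backslashes suffice. Directory names and the
# sample row are stand-ins, not the real staged export.
INPUT1_STAGING_DIR=/tmp/stage_in
OUTPUT1_STAGING_DIR=/tmp/stage_out
mkdir -p "$INPUT1_STAGING_DIR" "$OUTPUT1_STAGING_DIR"
printf ',a,,b\n' > "$INPUT1_STAGING_DIR/sqltable.csv"

sed -i -e 's/^,/\\N,/' -e 's/,,/,\\N,/g' -e 's/,$/,\\N/' "$INPUT1_STAGING_DIR/sqltable.csv"
cp "$INPUT1_STAGING_DIR/sqltable.csv" "$OUTPUT1_STAGING_DIR/sqltable.csv"

cat "$OUTPUT1_STAGING_DIR/sqltable.csv"   # -> \N,a,\N,b
```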

In any case, I'm still disappointed with AWS Data Pipeline: they claim to support MySQL, yet it doesn't even recognize the standard MySQL null representations (\N and NULL) in csv files. If I could use AWS Data Pipeline for both the export and the import it would be simple and efficient, but unfortunately it isn't suitable for the latter if your application distinguishes between empty fields and NULL values.

【Discussion】:
