【发布时间】:2019-08-14 06:40:19
【问题描述】:
使用数据管道 aws 进程备份 dynamoDB 表时出现错误:
02 May 2017 07:19:04,544 [WARN] (TaskRunnerService-df-0940986HJGYQM1ZJ8BN_@EmrClusterForBackup_2017-04-25T13:31:55-2) df-0940986HJGYQM1ZJ8BN amazonaws.datapipeline.cluster.EmrUtil: EMR job flow named 'df-0940986HJGYQM1ZJ8BN_@EmrClusterForBackup_2017-04-25T13:31:55' with jobFlowId 'j-2SJ0OQOM0BTI' is in status 'RUNNING' because of the step 'df-0940986HJGYQM1ZJ8BN_@TableBackupActivity_2017-04-25T13:31:55_Attempt=2' failures 'null'
02 May 2017 07:19:04,544 [INFO] (TaskRunnerService-df-0940986HJGYQM1ZJ8BN_@EmrClusterForBackup_2017-04-25T13:31:55-2) df-0940986HJGYQM1ZJ8BN amazonaws.datapipeline.cluster.EmrUtil: EMR job '@TableBackupActivity_2017-04-25T13:31:55_Attempt=2' with jobFlowId 'j-2SJ0OQOM0BTI' is in status 'RUNNING' and reason 'Running step'. Step 'df-0940986HJGYQM1ZJ8BN_@TableBackupActivity_2017-04-25T13:31:55_Attempt=2' is in status 'FAILED' with reason 'null'
02 May 2017 07:19:04,544 [INFO] (TaskRunnerService-df-0940986HJGYQM1ZJ8BN_@EmrClusterForBackup_2017-04-25T13:31:55-2) df-0940986HJGYQM1ZJ8BN amazonaws.datapipeline.cluster.EmrUtil: Collecting steps stderr logs for cluster with AMI 3.9.0
02 May 2017 07:19:04,558 [INFO] (TaskRunnerService-df-0940986HJGYQM1ZJ8BN_@EmrClusterForBackup_2017-04-25T13:31:55-2) df-0940986HJGYQM1ZJ8BN amazonaws.datapipeline.taskrunner.LogMessageUtil: Returning tail errorMsg : at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:132)
at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:460)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:343)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:833)
at org.apache.hadoop.dynamodb.tools.DynamoDbExport.run(DynamoDbExport.java:79)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.dynamodb.tools.DynamoDbExport.main(DynamoDbExport.java:30)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
有大量数据(600 万)。管道工作了 4 天并出现错误。无法找出错误。
【问题讨论】:
-
你能提供你的管道定义吗?
-
另外,请尝试在 AWS 论坛上提问,可以访问您的管道的人可以提供更多信息。
-
你找到解决这个问题的方法了吗?
-
@Nir 请看下面我的回答...
标签: amazon-web-services amazon-data-pipeline