【Question title】: EMR step did not appear to run/create jobs
【Posted】: 2019-10-10 15:48:43
【Question】:

I am trying to get EMR to run a simple hello-world style application.

from pyspark import SparkContext
from operator import add

sc = SparkContext()
data = sc.parallelize(list("Hello World"))
data.show()
counts = data.map(lambda x: (x, 1)).reduceByKey(add).sortBy(lambda x: x[1], ascending=False).coalesce(1).saveAsTextFile('s3://pinfare-glue/emr/output.txt')
print("Hello EMR")
sc.stop()

But I don't see any greeting in the logs, and there is no file in S3 either.

I had already uploaded the script to S3 and then ran this command:

aws emr add-steps --cluster-id j-XXXXXX --steps "Type=spark,Name=Test,Args=[--deploy-mode,cluster,--master,yarn,--conf,spark.yarn.submit.waitAppCompletion=false,--num-executors,2,--executor-cores,2,--executor-memory,8g,s3://XXXXXX/emr-test.py,s3://XXXXXX/emr-in,s3://XXXXXX/emr-out],ActionOnFailure=CONTINUE"

My log looks like this:

2019-05-24T04:24:12.718Z INFO Ensure step 4 jar file command-runner.jar
2019-05-24T04:24:12.718Z INFO StepRunner: Created Runner for step 4
INFO startExec 'hadoop jar /var/lib/aws/emr/step-runner/hadoop-jars/command-runner.jar spark-submit --deploy-mode cluster --master yarn --conf spark.yarn.submit.waitAppCompletion=false --num-executors 2 --executor-cores 2 --executor-memory 8g s3://XXXXXX/emr-test.py s3://XXXXXX/emr-in s3://XXXXXX/emr-out'
INFO Environment:
  PATH=/sbin:/usr/sbin:/bin:/usr/bin:/usr/local/sbin:/opt/aws/bin
  LESS_TERMCAP_md=[01;38;5;208m
  LESS_TERMCAP_me=[0m
  HISTCONTROL=ignoredups
  LESS_TERMCAP_mb=[01;31m
  AWS_AUTO_SCALING_HOME=/opt/aws/apitools/as
  UPSTART_JOB=rc
  LESS_TERMCAP_se=[0m
  HISTSIZE=1000
  HADOOP_ROOT_LOGGER=INFO,DRFA
  JAVA_HOME=/etc/alternatives/jre
  AWS_DEFAULT_REGION=ap-southeast-1
  AWS_ELB_HOME=/opt/aws/apitools/elb
  LESS_TERMCAP_us=[04;38;5;111m
  EC2_HOME=/opt/aws/apitools/ec2
  TERM=linux
  runlevel=3
  LANG=en_US.UTF-8
  AWS_CLOUDWATCH_HOME=/opt/aws/apitools/mon
  MAIL=/var/spool/mail/hadoop
  LESS_TERMCAP_ue=[0m
  LOGNAME=hadoop
  PWD=/
  LANGSH_SOURCED=1
  HADOOP_CLIENT_OPTS=-Djava.io.tmpdir=/mnt/var/lib/hadoop/steps/s-31Z3K7X3KH3N6/tmp
  _=/etc/alternatives/jre/bin/java
  CONSOLETYPE=serial
  RUNLEVEL=3
  LESSOPEN=||/usr/bin/lesspipe.sh %s
  previous=N
  UPSTART_EVENTS=runlevel
  AWS_PATH=/opt/aws
  USER=hadoop
  UPSTART_INSTANCE=
  PREVLEVEL=N
  HADOOP_LOGFILE=syslog
  PYTHON_INSTALL_LAYOUT=amzn
  HOSTNAME=ip-172-31-128-120
  HADOOP_LOG_DIR=/mnt/var/log/hadoop/steps/s-31Z3K7X3KH3N6
  EC2_AMITOOL_HOME=/opt/aws/amitools/ec2
  SHLVL=5
  HOME=/home/hadoop
  HADOOP_IDENT_STRING=hadoop
INFO redirectOutput to /mnt/var/log/hadoop/steps/s-31Z3K7X3KH3N6/stdout
INFO redirectError to /mnt/var/log/hadoop/steps/s-31Z3K7X3KH3N6/stderr
INFO Working dir /mnt/var/lib/hadoop/steps/s-31Z3K7X3KH3N6
INFO ProcessRunner started child process 16829 :
hadoop   16829  4392  0 04:24 ?        00:00:00 bash /usr/lib/hadoop/bin/hadoop jar /var/lib/aws/emr/step-runner/hadoop-jars/command-runner.jar spark-submit --deploy-mode cluster --master yarn --conf spark.yarn.submit.waitAppCompletion=false --num-executors 2 --executor-cores 2 --executor-memory 8g s3://XXXXXX/emr-test.py s3://XXXXXX/emr-in s3://XXXXXX/emr-out
2019-05-24T04:24:16.724Z INFO HadoopJarStepRunner.Runner: startRun() called for s-31Z3K7X3KH3N6 Child Pid: 16829
INFO Synchronously wait child process to complete : hadoop jar /var/lib/aws/emr/step-runner/hadoop-...
INFO waitProcessCompletion ended with exit code 0 : hadoop jar /var/lib/aws/emr/step-runner/hadoop-...
INFO total process run time: 14 seconds
2019-05-24T04:24:28.813Z INFO Step created jobs: 
2019-05-24T04:24:28.813Z INFO Step succeeded with exitCode 0 and took 14 seconds

【Comments】:

  • What do you see if you run the Python file via spark-submit on the master node?

Tags: amazon-web-services apache-spark pyspark amazon-emr


【Solution 1】:

You posted the controller log, but anything written by print or show goes to stdout/stderr, so check those logs instead.

INFO redirectOutput to /mnt/var/log/hadoop/steps/s-31Z3K7X3KH3N6/stdout
INFO redirectError to /mnt/var/log/hadoop/steps/s-31Z3K7X3KH3N6/stderr

I can see these two lines in your log. Go to the EMR console, open the Steps tab, find the step with ID s-31Z3K7X3KH3N6, and view its logs.
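
If you would rather pull the step ID and log paths out of the controller log than read them by eye, here is a minimal sketch. The function name `find_step_logs` and the regular expression are illustrative, not part of any EMR tooling, and it assumes the `redirectOutput`/`redirectError` lines have the same shape as the ones in the question.

```python
import re

# Matches controller-log lines that name the step's output files, e.g.
# "INFO redirectOutput to /mnt/var/log/hadoop/steps/s-31Z3K7X3KH3N6/stdout"
# (pattern is an assumption based on the log lines quoted in the question).
REDIRECT_RE = re.compile(
    r"INFO redirect(Output|Error) to (\S+/steps/(s-[A-Z0-9]+)/(?:stdout|stderr))"
)


def find_step_logs(controller_log: str) -> dict:
    """Return the step ID and the stdout/stderr paths named in a controller log."""
    result = {}
    for line in controller_log.splitlines():
        match = REDIRECT_RE.search(line)
        if match:
            kind, path, step_id = match.groups()
            result["step_id"] = step_id
            # redirectOutput names the stdout file, redirectError the stderr file.
            result["stdout" if kind == "Output" else "stderr"] = path
    return result


# The three lines below are copied from the controller log in the question.
sample = """\
INFO redirectOutput to /mnt/var/log/hadoop/steps/s-31Z3K7X3KH3N6/stdout
INFO redirectError to /mnt/var/log/hadoop/steps/s-31Z3K7X3KH3N6/stderr
INFO Working dir /mnt/var/lib/hadoop/steps/s-31Z3K7X3KH3N6
"""
logs = find_step_logs(sample)
print(logs["step_id"])
print(logs["stdout"])
print(logs["stderr"])
```

The paths it returns are on the master node. Also keep in mind that with --deploy-mode cluster the driver runs inside a YARN container, so the script's own print output may end up in the YARN application logs rather than in the step's stdout.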

【Comments】:
