How to capture the job status in a shell script for spark-submit

【Question title】: How to capture the job status in a shell script for spark-submit
【Posted】: 2020-06-05 04:46:42
【Question】:

I am using the bash shell with spark-sql 2.4.1. I submit my Spark job with spark-submit from a shell script.

I need to capture the status of my job. How can this be achieved?

Any help or suggestions, please?

【Comments】:

Tags: apache-spark apache-spark-sql sh airflow

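【Solution 1】:

Check the code below. It writes the spark-submit output to a log file and then reads the final status from that log.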

process_start_datetime=$(date +%Y%m%d%H%M%S)
log_path="<log_dir>"
log_file="${log_path}/${app_name}_${process_start_datetime}.log"

spark-submit \
    --verbose \
    --deploy-mode cluster \
    --executor-cores "$executor_cores" \
    --num-executors "$num_executors" \
    --driver-memory "$driver_memory" \
    --executor-memory "$executor_memory"  \
    --master yarn \
    --class main.App "$appJar" 2>&1 | tee -a "$log_file"

status=$(grep "final status:" < "$log_file" | cut -d ":" -f2 | tail -1 | awk '$1=$1')

To get the application ID:

applicationId=$(grep "tracking URL" < "$log_file" | head -n 1 | cut -d "/" -f5)
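One thing the snippet above does not capture is the exit code of spark-submit itself: after a pipeline, `$?` reports the status of `tee`, not of the command before it. In bash, `${PIPESTATUS[0]}` holds the first command's status. The following is a minimal sketch, not a definitive implementation. The function names (`extract_final_status`, `extract_application_id`, `run_and_log`) are hypothetical, and the log line shapes are assumed from the greps above. Whether spark-submit exits non-zero when the application fails depends on the deploy mode and configuration, so verify that on your cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the log line formats are assumptions based on the
# "final status:" and "tracking URL" greps used in the answer above.

# Print the last "final status:" value found in a log file, trimmed.
extract_final_status() {
    grep "final status:" "$1" | tail -n 1 | cut -d ":" -f2 | awk '{$1=$1; print}'
}

# Print the application ID found on the first "tracking URL" line.
extract_application_id() {
    grep "tracking URL" "$1" | head -n 1 | grep -o 'application_[0-9]*_[0-9]*' | head -n 1
}

# Run a command, append its stdout and stderr to a log file, and return the
# command's own exit code instead of tee's (PIPESTATUS is bash-specific).
run_and_log() {
    local log_file="$1"
    shift
    "$@" 2>&1 | tee -a "$log_file"
    return "${PIPESTATUS[0]}"
}

# Usage (not executed here):
#   run_and_log "$log_file" spark-submit --master yarn --deploy-mode cluster ...
#   exit_code=$?
#   status=$(extract_final_status "$log_file")
#   applicationId=$(extract_application_id "$log_file")
```

Checking both the exit code and the parsed status gives a fallback if the log format differs from what the greps expect.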

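【Comments】:

Sir, what is this 2>&1 | tee -a "$log_file"?
And what is the 2 here?
Yes, the directory has to be set before execution. For the question above, check: unix.stackexchange.com/questions/37660/order-of-redirections
cut splits the line into fields; -d is the delimiter and -f is the number of the field to select.
awk '$1=$1' trims extra spaces, for example: echo " Running " | awk '$1=$1'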
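The pieces asked about in these comments can be tried in isolation. In the sketch below, `emit_both` is a hypothetical stand-in for spark-submit that writes one line to stdout (file descriptor 1) and one to stderr (file descriptor 2). `2>&1` sends stderr to wherever stdout currently points, so the pipe into `tee -a` sees both streams, and `tee -a` appends them to the file while also passing them through.

```shell
#!/usr/bin/env bash
# emit_both is a hypothetical stand-in for spark-submit:
# one line goes to stdout, one line goes to stderr.
emit_both() {
    echo "to stdout"
    echo "to stderr" >&2
}

demo_log=$(mktemp)

# stderr is not sent into the pipe (here it is discarded),
# so only the "to stdout" line reaches tee and the log.
emit_both 2>/dev/null | tee -a "$demo_log" >/dev/null

# 2>&1 merges stderr (fd 2) into stdout (fd 1), so both lines are logged.
emit_both 2>&1 | tee -a "$demo_log" >/dev/null

# cut: -d sets the delimiter, -f picks the field number (counted from 1).
field=$(echo "final status: SUCCEEDED" | cut -d ":" -f2)

# awk '$1=$1' rebuilds the record, which drops leading and trailing blanks.
trimmed=$(echo "$field" | awk '$1=$1')

echo "field=[$field] trimmed=[$trimmed]"
```

After both runs the log holds two "to stdout" lines and one "to stderr" line.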

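【Solution 2】:

spark-submit runs the job asynchronously, so once the command has been submitted you can get the application ID by calling SparkContext.applicationId. You can then check the status with that ID.

Reference: https://issues.apache.org/jira/browse/SPARK-5439

If Spark is deployed on YARN, you can use the following.

Check the status: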

# To get the application ID, use: yarn application -list
yarn application -status application_1459542433815_0002
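To use the status command from a script, its report has to be parsed. The sketch below assumes the report contains tab-indented lines such as `State : FINISHED` and `Final-State : SUCCEEDED`. That shape is an assumption to check against the output of your Hadoop version, and `parse_yarn_field` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# parse_yarn_field is a hypothetical helper. It assumes the report has lines
# shaped like "<tab>Final-State : SUCCEEDED"; verify this against the actual
# output of your Hadoop version.
parse_yarn_field() {
    local field="$1"
    # Read the report from stdin, match the field name exactly,
    # and print its trimmed value.
    awk -F ' : ' -v key="$field" '
        { name = $1; gsub(/^[ \t]+|[ \t]+$/, "", name) }
        name == key {
            value = $2
            gsub(/^[ \t]+|[ \t]+$/, "", value)
            print value
            exit
        }
    '
}

# Usage against a live cluster (not executed here):
#   report=$(yarn application -status "$applicationId" 2>/dev/null)
#   state=$(printf '%s\n' "$report" | parse_yarn_field "State")
#   final_state=$(printf '%s\n' "$report" | parse_yarn_field "Final-State")
```

Matching the field name exactly keeps `State` from also matching the `Final-State` line.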

Another approach is mentioned in this answer.

【Comments】:

How do I get this applicationId in the shell script that calls spark-submit?