[Posted]: 2016-01-12 01:06:21
[Problem description]:
I'm trying to submit a spark-submit job through the YARN REST API; I normally run it from the command line.
My command-line spark-submit looks like this:
JAVA_HOME=/usr/local/java7/ HADOOP_CONF_DIR=/etc/hadoop/conf /usr/local/spark-1.5/bin/spark-submit \
--driver-class-path "/etc/hadoop/conf" \
--class MySparkJob \
--master yarn-cluster \
--conf "spark.executor.extraClassPath=/usr/local/hadoop/client/hadoop-*" \
--conf "spark.driver.extraClassPath=/usr/local/hadoop/client/hadoop-*" \
spark-job.jar --retry false --counter 10
Reading the YARN REST API documentation at https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Applications_APISubmit_Application, I tried to build a JSON payload to POST, like this:
{
  "am-container-spec": {
    "commands": {
      "command": "JAVA_HOME=/usr/local/java7/ HADOOP_CONF_DIR=/etc/hadoop/conf org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster --jar spark-job.jar --class MySparkJob --arg --retry --arg false --arg --counter --arg 10"
    },
    "local-resources": {
      "entry": [
        {
          "key": "spark-job.jar",
          "value": {
            "resource": "hdfs:///spark-job.jar",
            "size": 3214567,
            "timestamp": 1452408423000,
            "type": "FILE",
            "visibility": "APPLICATION"
          }
        }
      ]
    }
  },
  "application-id": "application_11111111111111_0001",
  "application-name": "test",
  "application-type": "Spark"
}
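For reference, submitting through this API is a two-step exchange: first request a fresh application ID, then POST the payload. A rough sketch of the two calls, assuming the ResourceManager web UI is reachable at `rm-host:8088` and the payload above is saved as `spark-job-payload.json` (both names are placeholders):

```shell
# Step 1: ask the RM for a new application ID
# (the JSON response contains an "application-id" field to put in the payload)
curl -X POST http://rm-host:8088/ws/v1/cluster/apps/new-application

# Step 2: submit the payload built above
curl -X POST -H "Content-Type: application/json" \
     --data @spark-job-payload.json \
     http://rm-host:8088/ws/v1/cluster/apps
```

Both endpoints come from the same ResourceManager REST API page; the calls obviously require a live cluster, so they are shown here only to make the submission flow concrete.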
The problem I see is that the Hadoop config directory used to be local on the machine I ran jobs from. Now that I'm submitting through the REST API and the job runs directly on the RM, I don't know how to provide those details.
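One direction suggested by the same REST API page: `am-container-spec` also accepts an `environment` block, so values like `JAVA_HOME` and `HADOOP_CONF_DIR` could in principle be passed there rather than prefixed onto the command string. A sketch of that fragment, assuming the config directory exists at that path on the node where the AM container runs:

```json
"am-container-spec": {
  "environment": {
    "entry": [
      { "key": "JAVA_HOME", "value": "/usr/local/java7/" },
      { "key": "HADOOP_CONF_DIR", "value": "/etc/hadoop/conf" }
    ]
  }
}
```

This only sets environment variables inside the AM container; whether the referenced directory is actually present on that node is a separate deployment question.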
[Comments]:
-
Did you ever solve this? I'm running into the same issues with this YARN API, but there seems to be no other way.
-
This GitHub project was very helpful: github.com/bernhard-42/spark-yarn-rest-api
Tags: hadoop apache-spark hadoop-yarn