【Question Title】: Spark: Monitoring a cluster mode application
【Posted】: 2016-08-18 05:28:37
【Question】:

Right now I'm launching applications in cluster mode with spark-submit. The response from the master gives a JSON object with a submissionId, which I use to identify the application and kill it if necessary. However, I haven't found a simple way to retrieve the worker REST URL from the master response or the driver ID (maybe I could scrape the master web UI, but that would be ugly). Instead, I have to wait until the application finishes and then look up the application stats on the history server.

Is there any way to use the driver-id to identify the worker URL (usually worker-node:4040) of an application deployed in cluster mode?

16/08/12 11:39:47 INFO RestSubmissionClient: Submitting a request to launch an application in spark://192.yyy:6066.
16/08/12 11:39:47 INFO RestSubmissionClient: Submission successfully created as driver-20160812114003-0001. Polling submission state...
16/08/12 11:39:47 INFO RestSubmissionClient: Submitting a request for the status of submission driver-20160812114003-0001 in spark://192.yyy:6066.
16/08/12 11:39:47 INFO RestSubmissionClient: State of driver driver-20160812114003-0001 is now RUNNING.
16/08/12 11:39:47 INFO RestSubmissionClient: Driver is running on worker worker-20160812113715-192.xxx-46215 at 192.xxx:46215.
16/08/12 11:39:47 INFO RestSubmissionClient: Server responded with CreateSubmissionResponse:
{
    "action" : "CreateSubmissionResponse",
    "message" : "Driver successfully submitted as driver-20160812114003-0001",
    "serverSparkVersion" : "1.6.1",
    "submissionId" : "driver-20160812114003-0001",
    "success" : true
}
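The submissionId from this response can be fed back into the master's REST status endpoint. A minimal sketch of that lookup, assuming a standalone master whose SubmissionStatusResponse populates a workerHostPort field (other cluster managers, e.g. Mesos, may not report it); the helper names are my own:

```python
import json
import urllib.request


def extract_worker_ui(status, ui_port=4040):
    """Turn a SubmissionStatusResponse dict into the worker's app-UI URL.

    Assumes the response carries workerHostPort (standalone masters
    report it, as seen in the "Driver is running on worker ... at
    192.xxx:46215" log line above); returns None when it is absent.
    """
    host_port = status.get("workerHostPort")
    if not host_port:
        return None
    host = host_port.rsplit(":", 1)[0]
    # Swap the driver RPC port for the application UI port (4040 default).
    return "http://%s:%d" % (host, ui_port)


def worker_ui_url(master_rest_url, submission_id):
    """Query e.g. http://master:6066 for the status of a submission."""
    url = "%s/v1/submissions/status/%s" % (master_rest_url, submission_id)
    with urllib.request.urlopen(url) as resp:
        return extract_worker_ui(json.load(resp))
```

For the first log excerpt above, `worker_ui_url("http://192.yyy:6066", "driver-20160812114003-0001")` would resolve to the worker host with port 4040 appended.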

Edit: Here is what the log4j console output typically looks like at DEBUG level.

Spark submit command:

./apps/spark-2.0.0-bin-hadoop2.7/bin/spark-submit --master mesos://masterurl:7077 
    --verbose --class MainClass --deploy-mode cluster
    ~/path/myjar.jar args

Spark submit output:

Using properties file: null
Parsed arguments:
  master                  mesos://masterurl:7077
  deployMode              cluster
  executorMemory          null
  executorCores           null
  totalExecutorCores      null
  propertiesFile          null
  driverMemory            null
  driverCores             null
  driverExtraClassPath    null
  driverExtraLibraryPath  null
  driverExtraJavaOptions  null
  supervise               false
  queue                   null
  numExecutors            null
  files                   null
  pyFiles                 null
  archives                null
  mainClass               MyApp
  primaryResource         file:/path/myjar.jar
  name                    MyApp
  childArgs               [args]
  jars                    null
  packages                null
  packagesExclusions      null
  repositories            null
  verbose                 true

Spark properties used, including those specified through
 --conf and those from the properties file null:



Main class:
org.apache.spark.deploy.rest.RestSubmissionClient
Arguments:
file:/path/myjar.jar
MyApp
args
System properties:
SPARK_SUBMIT -> true
spark.driver.supervise -> false
spark.app.name -> MyApp
spark.jars -> file:/path/myjar.jar
spark.submit.deployMode -> cluster
spark.master -> mesos://masterurl:7077
Classpath elements:



16/08/17 13:26:49 INFO RestSubmissionClient: Submitting a request to launch an application in mesos://masterurl:7077.
16/08/17 13:26:49 DEBUG RestSubmissionClient: Sending POST request to server at http://masterurl:7077/v1/submissions/create:
{
  "action" : "CreateSubmissionRequest",
  "appArgs" : [ args ],
  "appResource" : "file:/path/myjar.jar",
  "clientSparkVersion" : "2.0.0",
  "environmentVariables" : {
    "SPARK_SCALA_VERSION" : "2.10"
  },
  "mainClass" : "SimpleSort",
  "sparkProperties" : {
    "spark.jars" : "file:/path/myjar.jar",
    "spark.driver.supervise" : "false",
    "spark.app.name" : "MyApp",
    "spark.submit.deployMode" : "cluster",
    "spark.master" : "mesos://masterurl:7077"
  }
}
16/08/17 13:26:49 DEBUG RestSubmissionClient: Response from the server:
{
  "action" : "CreateSubmissionResponse",
  "serverSparkVersion" : "2.0.0",
  "submissionId" : "driver-20160817132658-0004",
  "success" : true
}
16/08/17 13:26:49 INFO RestSubmissionClient: Submission successfully created as driver-20160817132658-0004. Polling submission state...
16/08/17 13:26:49 INFO RestSubmissionClient: Submitting a request for the status of submission driver-20160817132658-0004 in mesos://masterurl:7077.
16/08/17 13:26:49 DEBUG RestSubmissionClient: Sending GET request to server at http://masterurl:7077/v1/submissions/status/driver-20160817132658-0004.
16/08/17 13:26:49 DEBUG RestSubmissionClient: Response from the server:
{
  "action" : "SubmissionStatusResponse",
  "driverState" : "RUNNING",
  "serverSparkVersion" : "2.0.0",
  "submissionId" : "driver-20160817132658-0004",
  "success" : true
}
16/08/17 13:26:49 INFO RestSubmissionClient: State of driver driver-20160817132658-0004 is now RUNNING.
16/08/17 13:26:49 INFO RestSubmissionClient: Server responded with CreateSubmissionResponse:
{
  "action" : "CreateSubmissionResponse",
  "serverSparkVersion" : "2.0.0",
  "submissionId" : "driver-20160817132658-0004",
  "success" : true
}

【Comments】:

    Tags: apache-spark


    【Solution 1】:

    Doesn't the response from the master give you the application-id?

    I believe all you need to solve this are the master URL and the application ID of your application. Once you have the application ID, use port 4040 at the master URL and append your desired endpoint to it.

    For example, if your application ID is application_1468141556944_1055:

    To get the list of all jobs:

    http://<master>:4040/api/v1/applications/application_1468141556944_1055/jobs
    

    To get the list of stored RDDs:

    http://<master>:4040/api/v1/applications/application_1468141556944_1055/storage/rdd
    
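    The two endpoints above can be wrapped in a small helper. A sketch, taking the master host, port 4040, and the application ID from the examples in this answer:

```python
import json
import urllib.request


def app_endpoint(master, app_id, endpoint, port=4040):
    # Builds e.g. http://<master>:4040/api/v1/applications/<app_id>/jobs
    return "http://%s:%d/api/v1/applications/%s/%s" % (master, port, app_id, endpoint)


def fetch(master, app_id, endpoint):
    # Returns the parsed JSON payload of a monitoring REST endpoint.
    with urllib.request.urlopen(app_endpoint(master, app_id, endpoint)) as resp:
        return json.load(resp)


# jobs = fetch("master", "application_1468141556944_1055", "jobs")
# rdds = fetch("master", "application_1468141556944_1055", "storage/rdd")
```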

    However, if you don't have the application ID, I would probably start with the following:

    Set verbose mode (--verbose) when launching the Spark job to get the application ID on the console. You can then parse the application ID out of the log output, which usually looks like this:

    16/08/12 08:50:53 INFO Client: Application report for application_1468141556944_3791 (state: RUNNING)
    

    So the application ID is application_1468141556944_3791.

    You can also find the master URL and the application ID via the tracking URL in the log output, as shown below:

        client token: N/A
        diagnostics: N/A
        ApplicationMaster host: 10.50.0.33
        ApplicationMaster RPC port: 0
        queue: ns_debug
        start time: 1470992969127
        final status: UNDEFINED
        tracking URL: http://<master>:8088/proxy/application_1468141556944_3799/
    

    These messages are at the INFO log level, so make sure you set log4j.rootCategory=INFO, console in your log4j.properties file so that you can see them.
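    For reference, the relevant line in conf/log4j.properties (everything else left at Spark's defaults):

```properties
# Show INFO-level messages (including the Client application report) on the console
log4j.rootCategory=INFO, console
```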

    【Discussion】:

    • Tried --verbose, but there's no application ID. The log level is already set to INFO. Spark version 1.6.1, by the way.
    • I see these outputs when launching a Spark job with spark-submit on my application jar, and I have the same version. You could try DEBUG mode (instead of INFO), or use SparkContext.applicationId in your code to send the application-id to a local file or somewhere else.
    • That's so weird; I'll edit the question above to show my output.
    • Are you running in client or cluster mode?
    • Yes, I'm running in cluster mode. Ah, I see: you're using Mesos as the cluster manager while mine is YARN. I can't see the submission ID anywhere in YARN, so I can't really find out how to get the application ID directly from the submission ID. I think I'll try SparkContext.applicationId in the code to spit out the application ID, as mentioned.
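    The SparkContext.applicationId workaround suggested in the last comment could look like this in PySpark (the output path and app name are placeholders; pyspark must be available on the driver):

```python
def record_app_id(sc, path):
    # Persist the application id so an external monitor can map a
    # submission to its application (and thus to its REST endpoints).
    with open(path, "w") as f:
        f.write(sc.applicationId)


if __name__ == "__main__":
    # Hypothetical driver entry point; requires pyspark at runtime.
    from pyspark import SparkContext
    sc = SparkContext(appName="MyApp")
    record_app_id(sc, "/tmp/myapp.appid")
    sc.stop()
```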
    【Solution 2】:

    Had to grab a nearby application ID from the Spark master web UI (one created within the same minute and with the same suffix, e.g. 20161010025XXX-0005, with X as a wildcard), then look up the worker URL in the link tag behind it. Not pretty, reliable, or secure, but it works for now. If anyone has another approach, please share.

    【Discussion】:
