Answer to: Apache Spark's deployment issue (cluster mode) with Hive

[Question title]: Apache Spark's deployment issue (cluster-mode) with Hive
[Posted]: 2016-04-30 06:14:26
[Question]:

EDIT:

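I am developing a Spark application that reads data from several structured schemas and aggregates information across them. The application works fine when I run it locally, but when I run it on the cluster I hit a problem with either the configuration (most likely hive-site.xml) or the submit-command arguments. I looked through other related posts but could not find a solution specific to my scenario. Below I list in detail the commands I tried and the errors I got. I am new to Spark and may be missing something trivial, but I can provide more information to support my question.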

Original question:

I have been trying to run my Spark application on a 6-node Hadoop cluster bundled with HDP 2.3 components.

Here is some component information that may help you suggest a solution:

Cluster information: 6-node cluster:

128 GB RAM, 24 cores, 8 TB disk

Components used in the application:

HDP - 2.3

Spark - 1.3.1

$ hadoop version

Hadoop 2.7.1.2.3.0.0-2557
Subversion git@github.com:hortonworks/hadoop.git -r 9f17d40a0f2046d217b2bff90ad6e2fc7e41f5e1
Compiled by jenkins on 2015-07-14T13:08Z
Compiled with protoc 2.5.0
From source with checksum 54f9bbb4492f92975e84e390599b881d

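Scenario:

I am trying to use SparkContext and HiveContext together so that I can take full advantage of Spark's real-time querying on its data structures, such as DataFrames. The dependencies used in my application are: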

<dependency> <!-- Spark dependency -->
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.3.1</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.10</artifactId>
    <version>1.3.1</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.10</artifactId>
    <version>1.3.1</version>
</dependency>
<dependency>
    <groupId>com.databricks</groupId>
    <artifactId>spark-csv_2.10</artifactId>
    <version>1.4.0</version>
</dependency>

Here are the submit commands I used and the corresponding error logs:

Submit command 1:

spark-submit --class working.path.to.Main \
    --master yarn \
    --deploy-mode cluster \
    --num-executors 17 \
    --executor-cores 8 \
    --executor-memory 25g \
    --driver-memory 25g \
    --num-executors 5 \
    application-with-all-dependencies.jar

Error log 1:

User class threw exception: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient

Submit command 2:

spark-submit --class working.path.to.Main \
    --master yarn \
    --deploy-mode cluster \
    --num-executors 17 \
    --executor-cores 8 \
    --executor-memory 25g \
    --driver-memory 25g \
    --num-executors 5 \
    --files /etc/hive/conf/hive-site.xml \
    application-with-all-dependencies.jar

Error log 2:

User class threw exception: java.lang.NumberFormatException: For input string: "5s"

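Since I don't have admin rights, I can't modify the configuration. I could contact the IT engineers and have them make changes, but I am looking for a solution that requires as few configuration-file changes as possible.

Changing the configuration is suggested here.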
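The "5s" in error logs 2 and 3 looks like a configuration value carrying a time-unit suffix. One plausible reading, not confirmed in this thread, is that the hive-site.xml under /etc/hive/conf targets HDP's newer Hive, while the Hive client bundled with Spark 1.3.1 expects a plain integer for some settings. The sketch below defines a hypothetical helper, find_suffixed_values (not part of Spark or HDP), that uses only awk to list properties whose value is a number followed by ms, s, m, h or d, so different copies of hive-site.xml can be compared without admin rights. It assumes each <name> element appears before its <value>, as in the usual Hadoop XML layout.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic helper (not part of Spark or HDP).
# Prints name=value for every property in a Hadoop-style XML config whose
# value is an integer followed by a time-unit suffix such as 5s or 100ms.
# Assumes each <name> element appears before its <value>.
find_suffixed_values() {
  awk '
    /<name>/ {
      # Remember the most recent property name, with the tags stripped.
      n = $0
      gsub(/.*<name>|<\/name>.*/, "", n)
    }
    /<value>[0-9]+(ms|s|m|h|d)<\/value>/ {
      # Strip the tags around the suffixed value and print it with its name.
      v = $0
      gsub(/.*<value>|<\/value>.*/, "", v)
      print n "=" v
    }
  ' "$1"
}
```

For example, running find_suffixed_values /etc/hive/conf/hive-site.xml and then the same command on another copy of hive-site.xml shows which suffixed values each file carries.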

Then, as suggested in other forums, I tried passing various jar files as arguments.

Submit command 3:

spark-submit --class working.path.to.Main \
    --master yarn \
    --deploy-mode cluster \
    --num-executors 17 \
    --executor-cores 8 \
    --executor-memory 25g \
    --driver-memory 25g \
    --num-executors 5 \
    --jars /usr/hdp/2.3.0.0-2557/spark/lib/datanucleus-api-jdo-3.2.6.jar,/usr/hdp/2.3.0.0-2557/spark/lib/datanucleus-core-3.2.10.jar,/usr/hdp/2.3.0.0-2557/spark/lib/datanucleus-rdbms-3.2.9.jar \
    --files /etc/hive/conf/hive-site.xml \
    application-with-all-dependencies.jar

Error log 3:

User class threw exception: java.lang.NumberFormatException: For input string: "5s"

I don't understand what happened with the command below, and I can't make sense of its error log.

Submit command 4:

spark-submit --class working.path.to.Main \
    --master yarn \
    --deploy-mode cluster \
    --num-executors 17 \
    --executor-cores 8 \
    --executor-memory 25g \
    --driver-memory 25g \
    --num-executors 5 \
    --jars /usr/hdp/2.3.0.0-2557/spark/lib/*.jar \
    --files /etc/hive/conf/hive-site.xml \
    application-with-all-dependencies.jar

Error log 4:

Application application_1461686223085_0014 failed 2 times due to AM Container for appattempt_1461686223085_0014_000002 exited with exitCode: 10
For more detailed output, check application tracking page:http://cluster-host:XXXX/cluster/app/application_1461686223085_0014Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_e10_1461686223085_0014_02_000001
Exit code: 10
Stack trace: ExitCodeException exitCode=10:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 10
Failing this attempt. Failing the application.
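A possible explanation for command 4, not confirmed in this thread: spark-submit's --jars option takes a comma-separated list, but the shell expands /usr/hdp/2.3.0.0-2557/spark/lib/*.jar into separate space-separated arguments before spark-submit sees them. Only the first jar would then go to --jars, the next one would likely be treated as the application jar, and the remaining arguments would be passed to the application, which could explain the exit code 10. A minimal bash sketch that builds the comma-separated list instead; jar_list is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every *.jar in a directory as one
# comma-separated string, the format spark-submit's --jars expects.
jar_list() {
  local dir="$1"
  local jars=("$dir"/*.jar)
  # With no matches, bash keeps the literal pattern, so check it exists.
  [ -e "${jars[0]}" ] || return 1
  # "${jars[*]}" joins the array elements with the first character of IFS.
  local IFS=,
  printf '%s\n' "${jars[*]}"
}
```

It would be used as --jars "$(jar_list /usr/hdp/2.3.0.0-2557/spark/lib)"; the double quotes keep the list as a single argument.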

Are there any other options I could try? Any help would be greatly appreciated. Please let me know if you need any other information.

Thanks.

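[Comments on the question]:

What is the actual question in this cocktail? Please post one question per issue! This is not acceptable.
Dear @eliasah, I know my question is a bit long. But if you look at how it is organized, I am asking what is wrong with my submit command, which throws errors related to the Hive configuration. I thought that giving more information would help readers understand the scenario. I'm sorry if you don't like it, but I did not mean it as a cocktail question. My question is still the same, because I have tried and have not found an answer.
You could have suggested a proper way to ask the question before downvoting. I'm not afraid to improve on my mistakes; my goal is to find a solution to a problem I don't know how to solve.
The downvote is not final. If you respond to it properly by editing your question so that it fits the scope of the site, I'll be happy to remove it. If you want to know how to ask a good question on this site, please read stackoverflow.com/help/how-to-ask
I have edited my question and tried to explain the problem more clearly. I am following the documentation and am still open to suggestions on how to edit the question to make it as specific as possible. Thanks.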

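Tags: hadoop apache-spark apache-spark-sql apache-hive

[Solution 1]:

The solution explained here worked in my case. The fact that hive-site.xml lives in two locations can be confusing. Use --files /usr/hdp/current/spark-client/conf/hive-site.xml instead of --files /etc/hive/conf/hive-site.xml. I did not have to add any jars for my configuration. I hope this helps anyone struggling with a similar problem. Thanks.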
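Applied to submit command 2, the fix above might look like the following sketch. It assumes the HDP symlink /usr/hdp/current/spark-client exists on the host you submit from. It keeps a single --num-executors 5, since spark-submit most likely uses the last value when the flag is repeated as in the original commands. The call is wrapped in a hypothetical submit_job function; setting SPARK_SUBMIT=echo prints the command as a dry run instead of submitting it.

```shell
#!/usr/bin/env bash
# Sketch: submit command 2 with the Spark client's hive-site.xml instead of
# the one under /etc/hive/conf. Assumes the HDP symlink
# /usr/hdp/current/spark-client exists on the submitting host.
# submit_job is a hypothetical wrapper; run "SPARK_SUBMIT=echo submit_job"
# to print the command instead of running it.
submit_job() {
  "${SPARK_SUBMIT:-spark-submit}" --class working.path.to.Main \
    --master yarn \
    --deploy-mode cluster \
    --num-executors 5 \
    --executor-cores 8 \
    --executor-memory 25g \
    --driver-memory 25g \
    --files /usr/hdp/current/spark-client/conf/hive-site.xml \
    application-with-all-dependencies.jar
}
```

Keeping the path in one place also makes it easy to switch back to /etc/hive/conf/hive-site.xml and compare the two behaviours on the cluster.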

[Comments]: