Why does a Spark job fail on Mesos with "hadoop: not found"?

【Question title】: Why does Spark job fail on Mesos with "hadoop: not found"?
【Posted】: 2016-04-28 21:19:14
【Question】:

I am using Spark 1.6.1, Hadoop 2.6.4 and Mesos 0.28 on Debian 8.

When I try to submit a job to the Mesos cluster via spark-submit, the slave fails and its stderr log shows the following:

I0427 22:35:39.626055 48258 fetcher.cpp:424] Fetcher Info: {"cache_directory":"\/tmp\/mesos\/fetch\/slaves\/ad642fcf-9951-42ad-8f86-cc4f5a5cb408-S0\/hduser","items":[{"action":"BYP$
I0427 22:35:39.628031 48258 fetcher.cpp:379] Fetching URI 'hdfs://xxxxxxxxx:54310/sources/spark/SimpleEventCounter.jar'
I0427 22:35:39.628057 48258 fetcher.cpp:250] Fetching directly into the sandbox directory
I0427 22:35:39.628078 48258 fetcher.cpp:187] Fetching URI 'hdfs://xxxxxxx:54310/sources/spark/SimpleEventCounter.jar'
E0427 22:35:39.629243 48258 shell.hpp:93] Command 'hadoop version 2>&1' failed; this is the output:
sh: 1: hadoop: not found
Failed to fetch 'hdfs://xxxxxxx:54310/sources/spark/SimpleEventCounter.jar': Failed to create HDFS client: Failed to execute 'hadoop version 2>&1'; the command was e$
Failed to synchronize with slave (it's probably exited)

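My jar file contains the Hadoop 2.6 binaries.
The path to the Spark executor/binary is given as an hdfs:// link.

My jobs do not appear in the Frameworks tab, but they do appear under Drivers with the status "Queued". They just sit there until I shut down the spark-mesos-dispatcher.sh service.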

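【Comments】:

Have you configured hadoop_home on the Mesos slaves? It looks like the Mesos slave cannot find the Hadoop home!
There is a similar issue in the Mesos JIRA. Check whether curl is installed on your machines.
How do you run spark-submit, i.e. can you show the entire command line?
For now I have moved to YARN to get the job running; I will come back to this later this week. Sorry for the delay. FYI, curl is installed on all machines, and hadoop_home is configured as well. ./spark-submit --class EventCounter --master mesos://xxxxx:7077 --deploy-mode client --supervise --executor-memory 500m hdfs://xxxxx:54310/sources/spark/SimpleEventCounter.jar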

Tags: apache-spark mesos mesosphere

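【Solution 1】:

I saw a very similar error, and my problem turned out to be that hadoop_home was not set on the Mesos agents. On each mesos-slave I added the following line to /etc/default/mesos-slave (the path may differ depending on your installation):

MESOS_hadoop_home="/path/to/my/hadoop/install/folder/"

Edit: Hadoop must be installed on every slave, and /path/to/my/hadoop/install/folder is a local path on that slave.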
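Before restarting the agents, it may help to confirm on each slave that the directory you plan to use actually contains a working hadoop binary. The log above shows the fetcher running 'hadoop version 2>&1', so a rough check can run the same probe. This is only a sketch: hadoop_bin_for and check_hadoop are hypothetical helper names (not part of Mesos or Hadoop), and the bin/hadoop location assumes a standard Hadoop install layout.

```shell
#!/bin/sh
# Sketch only: imitate the fetcher's probe ('hadoop version 2>&1') on one slave.
# hadoop_bin_for and check_hadoop are hypothetical helpers, not Mesos commands.

# Print the hadoop binary to probe.
# $1: the value intended for MESOS_hadoop_home (may be empty).
# Assumption: the binary lives at <hadoop_home>/bin/hadoop; with no
# hadoop_home, fall back to whatever "hadoop" resolves to on PATH.
hadoop_bin_for() {
    if [ -n "$1" ]; then
        printf '%s\n' "${1%/}/bin/hadoop"   # drop one trailing slash, if any
    else
        printf '%s\n' "hadoop"
    fi
}

# Run "<binary> version" and report whether it succeeded.
check_hadoop() {
    bin=$(hadoop_bin_for "$1")
    if "$bin" version >/dev/null 2>&1; then
        echo "ok: $bin"
    else
        echo "missing: $bin"
        return 1
    fi
}
```

For example, after sourcing the functions on a slave, run check_hadoop "/path/to/my/hadoop/install/folder/" as the user the Mesos agent runs as. If it reports "missing", the fetcher will most likely fail in the same way as in the question.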

【Comments】: