What is a Spark kernel for Apache Toree?

【Question title】: What is a spark kernel for apache toree?
【Posted】: 2017-01-18 15:42:20
【Question description】:

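I have a Spark cluster whose master is at 192.168.0.60:7077.

I used to write PySpark scripts in Jupyter Notebook, and I now want to move on to Scala. I don't know the Scala world, so I am trying Apache Toree. I installed it, downloaded the Scala kernels, and ran it to open a Scala notebook. Up to that point everything seemed fine :-/

But the Spark context cannot be found, and the Jupyter server log shows errors: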

[I 16:20:35.953 NotebookApp] Kernel started: afb8cb27-c0a2-425c-b8b1-3874329eb6a6
Starting Spark Kernel with SPARK_HOME=/Users/romain/spark
Error: Master must start with yarn, spark, mesos, or local
Run with --help for usage help or --verbose for debug output
[I 16:20:38.956 NotebookApp] KernelRestarter: restarting kernel (1/5)

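Since I don't know Scala, I'm not sure what the problem is here. It could be that:

I need a Spark kernel (according to https://github.com/ibm-et/spark-kernel/wiki/Getting-Started-with-the-Spark-Kernel)
I need to add an option on the server (the error message is 'Master must start with yarn, spark, mesos, or local')
or something else :-/

I only wanted to migrate from Python to Scala, and I have spent a few hours just trying to start the Jupyter IDE :-/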

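【Comments】:

I have never used Jupyter, so I can't speak to #1, but your master should be configured as spark://192.168.0.60:7077. Spark supports several "cluster managers", or deploy modes. To indicate that you are using "standalone" mode (i.e. a Spark master has been set up and is listening on port 7077), you use a URL with the spark protocol.
Check the spark configuration for apache toree. Did you install it the same way?
Yes, I did follow the online installation documentation.

Tags: scala apache-spark jupyter apache-toree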

【Solution 1】:

You appear to be using Spark in standalone deploy mode. As Tzach suggested in his comment, the following should work:

SPARK_OPTS='--master=spark://192.168.0.60:7077' jupyter notebook

SPARK_OPTS expects the usual list of spark-submit arguments.

If that does not help, check the SPARK_MASTER_PORT value in conf/spark-env.sh (7077 is the default).
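
To catch this class of error before the kernel starts, the master value can be checked in the shell first. The following is only a minimal sketch: is_valid_master is a hypothetical helper, not part of Spark or Toree, and its prefix check is modeled on the wording of the error message in the log above rather than on Spark's actual validation code.

```shell
#!/bin/sh
# Hypothetical helper (not a Spark or Toree command): succeeds when the
# master value starts with one of the prefixes named in the error message.
is_valid_master() {
  case "$1" in
    yarn*|spark*|mesos*|local*) return 0 ;;
    *) return 1 ;;
  esac
}

# Master address taken from the question; adjust for your own cluster.
MASTER="spark://192.168.0.60:7077"

if is_valid_master "$MASTER"; then
  # Export the variable so a later `jupyter notebook` in this shell inherits it.
  export SPARK_OPTS="--master=$MASTER"
  echo "SPARK_OPTS=$SPARK_OPTS"
else
  echo "Master must start with yarn, spark, mesos, or local: $MASTER" >&2
fi
```

With SPARK_OPTS exported this way, running jupyter notebook from the same shell should have the same effect as the one-line command above. A bare 192.168.0.60:7077 without the spark:// scheme fails the check.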

【Discussion】: