【Question Title】: How to install and configure Apache Toree for Jupyter Notebook in Windows 10?
【Posted】: 2017-12-19 18:08:49
【Question Description】:

Can someone help me install and configure Apache Toree for Jupyter Notebook on Windows 10? I have tried without success. The error I ran into is below.

Failed to start the kernel

Unknown server error.

Traceback (most recent call last):
  File "C:\Anaconda3\lib\site-packages\notebook\base\handlers.py", line 516, in wrapper
    result = yield gen.maybe_future(method(self, *args, **kwargs))
  File "C:\Anaconda3\lib\site-packages\tornado\gen.py", line 1055, in run
    value = future.result()
  File "C:\Anaconda3\lib\site-packages\tornado\concurrent.py", line 238, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
  File "C:\Anaconda3\lib\site-packages\tornado\gen.py", line 1063, in run
    yielded = self.gen.throw(*exc_info)
  File "C:\Anaconda3\lib\site-packages\notebook\services\sessions\handlers.py", line 75, in post
    type=mtype))
  File "C:\Anaconda3\lib\site-packages\tornado\gen.py", line 1055, in run
    value = future.result()
  File "C:\Anaconda3\lib\site-packages\tornado\concurrent.py", line 238, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
  File "C:\Anaconda3\lib\site-packages\tornado\gen.py", line 1063, in run
    yielded = self.gen.throw(*exc_info)
  File "C:\Anaconda3\lib\site-packages\notebook\services\sessions\sessionmanager.py", line 79, in create_session
    kernel_id = yield self.start_kernel_for_session(session_id, path, name, type, kernel_name)
  File "C:\Anaconda3\lib\site-packages\tornado\gen.py", line 1055, in run
    value = future.result()
  File "C:\Anaconda3\lib\site-packages\tornado\concurrent.py", line 238, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
  File "C:\Anaconda3\lib\site-packages\tornado\gen.py", line 1063, in run
    yielded = self.gen.throw(*exc_info)
  File "C:\Anaconda3\lib\site-packages\notebook\services\sessions\sessionmanager.py", line 92, in start_kernel_for_session
    self.kernel_manager.start_kernel(path=kernel_path, kernel_name=kernel_name)
  File "C:\Anaconda3\lib\site-packages\tornado\gen.py", line 1055, in run
    value = future.result()
  File "C:\Anaconda3\lib\site-packages\tornado\concurrent.py", line 238, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
  File "C:\Anaconda3\lib\site-packages\tornado\gen.py", line 307, in wrapper
    yielded = next(result)
  File "C:\Anaconda3\lib\site-packages\notebook\services\kernels\kernelmanager.py", line 94, in start_kernel
    super(MappingKernelManager, self).start_kernel(**kwargs)
  File "C:\Anaconda3\lib\site-packages\jupyter_client\multikernelmanager.py", line 110, in start_kernel
    km.start_kernel(**kwargs)
  File "C:\Anaconda3\lib\site-packages\jupyter_client\manager.py", line 243, in start_kernel
    **kw)
  File "C:\Anaconda3\lib\site-packages\jupyter_client\manager.py", line 189, in _launch_kernel
    return launch_kernel(kernel_cmd, **kw)
  File "C:\Anaconda3\lib\site-packages\jupyter_client\launcher.py", line 123, in launch_kernel
    proc = Popen(cmd, **kwargs)
  File "C:\Anaconda3\lib\subprocess.py", line 709, in __init__
    restore_signals, start_new_session)
  File "C:\Anaconda3\lib\subprocess.py", line 997, in _execute_child
    startupinfo)
OSError: [WinError 193] %1 is not a valid Win32 application

【Question Comments】:

  • It would help if you described what you have tried and why it did not work, or which error messages you received.
  • I am trying to install Apache Toree as a kernel in Jupyter so I can run Spark applications.
  • Let me expand on my question. On Windows 10 I installed the latest Anaconda and then started Jupyter. The Python 3 kernel was available by default, but since I need to use Spark, I installed and configured Apache Toree kernels for PySpark, SQL, and Scala. All the newly added kernels show up in jupyter kernelspec list, but when I open one of them in Jupyter it shows a kernel error saying "Failed to start the kernel". Following existing Stack Overflow answers, I also tried changing the path in the kernel.json file, but that did not work either. Please help.
  • How did you install Spark? I think the problem lies with your Spark installation and permissions.

Tags: apache-spark windows-10 jupyter-notebook apache-toree


【Solution 1】:

Apache Toree launches its kernel with %PROG_HOME%\bin\run.sh.

On Windows, PROG_HOME is typically C:\Users\{Account_Name}\AppData\Roaming\jupyter\kernels\apache_toree_scala.

Since Windows cannot execute shell scripts, the operating system raises the error:

[WinError 193] %1 is not a valid Win32 application.
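To see why that particular error surfaces: jupyter_client reads the argv list from kernel.json, fills in the {connection_file} placeholder, and hands the result to subprocess.Popen. Because argv[0] is a POSIX shell script, Windows' CreateProcess rejects it with WinError 193. A simplified sketch of that substitution step (the real logic lives in jupyter_client's KernelManager.format_kernel_cmd; the argv values here are illustrative):

```python
# Sketch of how jupyter_client turns the kernel.json "argv" template into
# the command passed to subprocess.Popen (simplified illustration).

def format_kernel_cmd(argv, connection_file):
    """Fill in the {connection_file} placeholder in the argv template."""
    return [arg.replace("{connection_file}", connection_file) for arg in argv]

# argv as Toree installs it: the first element is a shell script, which
# Windows' CreateProcess cannot execute -- hence WinError 193.
argv = ["bin/run.sh", "--profile", "{connection_file}"]
cmd = format_kernel_cmd(argv, "kernel-1234.json")
print(cmd)  # ['bin/run.sh', '--profile', 'kernel-1234.json']
```

Pointing argv[0] at a .cmd or .bat file, as the steps below do, gives CreateProcess something it can actually launch.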

You need to follow these steps:

  1. Download a Spark distribution compatible with Scala 2.11 and set the SPARK_HOME environment variable. Note that Apache Toree kernel version 0.3.0-incubating uses Scala 2.11.

  2. In %PROG_HOME%\bin, create a Windows batch file (run.bat) or Windows command script (run.cmd). Like run.sh, it launches the kernel through the SparkSubmit class using the following command:

%JAVA_HOME%\bin\java -cp "%SPARK_HOME%\jars\*;%PROG_HOME%\lib\toree-assembly-0.3.0-incubating.jar;." -Dscala.usejavacp=true org.apache.spark.deploy.SparkSubmit %SPARK_OPTS% --class org.apache.toree.Main %PROG_HOME%\lib\toree-assembly-0.3.0-incubating.jar %TOREE_OPTS% %*
  3. In the kernel.json file in the PROG_HOME folder, update the argv value from run.sh to run.cmd.

  4. Launch the Anaconda Prompt and run the jupyter notebook command. Select the "Apache Toree - Scala" kernel in the browser. You can watch the kernel connection status in the Anaconda Prompt.
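Step 3 amounts to an edit like the following. This is an illustrative kernel.json only: the env entries (and the SPARK_HOME value) depend on your installation, and {Account_Name} stands for your Windows user name as in the path above.

```json
{
  "argv": [
    "C:\\Users\\{Account_Name}\\AppData\\Roaming\\jupyter\\kernels\\apache_toree_scala\\bin\\run.cmd",
    "--profile",
    "{connection_file}"
  ],
  "display_name": "Apache Toree - Scala",
  "language": "scala",
  "env": {
    "SPARK_HOME": "C:\\spark",
    "__TOREE_SPARK_OPTS__": "",
    "__TOREE_OPTS__": ""
  }
}
```

Leave the {connection_file} placeholder untouched; Jupyter substitutes it at launch time.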

【Comments】:

    【Solution 2】:

    To add to @UmeshD's answer above: if you are using toree-assembly-0.3.0-incubating, create a file run.cmd (in C:\Users\{Account_Name}\AppData\Roaming\jupyter\kernels\apache_toree_scala\bin\) and paste in the following code:

    @echo off
    pushd "%~dp0\..\"
    set PROG_HOME=%cd%
    popd
    
    if not defined SPARK_HOME ( 
        echo "SPARK_HOME must be set to the location of a Spark distribution!"
        Exit /b
    )
    
    echo "Starting Spark Kernel with SPARK_HOME=%SPARK_HOME%"
    
    
    rem for /f %%i in ('dir /B toree-assembly-*.jar') do set KERNEL_ASSEMBLY=%%i popd
    
    rem disable randomized hash for string in Python 3.3+
    set PYTHONHASHSEED=0
    rem set TOREE_ASSEMBLY=%PROG_HOME%/lib/%KERNEL_ASSEMBLY%
    rem The SPARK_OPTS values during installation are stored in __TOREE_SPARK_OPTS__. This allows values to be specified during
    rem install, but also during runtime. The runtime options take precedence over the install options.
    if not defined SPARK_OPTS (
        if defined __TOREE_SPARK_OPTS__ (
            set SPARK_OPTS=%__TOREE_SPARK_OPTS__%
        )
    )
    
    if not defined TOREE_OPTS (
        if defined __TOREE_OPTS__ (
            set TOREE_OPTS=%__TOREE_OPTS__%
        )
    )
    
    %JAVA_HOME%\bin\java -cp "%SPARK_HOME%\jars\*;%PROG_HOME%\lib\toree-assembly-0.3.0-incubating.jar;." -Dscala.usejavacp=true org.apache.spark.deploy.SparkSubmit %SPARK_OPTS% --class org.apache.toree.Main %PROG_HOME%\lib\toree-assembly-0.3.0-incubating.jar %TOREE_OPTS% %*
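
    After creating run.cmd, it is worth double-checking that kernel.json really points at it: the launcher in argv[0] must have an extension Windows can execute (.cmd, .bat, .exe), otherwise WinError 193 returns. A small, hypothetical checker sketch (the helper name and file layout are my own, not part of Toree):

```python
import json
from pathlib import Path

# Extensions that Windows' CreateProcess can launch directly.
RUNNABLE_EXTS = {".cmd", ".bat", ".exe"}

def kernel_launcher_is_windows_runnable(kernel_json_path):
    """Return True if argv[0] of the kernelspec has a Windows-executable extension."""
    spec = json.loads(Path(kernel_json_path).read_text())
    launcher = spec["argv"][0]
    return Path(launcher).suffix.lower() in RUNNABLE_EXTS

# Example: a spec still pointing at run.sh fails the check.
Path("kernel.json").write_text(json.dumps({
    "argv": ["bin/run.sh", "--profile", "{connection_file}"],
    "display_name": "Apache Toree - Scala",
}))
print(kernel_launcher_is_windows_runnable("kernel.json"))  # False
```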
    

    【Comments】:
