【Question Title】: Cannot run PySpark in another PyCharm project
【Posted】: 2021-03-11 17:52:52
【Question】:

I managed to set up Spark locally on my Mac (v10.15.7) and in one of my PyCharm projects (let's call it Project A). However, I cannot start Spark in another PyCharm project (Project B), which I just set up with the same interpreter as Project A.

Inside Project B I appear to be able to create a Spark session: when I go to http://localhost:4040/, a Spark session has been established. But as soon as I start executing commands, I get a message like

Exception: Python in worker has different version 2.7 than that in driver 3.7, PySpark cannot run with different minor versions. Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set.
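One way to rule the mismatch in or out is to pin both the worker and the driver to the interpreter that runs the script. This is only a minimal sketch of that idea (the app name is illustrative), assuming PySpark is importable from Project B's interpreter:

import os
import sys

# Pin worker and driver to the interpreter running this script,
# so their Python versions cannot diverge.
os.environ["PYSPARK_PYTHON"] = sys.executable
os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable

from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("projectB-check").getOrCreate()
print(spark.range(5).count())  # prints 5 once worker and driver agree
spark.stop()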

When I invoke pyspark from the Project B PyCharm terminal, I get the error message below, although the same command works fine from the Project A PyCharm terminal and from the macOS Terminal.

macbook:projectB byc$ pyspark
Could not find valid SPARK_HOME while searching ['/Users/byc/PycharmProjects', '/Library/Frameworks/Python.framework/Versions/3.7/bin']

Did you install PySpark via a package manager such as pip or Conda? If so,
PySpark was not found in your Python environment. It is possible your
Python environment does not properly bind with your package manager.

Please check your default 'python' and if you set PYSPARK_PYTHON and/or
PYSPARK_DRIVER_PYTHON environment variables, and see if you can import
PySpark, for example, 'python -c 'import pyspark'.

If you cannot import, you can install by using the Python executable directly,
for example, 'python -m pip install pyspark [--user]'. Otherwise, you can also
explicitly set the Python executable, that has PySpark installed, to
PYSPARK_PYTHON or PYSPARK_DRIVER_PYTHON environment variables, for example,
'PYSPARK_PYTHON=python3 pyspark'.

/Library/Frameworks/Python.framework/Versions/3.7/bin/pyspark: line 24: /bin/load-spark-env.sh: No such file or directory
/Library/Frameworks/Python.framework/Versions/3.7/bin/pyspark: line 68: /bin/spark-submit: No such file or directory
/Library/Frameworks/Python.framework/Versions/3.7/bin/pyspark: line 68: exec: /bin/spark-submit: cannot execute: No such file or directory
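(Note: environment variables set in a PyCharm run configuration apply only to scripts launched through that configuration, not to the built-in terminal, which may be why the terminal cannot find SPARK_HOME. A quick way to check is to print what the current session actually sees; a minimal sketch:)

import os

# Print the Spark-related variables visible to this Python process.
for var in ("SPARK_HOME", "PYSPARK_PYTHON", "PYSPARK_DRIVER_PYTHON", "PYTHONPATH"):
    print(var, "=", os.environ.get(var, "<not set>"))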

After reading various posts here, I added these environment variables:

PYTHONUNBUFFERED=1
PYSPARK_PYTHON=/Download/spark-3.0.1-bin-hadoop2.7
PYSPARK_DRIVER_PYTHON=/Download/spark-3.0.1-bin-hadoop2.7
SPARK_HOME=/usr/local/Cellar/apache-spark/3.0.1/libexec
PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.9-src.zip:$PYTHONPATH

I closed Project B in PyCharm, reopened it, and ran the commands again. Still no luck.
I'm sure I'm missing something obvious here, but I just can't see what it is! Any pointers are much appreciated!

【Discussion】:

    Tags: python-3.x apache-spark pyspark pycharm environment-variables


    【Solution 1】:

    Try SPARK_HOME=/Download/spark-3.0.1-bin-hadoop2.7

    There is no need to set PYTHONPATH.

    If setting SPARK_HOME doesn't work, you may need to point PYSPARK_PYTHON at the actual python executable; the path you supplied doesn't look right. It will probably be something like PYSPARK_PYTHON=/usr/bin/python3
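
    As a concrete illustration (a minimal sketch; both paths are assumptions that must match your machine), you can validate the two variables before starting Spark:

    import os
    import shutil

    # Assumed locations: adjust to wherever Spark was unpacked and wherever
    # the Python 3.7 interpreter actually lives.
    os.environ["SPARK_HOME"] = "/Download/spark-3.0.1-bin-hadoop2.7"
    os.environ["PYSPARK_PYTHON"] = "/usr/local/bin/python3.7"

    # SPARK_HOME must contain bin/spark-submit, and PYSPARK_PYTHON must be
    # an executable interpreter (a file), not a directory.
    assert os.path.isfile(os.path.join(os.environ["SPARK_HOME"], "bin", "spark-submit"))
    assert shutil.which(os.environ["PYSPARK_PYTHON"]) is not None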

    【Comments】:

    • Thanks for the hint @mck. I reset the environment variables as follows: PYTHONUNBUFFERED=1;PYSPARK_PYTHON=/usr/local/bin/python3.7;PYSPARK_DRIVER_PYTHON=/Download/spark-3.0.1-bin-hadoop2.7/python;SPARK_HOME=/Download/spark-3.0.1-bin-hadoop2.7
    • On top of those variables I also added two content roots: /Users/byc/Downloads/spark-3.0.1-bin-hadoop2.7/python/lib/py4j-0.10.9-src.zip and /Users/byc/Downloads/spark-3.0.1-bin-hadoop2.7/python/lib/pyspark.zip. Still no luck...!
    【Solution 2】:

    Try installing pyspark with the pip installer:

    pip install pyspark==2.4.7
    

    If you are using it locally, I would also suggest setting the following:

    export SPARK_LOCAL_IP="127.0.0.1"
    export PYSPARK_PYTHON=python3.6
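
    After those exports, a short smoke test (a sketch, assuming the pip install above succeeded in the interpreter Project B uses) confirms that workers can actually start:

    import os
    os.environ["SPARK_LOCAL_IP"] = "127.0.0.1"  # mirrors the export above

    import pyspark
    print(pyspark.__version__)  # expect 2.4.7

    sc = pyspark.SparkContext(master="local[*]", appName="smoke-test")
    print(sc.parallelize(range(10)).sum())  # 45 if workers launch correctly
    sc.stop()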
    

    【Comments】:
