Posted: 2020-12-11 11:39:11
Problem description:
I'm trying to run a Jupyter notebook from Archives Unleashed locally on my machine. When the notebook sets up PySpark, it hits the following exception:
Exception: Unable to find py4j, your SPARK_HOME may not be configured correctly
Any idea how to configure SPARK_HOME?
I tried running the notebook in a clean conda environment. Here is the full notebook up to the point where the error occurs:
%%capture
!wget "https://github.com/archivesunleashed/aut/releases/download/aut-0.50.0/aut-0.50.0.zip"
!wget "https://github.com/archivesunleashed/aut/releases/download/aut-0.50.0/aut-0.50.0-fatjar.jar"
!ls

%%capture
!apt-get update
!apt-get install -y openjdk-8-jdk-headless -qq
!apt-get install maven -qq
!curl -L "https://archive.apache.org/dist/spark/spark-2.4.5/spark-2.4.5-bin-hadoop2.7.tgz" > spark-2.4.5-bin-hadoop2.7.tgz
!tar -xvf spark-2.4.5-bin-hadoop2.7.tgz
!pip install -q findspark

import os
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
os.environ["SPARK_HOME"] = "/content/spark-2.4.5-bin-hadoop2.7"
os.environ['PYSPARK_SUBMIT_ARGS'] = '--jars aut-0.50.0-fatjar.jar --py-files aut-0.50.0.zip pyspark-shell'

import findspark
findspark.init()
import pyspark
sc = pyspark.SparkContext()
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)
This is what I get back:
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
~/opt/miniconda3/envs/arc/lib/python3.8/site-packages/findspark.py in init(spark_home, python_path, edit_rc, edit_profile)
    142         try:
--> 143             py4j = glob(os.path.join(spark_python, "lib", "py4j-*.zip"))[0]
    144         except IndexError:

IndexError: list index out of range

During handling of the above exception, another exception occurred:

Exception                                 Traceback (most recent call last)
<ipython-input-2-03823ebc9ad8> in <module>
      1 import findspark
----> 2 findspark.init()
      3 import pyspark
      4 sc = pyspark.SparkContext()
      5 from pyspark.sql import SQLContext

~/opt/miniconda3/envs/arc/lib/python3.8/site-packages/findspark.py in init(spark_home, python_path, edit_rc, edit_profile)
    143             py4j = glob(os.path.join(spark_python, "lib", "py4j-*.zip"))[0]
    144         except IndexError:
--> 145             raise Exception(
    146                 "Unable to find py4j, your SPARK_HOME may not be configured correctly"
    147             )

Exception: Unable to find py4j, your SPARK_HOME may not be configured correctly
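For context, the traceback shows that findspark globs for a `py4j-*.zip` under `$SPARK_HOME/python/lib` and raises this exception when nothing matches. A quick sanity check of the same lookup (a minimal sketch, assuming the environment variables from the cells above; `find_py4j` is just an illustrative helper, not part of findspark) would be:

```python
import glob
import os

def find_py4j(spark_home):
    """Replicate findspark's lookup: py4j zip(s) under SPARK_HOME/python/lib."""
    return glob.glob(os.path.join(spark_home, "python", "lib", "py4j-*.zip"))

# An empty list here reproduces the IndexError from the traceback.
print(find_py4j(os.environ.get("SPARK_HOME", "")))
```

If this prints an empty list, SPARK_HOME is not pointing at an extracted Spark distribution on the local machine.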
Discussion:
Tags: python pyspark jupyter-notebook