Fixing PySpark installation problems on Windows
Incorrect JAVA_HOME path
> pyspark
The system cannot find the path specified.
Open the system environment variables dialog:
rundll32 sysdm.cpl,EditEnvironmentVariables
Set JAVA_HOME under System variables > New:
Variable Name: JAVA_HOME
Variable Value: C:\Program Files\Java\jdk1.8.0_261
Also check that SPARK_HOME and HADOOP_HOME are set correctly, for example:
SPARK_HOME=C:\Spark\spark-3.2.0-bin-hadoop3.2
HADOOP_HOME=C:\Spark\spark-3.2.0-bin-hadoop3.2
Important: double-check that
- the path exists
- the path does not include the bin folder
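The checks above can be scripted. A minimal Python sketch (the variable names match the ones set above; the validation rules are just the two bullet points expressed in code):

```python
import ntpath
import os

# Environment variables a Windows PySpark install relies on.
REQUIRED_VARS = ("JAVA_HOME", "SPARK_HOME", "HADOOP_HOME")

def check_env_var(name, value):
    """Return a list of problems found with one environment variable."""
    problems = []
    if not value:
        problems.append(f"{name} is not set")
        return problems
    if not os.path.isdir(value):
        problems.append(f"{name} points to a missing directory: {value}")
    # The variable should point at the install root, not its bin folder.
    if ntpath.basename(value.rstrip("\\/")).lower() == "bin":
        problems.append(f"{name} should not end in \\bin: {value}")
    return problems

def check_all(env=os.environ):
    """Check every required variable; an empty result means all good."""
    problems = []
    for name in REQUIRED_VARS:
        problems += check_env_var(name, env.get(name))
    return problems
```

Running `check_all()` in the same shell where `pyspark` fails narrows the problem down quickly.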
Incorrect Java version
> pyspark
WARN SparkContext: Another SparkContext is being constructed
UserWarning: Failed to initialize Spark session.
java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.storage.StorageUtils$
This error typically appears when Spark is launched with a JDK that is too new for it. Make sure JAVA_HOME points to a Java 8 installation (jdk1.8.0).
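You can confirm which major Java version is actually on the PATH by parsing the output of `java -version`. A small sketch (the parsing helper is an illustration, not part of Spark; note that `java -version` prints to stderr):

```python
import re
import subprocess

def java_major_version(version_output):
    """Extract the major Java version from `java -version` output.

    Java 8 reports itself as "1.8.x"; newer releases report "11.x",
    "17.x", and so on.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if not match:
        return None
    major = int(match.group(1))
    if major == 1 and match.group(2):  # legacy "1.8"-style version string
        major = int(match.group(2))
    return major

def current_java_major():
    """Run `java -version` and return the major version, or None."""
    result = subprocess.run(["java", "-version"],
                            capture_output=True, text=True)
    return java_major_version(result.stderr)
```

If `current_java_major()` does not return 8, the shell is picking up a different JDK than the one JAVA_HOME names.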
winutils not installed
> pyspark
WARN Shell: Did not find winutils.exe
java.io.FileNotFoundException: Could not locate Hadoop executable
Download winutils.exe and copy it into your Spark home's bin folder (the command below is PowerShell, where curl is an alias for Invoke-WebRequest):
curl -OutFile C:\Spark\spark-3.2.0-bin-hadoop3.2\bin\winutils.exe -Uri https://github.com/steveloughran/winutils/raw/master/hadoop-3.0.0/bin/winutils.exe
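After downloading, Hadoop's shell layer expects to find the binary at %HADOOP_HOME%\bin\winutils.exe (HADOOP_HOME is the variable set earlier). A quick sketch to verify the file landed in the right place:

```python
import ntpath
import os

def winutils_path(hadoop_home):
    """The location where Hadoop looks for winutils.exe on Windows."""
    return ntpath.join(hadoop_home, "bin", "winutils.exe")

def winutils_present(hadoop_home):
    """True if winutils.exe exists under the given HADOOP_HOME."""
    return os.path.isfile(winutils_path(hadoop_home))
```

Run `winutils_present(os.environ["HADOOP_HOME"])` in the failing shell; if it returns False, either the download went to the wrong folder or HADOOP_HOME points somewhere else.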