[Posted]: 2021-12-27 20:51:36
[Problem description]:
I am trying to run a hello-world PySpark application.
I am using PyCharm.
My LOL.py script:
import os
os.environ["SPARK_HOME"] = "/opt/spark"

from pyspark.sql import SparkSession

def init_spark():
    spark = SparkSession.builder.appName("HelloWorld").getOrCreate()
    sc = spark.sparkContext
    return spark, sc

def main():
    spark, sc = init_spark()
    nums = sc.parallelize([1, 2, 3, 4])
    print(nums.map(lambda x: x * x).collect())

if __name__ == '__main__':
    main()
Output:
/opt/spark/bin/spark-class: line 71: /usr/libexec/java_home/bin/java: Not a directory
Traceback (most recent call last):
File "/Users/evgenii/PycharmProjects/bi-etl-orders/LOL.py", line 19, in <module>
main()
File "/Users/evgenii/PycharmProjects/bi-etl-orders/LOL.py", line 13, in main
spark,sc = init_spark()
File "/Users/evgenii/PycharmProjects/bi-etl-orders/LOL.py", line 8, in init_spark
spark = SparkSession.builder.appName("HelloWorld").getOrCreate()
File "/Users/evgenii/.pyenv/versions/3.7.5/lib/python3.7/site-packages/pyspark/sql/session.py", line 173, in getOrCreate
sc = SparkContext.getOrCreate(sparkConf)
File "/Users/evgenii/.pyenv/versions/3.7.5/lib/python3.7/site-packages/pyspark/context.py", line 349, in getOrCreate
SparkContext(conf=conf or SparkConf())
File "/Users/evgenii/.pyenv/versions/3.7.5/lib/python3.7/site-packages/pyspark/context.py", line 115, in __init__
SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
File "/Users/evgenii/.pyenv/versions/3.7.5/lib/python3.7/site-packages/pyspark/context.py", line 298, in _ensure_initialized
SparkContext._gateway = gateway or launch_gateway(conf)
File "/Users/evgenii/.pyenv/versions/3.7.5/lib/python3.7/site-packages/pyspark/java_gateway.py", line 94, in launch_gateway
raise Exception("Java gateway process exited before sending its port number")
Exception: Java gateway process exited before sending its port number
I know that "Java gateway process exited before sending its port number" is usually raised when JAVA_HOME is set incorrectly.
But I don't think that is the case here, because my JAVA_HOME looks fine:
$ echo $JAVA_HOME
/Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home
And here is my SPARK_HOME:
$ echo $SPARK_HOME
/opt/spark
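For what it's worth, the spark-class error above complains about /usr/libexec/java_home/bin/java, which would suggest the process launched from PyCharm sees JAVA_HOME set to the macOS java_home utility binary rather than to the JDK directory my shell reports. A quick sanity check for whether a given JAVA_HOME actually points at a usable JDK (the helper name is mine, purely for illustration):

```python
import os

def java_home_looks_valid(java_home: str) -> bool:
    """True if java_home is a directory that contains bin/java."""
    return os.path.isdir(java_home) and os.path.isfile(
        os.path.join(java_home, "bin", "java")
    )

# /usr/libexec/java_home is a binary, not a directory, so a JAVA_HOME
# pointing there would fail this check even though the shell's
# JAVA_HOME (the JDK path echoed above) would pass it.
print(java_home_looks_valid(os.environ.get("JAVA_HOME", "")))
```

Running this from the PyCharm run configuration (rather than a terminal) would show whether the IDE-launched process inherits the same JAVA_HOME as the shell.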
My environment:
- Python 3.7.5 (managed with pyenv, if that matters)
- Java: adoptopenjdk-8.jdk (installed via Homebrew; adoptopenjdk-11.jdk is also in the same folder, if that matters)
- PySpark 2.4.0
- macOS Big Sur 11.5.2
- PyCharm Pro 2021.1.3
I have read several related guides, but none of them have helped so far. I would appreciate any help. Thanks in advance!
[Comments]:
Tags: python java apache-spark pyspark