【Posted】: 2019-06-08 07:19:13
【Problem description】:
I am running Windows 10 with Python 3 installed via Anaconda3, and I am working in Jupyter notebooks. I installed Spark from here (spark-2.3.0-bin-hadoop2.7.tgz), extracted the files, and placed them in my directory D:\Spark. I have modified the environment variables:
User variable:
Variable: SPARK_HOME
Value: D:\Spark
System variable:
Variable: PATH
Value: D:\Spark\bin
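As a cross-check, the same two variables can also be set for the current Python session before importing pyspark. This is a minimal sketch using the paths from the post; it only affects the running process, not the persistent Windows settings:

```python
import os

# Point SPARK_HOME at the extracted Spark directory and prepend its
# bin folder to PATH, for this process only.
os.environ["SPARK_HOME"] = r"D:\Spark"
os.environ["PATH"] = r"D:\Spark\bin" + os.pathsep + os.environ.get("PATH", "")
```

Setting these in-process can help rule out a mistyped or unpropagated environment variable as the cause.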
I have installed/updated the following modules via conda:
pandas
numpy
pyarrow
pyspark
py4j
Java is installed:
I don't know whether this is relevant, but the following two variables appear among my environment variables:
After doing all of this, I restarted and ran the following code, which produced the error message I paste here:
import pandas as pd
import seaborn as sns
# These imports enable running Spark commands
from pyspark.context import SparkContext
from pyspark.sql.session import SparkSession
sc = SparkContext('local')
spark = SparkSession(sc)
import pyspark
data = sns.load_dataset('iris')
data_sp = spark.createDataFrame(data)
data_sp.show()
---------------------------------------------------------------------------
Exception Traceback (most recent call last)
<ipython-input-1-ec964ecd39a2> in <module>()
7 from pyspark.context import SparkContext
8 from pyspark.sql.session import SparkSession
----> 9 sc = SparkContext('local')
10 spark = SparkSession(sc)
11
C:\ProgramData\Anaconda3\lib\site-packages\pyspark\context.py in __init__(self, master, appName, sparkHome, pyFiles, environment, batchSize, serializer, conf, gateway, jsc, profiler_cls)
113 """
114 self._callsite = first_spark_call() or CallSite(None, None, None)
--> 115 SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
116 try:
117 self._do_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer,
C:\ProgramData\Anaconda3\lib\site-packages\pyspark\context.py in _ensure_initialized(cls, instance, gateway, conf)
296 with SparkContext._lock:
297 if not SparkContext._gateway:
--> 298 SparkContext._gateway = gateway or launch_gateway(conf)
299 SparkContext._jvm = SparkContext._gateway.jvm
300
C:\ProgramData\Anaconda3\lib\site-packages\pyspark\java_gateway.py in launch_gateway(conf)
92
93 if not os.path.isfile(conn_info_file):
---> 94 raise Exception("Java gateway process exited before sending its port number")
95
96 with open(conn_info_file, "rb") as info:
Exception: Java gateway process exited before sending its port number
How can I get PySpark to work?
【Comments】:
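For context, "Java gateway process exited before sending its port number" usually means PySpark could not start the JVM it depends on. A minimal diagnostic sketch to run before creating the SparkContext; the JDK path below is a placeholder, not a path from this post:

```python
import os
import shutil

# PySpark spawns a Java gateway process; if no usable JVM is found,
# that process dies before reporting its port. Check what this
# Python process can actually see:
print("JAVA_HOME:", os.environ.get("JAVA_HOME"))
print("java on PATH:", shutil.which("java"))

# If JAVA_HOME is unset, point it at the JDK install directory
# (placeholder path; adjust to the real Java installation):
if "JAVA_HOME" not in os.environ:
    os.environ["JAVA_HOME"] = r"C:\Program Files\Java\jdk1.8.0_201"
```

If `shutil.which("java")` returns None and JAVA_HOME is unset, the gateway error is expected; fixing the Java visibility is then the first step.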
-
You should also install Java. Also, what is the output of D:\spark\sbin\start-master.sh?
-
@BlackBear: Thanks for your comment. Java is installed; see my updated post. As for your question, I don't understand, sorry. What exactly would you like me to do? The file you mention exists in my directory, but what should I do with it?
-
@user8270077 Did you solve the problem?
Tags: python apache-spark pyspark