【Title】: PySpark - SparkContext: Error initializing SparkContext: File does not exist
【Posted】: 2018-06-30 13:33:09
【Question】:

I have a small piece of code in PySpark, but I keep getting errors. I'm new to this, so I don't know where to start.

from pyspark import SparkContext, SparkConf

conf = SparkConf().setAppName("Open json").setMaster("local[3]")

sc = SparkContext(conf = conf)
print("Done")

I run it from cmd with the following command:

spark-submit .\PySpark\Open.py

I then get the following error output:

C:\Users\Abdullah\Documents\Master Thesis>spark-submit .\PySpark\Open.py

 18/06/30 15:21:58 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-Java classes where applicable 
18/06/30 15:22:01 ERROR SparkContext: Error initializing SparkContext. java.io.FileNotFoundException: File file:/C:/Users/Abdullah/Documents/Master%20Thesis/PySpark/Open.py does not exist
        at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:611)
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:824)
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:601)
        at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421)
        at org.apache.spark.SparkContext.addFile(SparkContext.scala:1529)
        at org.apache.spark.SparkContext.addFile(SparkContext.scala:1499)
        at org.apache.spark.SparkContext$$anonfun$13.apply(SparkContext.scala:461)
        at org.apache.spark.SparkContext$$anonfun$13.apply(SparkContext.scala:461)
        at scala.collection.immutable.List.foreach(List.scala:381)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:461)
        at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
        at java.lang.reflect.Constructor.newInstance(Unknown Source)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:238)
        at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
        at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Unknown Source)
Traceback (most recent call last):
  File "C:/Users/Abdullah/Documents/Master Thesis/./PySpark/Open.py", line 12, in <module>
    sc = SparkContext(conf = conf)
  File "C:\apache-spark\spark-2.2.0-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\context.py", line 118, in __init__
  File "C:\apache-spark\spark-2.2.0-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\context.py", line 180, in _do_init
  File "C:\apache-spark\spark-2.2.0-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\context.py", line 282, in _initialize_context
  File "C:\apache-spark\spark-2.2.0-bin-hadoop2.7\python\lib\py4j-0.10.7-src.zip\py4j\java_gateway.py", line 1525, in __call__
  File "C:\apache-spark\spark-2.2.0-bin-hadoop2.7\python\lib\py4j-0.10.7-src.zip\py4j\protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.io.FileNotFoundException: File file:/C:/Users/Abdullah/Documents/Master%20Thesis/PySpark/Open.py does not exist
        at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:611)
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:824)
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:601)
        at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421)
        at org.apache.spark.SparkContext.addFile(SparkContext.scala:1529)
        at org.apache.spark.SparkContext.addFile(SparkContext.scala:1499)
        at org.apache.spark.SparkContext$$anonfun$13.apply(SparkContext.scala:461)
        at org.apache.spark.SparkContext$$anonfun$13.apply(SparkContext.scala:461)
        at scala.collection.immutable.List.foreach(List.scala:381)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:461)
        at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
        at java.lang.reflect.Constructor.newInstance(Unknown Source)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:238)
        at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
        at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Unknown Source)
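
Note the %20 in the failing path: the space in the "Master Thesis" folder name is percent-encoded when Spark builds a file: URI for the script, and the local filesystem lookup then fails. A minimal sketch of that encoding step, using plain urllib rather than Spark internals:

```python
from urllib.parse import quote

# The folder name from the failing path in the traceback.
folder = "Master Thesis"

# Percent-encoding turns the space into %20, matching the path in the error.
print(quote(folder))  # Master%20Thesis
```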

【Comments】:

  • PySpark/Open.py does not exist...
  • Can you show the output of just running the pyspark command?
  • The last part in the yellow box is the output of the pyspark command.
  • Tried to reformat your output... in any case, it only says your file can't be found. Please show the full path in your question.
  • Oh my god!! That was it! That completely fixed it. Thank you so much!!!!! I've been working on this for two days. One. Stupid. Space. I can't believe it.

Tags: apache-spark hadoop pyspark


【Solution 1】:

According to your logs, you are trying to run Apache Spark on a Windows machine.

You need to add winutils and set its path in your environment variables:

  • Download the winutils executable (winutils.exe) from the Hortonworks repository, the Amazon AWS platform, or the winutils GitHub repo.

  • Create a directory to hold the winutils.exe executable, for example C:\SparkDev\x64. Add an environment variable %HADOOP_HOME% pointing to that directory, then add %HADOOP_HOME%\bin to PATH.
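
As a sketch, the same variables can also be set from Python before the SparkContext is created; the C:\hadoop location below is an assumption, so substitute wherever winutils.exe actually lives on your machine:

```python
import os

# Assumed winutils location; adjust to the directory that contains bin\winutils.exe.
hadoop_home = r"C:\hadoop"

# Equivalent of setting %HADOOP_HOME% and prepending %HADOOP_HOME%\bin to PATH.
os.environ["HADOOP_HOME"] = hadoop_home
os.environ["PATH"] = os.path.join(hadoop_home, "bin") + os.pathsep + os.environ.get("PATH", "")
```

Setting these in the script only affects the current process; for spark-submit runs, system-level environment variables are the more reliable place.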

【Discussion】:

  • Thanks for your answer, but I have already set the environment variables exactly as you describe. I also put the winutils file in the right directory, but I keep getting the same error. Should I delete everything and install Spark and Hadoop from scratch?
  • Check your PATH: open cmd and type "path". Does winutils show up on the path? This is a common mistake for Windows users.
  • I have the directory C:\hadoop\bin, and inside the bin folder I have the winutils.exe file. Is that correct?
  • Yes, the path is correct. Check the HADOOP_HOME and PATH environment variables, and also check that winutils is not corrupted.
  • Never mind. There was a space in my directory name, and that messed everything up. Thanks anyway!
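
The root cause in this thread matches the question's traceback: a space in the project path. A tiny hypothetical guard (not part of any Spark API) that fails fast with a clear message before the opaque Java error appears:

```python
def check_no_spaces(path):
    """Raise early if a path contains spaces, which older Spark/Hadoop
    setups on Windows mishandle when building file: URIs."""
    if " " in path:
        raise ValueError(
            f"Path contains spaces, which can break Spark on Windows: {path!r}"
        )
    return path

# The path from the question would be rejected:
# check_no_spaces(r"C:\Users\Abdullah\Documents\Master Thesis\PySpark\Open.py")  # raises ValueError
```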