【Question title】: PySpark not initialized
【Posted】: 2021-12-14 04:58:06
【Question description】:

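I can run "spark-shell" on my local PC, but I cannot get pyspark to run on the same machine. The error is in the log below.

I have searched in many places, but nothing I found solved the problem. Could anyone with PySpark experience point me in the right direction? Thank you in advance.

My configuration:

  • Spark: 3.2.0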
  • Java 17
  • Python 3.8.6
Python 3.8.6 (tags/v3.8.6:db45529, Sep 23 2020, 15:52:53) [MSC v.1927 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
21/10/29 10:37:08 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21/10/29 10:37:08 WARN SparkContext: Another SparkContext is being constructed (or threw an exception in its constructor). This may indicate an error, since only one SparkContext should be running in this JVM (see SPARK-2243). The other SparkContext was created at:
org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77)
java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499)
java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:480)
py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
py4j.Gateway.invoke(Gateway.java:238)
py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
py4j.ClientServerConnection.run(ClientServerConnection.java:106)
java.base/java.lang.Thread.run(Thread.java:833)
C:\DS\spark\python\pyspark\shell.py:42: UserWarning: Failed to initialize Spark session.
  warnings.warn("Failed to initialize Spark session.")
Traceback (most recent call last):
  File "C:\DS\spark\python\pyspark\shell.py", line 38, in <module>
    spark = SparkSession._create_shell_session()  # type: ignore
  File "C:\DS\spark\python\pyspark\sql\session.py", line 553, in _create_shell_session
    return SparkSession.builder.getOrCreate()
  File "C:\DS\spark\python\pyspark\sql\session.py", line 228, in getOrCreate
    sc = SparkContext.getOrCreate(sparkConf)
  File "C:\DS\spark\python\pyspark\context.py", line 392, in getOrCreate
    SparkContext(conf=conf or SparkConf())
  File "C:\DS\spark\python\pyspark\context.py", line 146, in __init__
    self._do_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer,
  File "C:\DS\spark\python\pyspark\context.py", line 209, in _do_init
    self._jsc = jsc or self._initialize_context(self._conf._jconf)
  File "C:\DS\spark\python\pyspark\context.py", line 329, in _initialize_context
    return self._jvm.JavaSparkContext(jconf)
  File "C:\DS\spark\python\lib\py4j-0.10.9.2-src.zip\py4j\java_gateway.py", line 1573, in __call__
    return_value = get_return_value(
  File "C:\DS\spark\python\lib\py4j-0.10.9.2-src.zip\py4j\protocol.py", line 326, in get_return_value
    raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.storage.StorageUtils$
        at org.apache.spark.storage.BlockManagerMasterEndpoint.<init>(BlockManagerMasterEndpoint.scala:110)
        at org.apache.spark.SparkEnv$.$anonfun$create$9(SparkEnv.scala:348)
        at org.apache.spark.SparkEnv$.registerOrLookupEndpoint$1(SparkEnv.scala:287)
        at org.apache.spark.SparkEnv$.create(SparkEnv.scala:336)
        at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:191)
        at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:277)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:460)
        at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
        at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77)
        at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499)
        at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:480)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:238)
        at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
        at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
        at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
        at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
        at java.base/java.lang.Thread.run(Thread.java:833)


C:\DS\spark\bin>SUCCESS: The process with PID 29408 (child process of PID 19416) has been terminated.
SUCCESS: The process with PID 19416 (child process of PID 37944) has been terminated.
SUCCESS: The process with PID 37944 (child process of PID 23752) has been terminated.

【Discussion】:

    Tags: python java apache-spark pyspark


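    【Solution 1】:

    I think you are probably using the wrong Java version. From the docs:

    Spark runs on Java 8/11, Scala 2.12, Python 3.6+ and R 3.5+. Python 3.6 support is deprecated as of Spark 3.2.0. Java 8 prior to version 8u201 support is deprecated as of Spark 3.2.0. For the Scala API, Spark 3.2.0 uses Scala 2.12. You will need to use a compatible Scala version (2.12.x).

    Try installing Java 11 instead of your current version.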
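
To check which Java the machine actually picks up before launching pyspark, a small script along the following lines may help. It is only a sketch: the helper names `parse_java_major` and `is_supported` are made up for illustration, and the supported set {8, 11} is taken from the documentation quoted above for Spark 3.2.0, so it would need adjusting for other Spark releases.

```python
import re
import shutil
import subprocess

# Java major versions that the quoted Spark 3.2.0 documentation lists as supported.
SUPPORTED_JAVA_MAJORS = {8, 11}


def parse_java_major(version_output: str) -> int:
    """Extract the Java major version from the text printed by `java -version`."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("could not find a version string in the output")
    first, second = match.group(1), match.group(2)
    # Java 8 and earlier report themselves as "1.8.x", "1.7.x", and so on.
    if first == "1" and second is not None:
        return int(second)
    return int(first)


def is_supported(major: int) -> bool:
    """Return True if this major version is in the supported set above."""
    return major in SUPPORTED_JAVA_MAJORS


if __name__ == "__main__":
    if shutil.which("java") is None:
        print("No `java` executable found on PATH")
    else:
        # `java -version` usually prints to stderr, so fall back to stdout if empty.
        result = subprocess.run(["java", "-version"], capture_output=True, text=True)
        major = parse_java_major(result.stderr or result.stdout)
        verdict = "supported" if is_supported(major) else "not supported"
        print(f"Java {major} is {verdict} by Spark 3.2.0 according to its docs")
```

With the Java 17 setup from the question, this check would report the version as not supported, which matches the failure seen in the log.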

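    【Comments】:

    • Thank you very much. I installed Java 11 and pyspark is working. But spark-shell is not working now; it fails with: Caused by: java.net.URISyntaxException: Illegal character in path at index 39: spark://[domain-address].com:28000/C:\classes
    • What command are you running?
    • Was your problem solved? I am facing the same issue.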
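    【Solution 2】:

    Yes, this is a Java version problem. I just installed OpenJDK 8 and uninstalled the other Java versions, and pyspark now works fine.

    I am using:

    • Spark - 3.2.0
    • OS - macOS Monterey
    • Java - 16.0.1
    • Python - 3.9

    Check the installed Java versions: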

    /usr/libexec/java_home --verbose
    

    Uninstall the other Java versions:

    brew uninstall AdoptOpenJDK
    

    Run pyspark, and the shell should start normally.
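
If uninstalling the other JDKs is not an option, one possible alternative is to point `JAVA_HOME` at a supported JDK only for the process that starts Spark, since Spark's launcher scripts use `JAVA_HOME` when it is set. The sketch below assumes a supported JDK is already installed at a path you know. The helper name `env_with_java_home` and the path `/path/to/jdk-11` are placeholders for illustration, not something from the answers above.

```python
import os
from typing import Mapping, Optional


def env_with_java_home(java_home: str,
                       base_env: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of the environment with JAVA_HOME set to java_home.

    The JDK's bin directory is also put first on PATH so that a plain
    `java` lookup resolves to the same installation.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["JAVA_HOME"] = java_home
    java_bin = os.path.join(java_home, "bin")
    old_path = env.get("PATH", "")
    env["PATH"] = java_bin + os.pathsep + old_path if old_path else java_bin
    return env


# Hypothetical usage: apply the change to the current process before any
# SparkSession is created, so the JVM that PySpark launches uses this JDK.
# os.environ.update(env_with_java_home("/path/to/jdk-11"))
```

The helper returns a new dictionary instead of mutating `os.environ` directly, so the same values can also be passed as `env=` to `subprocess.run` when launching the `pyspark` script from another program.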

    【Comments】:
