【问题标题】:Spark workers keeps exitingSpark 工作人员不断退出
【发布时间】:2021-02-07 00:46:13
【问题描述】:

我已经设置了一个安装了 pyspark 和 jupyter notebook 的 spark 主服务器。工人在 spark-master 网页中注册。 appname 也会显示在 webui 中。但是,当我运行数据集/数据帧之类的东西时,工作人员会继续退出。

我已经设置了 conf/spark-env.sh 我在 /etc/hosts 中声明了主/从

Worker Exiting Image spark-submit --master log

工人确实注册了,但这是工人的错误日志

Spark Executor Command: "/usr/lib/jvm/jre/bin/java" "-cp" "/root/PySpark/conf/:/root/PySpark/jars/*" "-Xmx1024M" "-Dspark.driver.port=42592" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "spark://CoarseGrainedScheduler@sparkmaster:42592" "--executor-id" "62" "--hostname" "192.168.1.101" "--cores" "2" "--app-id" "app-20201023194940-0001" "--worker-url" "spark://Worker@192.168.1.101:45404"
========================================

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
20/10/23 19:52:46 INFO CoarseGrainedExecutorBackend: Started daemon with process name: 4003@slave1
20/10/23 19:52:46 INFO SignalUtils: Registered signal handler for TERM
20/10/23 19:52:46 INFO SignalUtils: Registered signal handler for HUP
20/10/23 19:52:46 INFO SignalUtils: Registered signal handler for INT
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/root/PySpark/jars/spark-unsafe_2.12-3.0.1.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
20/10/23 19:52:47 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/10/23 19:52:47 INFO SecurityManager: Changing view acls to: root
20/10/23 19:52:47 INFO SecurityManager: Changing modify acls to: root
20/10/23 19:52:47 INFO SecurityManager: Changing view acls groups to: 
20/10/23 19:52:47 INFO SecurityManager: Changing modify acls groups to: 
20/10/23 19:52:47 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1748)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:61)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:283)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:272)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
Caused by: org.apache.spark.SparkException: Exception thrown in awaitResult: 
    at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:302)
    at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
    at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:101)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.$anonfun$run$3(CoarseGrainedExecutorBackend.scala:303)
    at scala.runtime.java8.JFunction1$mcVI$sp.apply(JFunction1$mcVI$sp.java:23)
    at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:877)
    at scala.collection.immutable.Range.foreach(Range.scala:158)
    at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:876)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.$anonfun$run$1(CoarseGrainedExecutorBackend.scala:301)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:62)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:61)
    at java.base/java.security.AccessController.doPrivileged(Native Method)
    at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
    ... 4 more
Caused by: java.io.IOException: Failed to connect to sparkmaster/192.168.1.100:42592
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:253)
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:195)
    at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:204)
    at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:202)
    at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:198)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: io.netty.channel.AbstractChannel$AnnotatedNoRouteToHostException: No route to host: sparkmaster/192.168.1.100:42592
Caused by: java.net.NoRouteToHostException: No route to host
    at java.base/sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at java.base/sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:779)
    at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:330)
    at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:702)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Thread.java:834)

【问题讨论】:

  • 如果你在服务器集群上运行,你确定防火墙吗?似乎 Spark 工作人员可以解析 sparkmaster,但他们无法连接到它。

标签: apache-spark pyspark jupyter-notebook


【解决方案1】:

请检查是否所有节点都知道主机名和端口。

【讨论】:

    猜你喜欢
    • 2018-10-12
    • 1970-01-01
    • 2014-03-28
    • 1970-01-01
    • 1970-01-01
    • 2020-07-04
    • 1970-01-01
    • 1970-01-01
    • 2019-11-07
    相关资源
    最近更新 更多