【问题标题】:Apache Spark can not connect to master using spark-submit script on Amazon EC2Apache Spark 无法使用 Amazon EC2 上的 spark-submit 脚本连接到 master
【发布时间】:2016-08-03 00:39:20
【问题描述】:

首先,我使用 spark-ec2 脚本在 EC2 上设置了一个具有一个主节点和一个工作节点的 Spark 集群。

在我使用 ssh 连接到我的 EC2 主实例后,我想运行 spark-submit 脚本,以便我可以运行自己的 Spark 代码。我首先上传我的 .jar 文件,然后使用脚本。

为此,我使用以下命令:

sudo /root/spark/bin/spark-submit --class "SimpleApp"\
--master spark://ec2-<adress>.us-west-1.compute.amazonaws.com:7077 simple-project-1.0.jar

遗憾的是,这不起作用,因为脚本无法连接到主服务器(最后的整个错误消息):

java.io.IOException: Failed to connect to ec2-<adress>.us-west-1.compute.amazonaws.com/<private-IP>:7077

我将入站规则添加到允许手动访问端口 7077 的安全组,但仍然收到相同的错误。 在设置和开始之间我可能需要做些什么吗?

[ec2-user@ip-172-31-11-100 ~]$ sudo /root/spark/bin/spark-submit --class "SimpleApp" --master spark://<ec2-address>.us-west-1.compute.amazonaws.com:7077 simple-project-1.0.jar 
16/08/02 12:18:43 INFO spark.SparkContext: Running Spark version 1.6.1
16/08/02 12:18:44 WARN spark.SparkConf: 
SPARK_WORKER_INSTANCES was detected (set to '1').
This is deprecated in Spark 1.0+.

Please instead use:
 - ./spark-submit with --num-executors to specify the number of executors
 - Or set SPARK_EXECUTOR_INSTANCES
 - spark.executor.instances to configure the number of instances in the spark config.

16/08/02 12:18:44 INFO spark.SecurityManager: Changing view acls to: root
16/08/02 12:18:44 INFO spark.SecurityManager: Changing modify acls to: root
16/08/02 12:18:44 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
16/08/02 12:18:45 INFO util.Utils: Successfully started service 'sparkDriver' on port 58516.
16/08/02 12:18:45 INFO slf4j.Slf4jLogger: Slf4jLogger started
16/08/02 12:18:45 INFO Remoting: Starting remoting
16/08/02 12:18:46 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@172.31.11.100:45559]
16/08/02 12:18:46 INFO util.Utils: Successfully started service 'sparkDriverActorSystem' on port 45559.
16/08/02 12:18:46 INFO spark.SparkEnv: Registering MapOutputTracker
16/08/02 12:18:46 INFO spark.SparkEnv: Registering BlockManagerMaster
16/08/02 12:18:46 INFO storage.DiskBlockManager: Created local directory at /mnt/spark/blockmgr-83f1cf8d-3783-4659-a0da-64ae7c95e850
16/08/02 12:18:46 INFO storage.DiskBlockManager: Created local directory at /mnt2/spark/blockmgr-9a22a761-a18f-45a4-9d49-dcfaf7f9e4f8
16/08/02 12:18:46 INFO storage.MemoryStore: MemoryStore started with capacity 511.5 MB
16/08/02 12:18:46 INFO spark.SparkEnv: Registering OutputCommitCoordinator
16/08/02 12:18:46 INFO server.Server: jetty-8.y.z-SNAPSHOT
16/08/02 12:18:46 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
16/08/02 12:18:46 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
16/08/02 12:18:46 INFO ui.SparkUI: Started SparkUI at http://ec2-54-153-24-33.us-west-1.compute.amazonaws.com:4040
16/08/02 12:18:46 INFO spark.HttpFileServer: HTTP File server directory is /mnt/spark/spark-12fdcf09-fcfc-4bf6-98d3-ec1f27d21345/httpd-da6f3d59-bc33-4a06-bac9-cb0c27fd82d9
16/08/02 12:18:46 INFO spark.HttpServer: Starting HTTP Server
16/08/02 12:18:46 INFO server.Server: jetty-8.y.z-SNAPSHOT
16/08/02 12:18:47 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:59371
16/08/02 12:18:47 INFO util.Utils: Successfully started service 'HTTP file server' on port 59371.
16/08/02 12:18:47 INFO spark.SparkContext: Added JAR file:/home/ec2-user/simple-project-1.0.jar at http://172.31.11.100:59371/jars/simple-project-1.0.jar with timestamp 1470140327032
16/08/02 12:18:47 INFO client.AppClient$ClientEndpoint: Connecting to master spark://ec2-54-183-242-177.us-west-1.compute.amazonaws.com:7077...
16/08/02 12:18:47 WARN client.AppClient$ClientEndpoint: Failed to connect to master ec2-54-183-242-177.us-west-1.compute.amazonaws.com:7077
java.io.IOException: Failed to connect to ec2-54-183-242-177.us-west-1.compute.amazonaws.com/172.31.11.100:7077
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:216)
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:167)
    at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:200)
    at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:187)
    at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:183)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Verbindungsaufbau abgelehnt: ec2-54-183-242-177.us-west-1.compute.amazonaws.com/172.31.11.100:7077
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
    at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224)
    at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:289)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
    ... 1 more
16/08/02 12:19:07 INFO client.AppClient$ClientEndpoint: Connecting to master spark://ec2-54-183-242-177.us-west-1.compute.amazonaws.com:7077...
16/08/02 12:19:07 WARN client.AppClient$ClientEndpoint: Failed to connect to master ec2-54-183-242-177.us-west-1.compute.amazonaws.com:7077
java.io.IOException: Failed to connect to ec2-54-183-242-177.us-west-1.compute.amazonaws.com/172.31.11.100:7077
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:216)
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:167)
    at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:200)
    at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:187)
    at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:183)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Verbindungsaufbau abgelehnt: ec2-54-183-242-177.us-west-1.compute.amazonaws.com/172.31.11.100:7077
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
    at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224)
    at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:289)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
    ... 1 more
16/08/02 12:19:27 INFO client.AppClient$ClientEndpoint: Connecting to master spark://ec2-54-183-242-177.us-west-1.compute.amazonaws.com:7077...
16/08/02 12:19:27 INFO client.AppClient$ClientEndpoint: Connecting to master spark://ec2-54-183-242-177.us-west-1.compute.amazonaws.com:7077...
16/08/02 12:19:27 WARN client.AppClient$ClientEndpoint: Failed to connect to master ec2-54-183-242-177.us-west-1.compute.amazonaws.com:7077
java.io.IOException: Failed to connect to ec2-54-183-242-177.us-west-1.compute.amazonaws.com/172.31.11.100:7077
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:216)
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:167)
    at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:200)
    at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:187)
    at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:183)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Verbindungsaufbau abgelehnt: ec2-54-183-242-177.us-west-1.compute.amazonaws.com/172.31.11.100:7077
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
    at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224)
    at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:289)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
    ... 1 more
16/08/02 12:19:47 INFO client.AppClient$ClientEndpoint: Connecting to master spark://ec2-54-183-242-177.us-west-1.compute.amazonaws.com:7077...
16/08/02 12:19:47 ERROR cluster.SparkDeploySchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
16/08/02 12:19:47 INFO client.AppClient$ClientEndpoint: Connecting to master spark://ec2-54-183-242-177.us-west-1.compute.amazonaws.com:7077...
16/08/02 12:19:47 WARN cluster.SparkDeploySchedulerBackend: Application ID is not initialized yet.
16/08/02 12:19:47 WARN client.AppClient$ClientEndpoint: Failed to connect to master ec2-54-183-242-177.us-west-1.compute.amazonaws.com:7077
java.io.IOException: Failed to connect to ec2-54-183-242-177.us-west-1.compute.amazonaws.com/172.31.11.100:7077
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:216)
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:167)
    at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:200)
    at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:187)
    at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:183)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Verbindungsaufbau abgelehnt: ec2-54-183-242-177.us-west-1.compute.amazonaws.com/172.31.11.100:7077
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
    at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224)
    at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:289)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
    ... 1 more
16/08/02 12:19:47 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 52691.
16/08/02 12:19:47 INFO netty.NettyBlockTransferService: Server created on 52691
16/08/02 12:19:47 INFO storage.BlockManagerMaster: Trying to register BlockManager
16/08/02 12:19:47 INFO storage.BlockManagerMasterEndpoint: Registering block manager 172.31.11.100:52691 with 511.5 MB RAM, BlockManagerId(driver, 172.31.11.100, 52691)
16/08/02 12:19:47 INFO storage.BlockManagerMaster: Registered BlockManager
16/08/02 12:19:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/kill,null}
16/08/02 12:19:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/api,null}
16/08/02 12:19:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/,null}
16/08/02 12:19:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/static,null}
16/08/02 12:19:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump/json,null}
16/08/02 12:19:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump,null}
16/08/02 12:19:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/json,null}
16/08/02 12:19:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors,null}
16/08/02 12:19:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment/json,null}
16/08/02 12:19:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment,null}
16/08/02 12:19:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd/json,null}
16/08/02 12:19:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd,null}
16/08/02 12:19:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/json,null}
16/08/02 12:19:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage,null}
16/08/02 12:19:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool/json,null}
16/08/02 12:19:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool,null}
16/08/02 12:19:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/json,null}
16/08/02 12:19:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage,null}
16/08/02 12:19:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/json,null}
16/08/02 12:19:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages,null}
16/08/02 12:19:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job/json,null}
16/08/02 12:19:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job,null}
16/08/02 12:19:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/json,null}
16/08/02 12:19:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs,null}
16/08/02 12:19:47 INFO ui.SparkUI: Stopped Spark web UI at http://ec2-54-153-24-33.us-west-1.compute.amazonaws.com:4040
16/08/02 12:19:47 INFO cluster.SparkDeploySchedulerBackend: Shutting down all executors
16/08/02 12:19:47 INFO cluster.SparkDeploySchedulerBackend: Asking each executor to shut down
16/08/02 12:19:47 WARN client.AppClient$ClientEndpoint: Drop UnregisterApplication(null) because has not yet connected to master
16/08/02 12:19:47 ERROR util.SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[appclient-registration-retry-thread,5,main]
java.lang.InterruptedException
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1038)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
    at scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:208)
    at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218)
    at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
    at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
    at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
    at scala.concurrent.Await$.result(package.scala:107)
    at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
    at org.apache.spark.deploy.client.AppClient.stop(AppClient.scala:290)
    at org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend.org$apache$spark$scheduler$cluster$SparkDeploySchedulerBackend$$stop(SparkDeploySchedulerBackend.scala:198)
    at org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend.stop(SparkDeploySchedulerBackend.scala:101)
    at org.apache.spark.scheduler.TaskSchedulerImpl.stop(TaskSchedulerImpl.scala:446)
    at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:1582)
    at org.apache.spark.SparkContext$$anonfun$stop$9.apply$mcV$sp(SparkContext.scala:1740)
    at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1229)
    at org.apache.spark.SparkContext.stop(SparkContext.scala:1739)
    at org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend.dead(SparkDeploySchedulerBackend.scala:127)
    at org.apache.spark.deploy.client.AppClient$ClientEndpoint.markDead(AppClient.scala:264)
    at org.apache.spark.deploy.client.AppClient$ClientEndpoint$$anon$2$$anonfun$run$1.apply$mcV$sp(AppClient.scala:134)
    at org.apache.spark.util.Utils$.tryOrExit(Utils.scala:1163)
    at org.apache.spark.deploy.client.AppClient$ClientEndpoint$$anon$2.run(AppClient.scala:129)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
16/08/02 12:19:47 INFO storage.DiskBlockManager: Shutdown hook called
16/08/02 12:19:47 INFO util.ShutdownHookManager: Shutdown hook called
16/08/02 12:19:47 INFO util.ShutdownHookManager: Deleting directory /mnt/spark/spark-12fdcf09-fcfc-4bf6-98d3-ec1f27d21345/userFiles-7ddf41a5-7328-4bdd-afcd-a4610404ecac
16/08/02 12:19:47 INFO util.ShutdownHookManager: Deleting directory /mnt2/spark/spark-5991f32e-20ef-4433-8de7-44ad57c53d97
16/08/02 12:19:47 INFO util.ShutdownHookManager: Deleting directory /mnt/spark/spark-12fdcf09-fcfc-4bf6-98d3-ec1f27d21345
16/08/02 12:19:47 INFO util.ShutdownHookManager: Deleting directory /mnt/spark/spark-12fdcf09-fcfc-4bf6-98d3-ec1f27d21345/httpd-da6f3d59-bc33-4a06-bac9-cb0c27fd82d9

【问题讨论】:

  • 您是否尝试过使用 --master local[x] 提交?使用 SSH 连接到实例后,它应该可以工作,或者至少表明问题出在 Spark 而不是网络设置
  • 我试过了,没有问题。但是我是否也可以访问所有从节点?这意味着当我启动 10 个实例并使用 --master local[10] 时,他会使用所有从属吗?
  • 如果您使用的是 local[],则不使用任何主/从配置,甚至在单个节点上也不使用。 local 适合调试,但不测试集群方面。我发现您的错误中的 0.0.0.0 可疑,这仍然是一个问题吗?如果是这样,要么接受答案,要么写下你自己的答案,这样我们就知道它是如何工作的......话虽如此,你有没有检查过主从之间对无密码 ssh 的基本需求?即使在单节点设置上也是如此,即使您在一台机器上运行 master/driver 和 slave/worker,您也必须能够在没有密码的情况下 ssh。
  • 另外,您没有提到(或者我错过了)您正在运行哪种 Linux 风格。我发现在基于 debian 的版本中,您必须在配置文件中使用 IP 地址而不是主机名。 Check this answer and the one linked from there 了解更多信息,特别是如果你在 Ubuntu 中。

标签: apache-spark amazon-ec2


【解决方案1】:

如果您没有使用 YARN 或 mesos 作为 cluster managers,即独立模式,则必须使用 spark-submit 将您的应用一个一个地部署到每个集群。

假设您在创建 Standalone 集群模式时已对 master 和 slave 进行了正确的配置,在每个集群上使用 SSH 在本地部署应用程序 (local[n]) 会很好。

回答您第二个问题,local 指令仅让您选择设置应用程序应在每个集群上运行多少个threadsn 是线程数。因此,它与是否要在一个或多个集群上运行无关。

因此,如果您使用 spark-submit 通过 SSH 将应用程序部署到本地的所有集群(主服务器和从服务器),并且拥有正确的 Standalone 设置,那么您的应用程序应该可以在所有集群上运行。

【讨论】:

  • 谢谢你的回答,我试试这个。不过,我对您的回答有一些疑问: 1. 主从配置的正确配置是什么?我刚刚启动了 spark-ec2 脚本,然后尝试了 spark-submit 脚本。 2. 一个一个地部署在每个集群上是什么意思?你的意思是在每个主从节点上一一对应吗?
  • 1) 请点击答案中的链接了解更多详细信息,我不熟悉 spark-ec2 脚本,但亚马逊可能将指令分组在一个脚本中。如果您阅读链接的内容以更好地了解 Spark Standalone 结构,您会很高兴。 2) 是的,Spark 拓扑设计有一个 master 集群和 n 个 slave,在 Standalone 模式下,您需要将您的应用一个一个部署到每个集群。
  • 一一部署对我来说是新信息。我如何确保它们在相同的数据上协同工作?在 100 节点前 5 分钟开始,他们如何沟通?
  • 此处声明:spark.apache.org/docs/latest/… 有一个脚本可以启动从属集群,您可以在其中告诉 spark 从属应连接到哪个主控。如果您使用的 EC2 脚本正在这样做,那么您的从属服务器已经连接,并且在主服务器中启动应用程序将在从服务器之间分发作业 - 但是,应用程序应该部署在从服务器中以便主服务器能够分发作业给他们。
  • 感谢您的帮助!
猜你喜欢
  • 1970-01-01
  • 1970-01-01
  • 1970-01-01
  • 1970-01-01
  • 1970-01-01
  • 1970-01-01
  • 2018-04-10
  • 1970-01-01
  • 1970-01-01
相关资源
最近更新 更多