【Title】: Network issue on Apache Spark deployment
【Posted】: 2016-11-12 16:11:17
【Question】:

So far, I have been using an "embedded" Spark inside my application. Now I want to run it on a dedicated server.

Here is how far I have got:

  • Fresh Ubuntu 16 install, server name micha / IP 10.0.100.120, installed Scala 2.10, installed Spark 1.6.2, recompiled
  • The Pi test works
  • The UI on port 8080 works fine

The log says:

Spark Command: /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -cp /opt/apache-spark-1.6.2/conf/:/opt/apache-spark-1.6.2/assembly/target/scala-2.10/spark-assembly-1.6.2-hadoop2.2.0.jar:/opt/apache-spark-1.6.2/lib_managed/jars/datanucleus-rdbms-3.2.9.jar:/opt/apache-spark-1.6.2/lib_managed/jars/datanucleus-core-3.2.10.jar:/opt/apache-spark-1.6.2/lib_managed/jars/datanucleus-api-jdo-3.2.6.jar -Xms1g -Xmx1g org.apache.spark.deploy.master.Master --ip micha --port 7077 --webui-port 8080
========================================
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/07/10 13:03:55 INFO Master: Registered signal handlers for [TERM, HUP, INT]
16/07/10 13:03:55 WARN Utils: Your hostname, micha resolves to a loopback address: 127.0.1.1; using 10.0.100.120 instead (on interface eno1)
16/07/10 13:03:55 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
16/07/10 13:03:55 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/07/10 13:03:55 INFO SecurityManager: Changing view acls to: root
16/07/10 13:03:55 INFO SecurityManager: Changing modify acls to: root
16/07/10 13:03:55 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
16/07/10 13:03:56 INFO Utils: Successfully started service 'sparkMaster' on port 7077.
16/07/10 13:03:56 INFO Master: Starting Spark master at spark://micha:7077
16/07/10 13:03:56 INFO Master: Running Spark version 1.6.2
16/07/10 13:03:56 INFO Server: jetty-8.y.z-SNAPSHOT
16/07/10 13:03:56 INFO AbstractConnector: Started SelectChannelConnector@0.0.0.0:8080
16/07/10 13:03:56 INFO Utils: Successfully started service 'MasterUI' on port 8080.
16/07/10 13:03:56 INFO MasterWebUI: Started MasterWebUI at http://10.0.100.120:8080
16/07/10 13:03:56 INFO Server: jetty-8.y.z-SNAPSHOT
16/07/10 13:03:56 INFO AbstractConnector: Started SelectChannelConnector@micha:6066
16/07/10 13:03:56 INFO Utils: Successfully started service on port 6066.
16/07/10 13:03:56 INFO StandaloneRestServer: Started REST server for submitting applications on port 6066
16/07/10 13:03:56 INFO Master: I have been elected leader! New state: ALIVE

In my application, I changed the configuration to:

SparkConf conf = new SparkConf().setAppName("myapp").setMaster("spark://10.0.100.120:6066");

(I also tried port 7077.)

On the client side:

16-07-10 13:22:58:300 INFO org.spark-project.jetty.server.AbstractConnector - Started SelectChannelConnector@0.0.0.0:4040
16-07-10 13:22:58:300 DEBUG org.spark-project.jetty.util.component.AbstractLifeCycle - STARTED SelectChannelConnector@0.0.0.0:4040
16-07-10 13:22:58:300 DEBUG org.spark-project.jetty.util.component.AbstractLifeCycle - STARTED org.spark-project.jetty.server.Server@3eb292cd
16-07-10 13:22:58:301 INFO org.apache.spark.util.Utils - Successfully started service 'SparkUI' on port 4040.
16-07-10 13:22:58:306 INFO org.apache.spark.ui.SparkUI - Started SparkUI at http://10.0.100.100:4040
16-07-10 13:22:58:621 INFO org.apache.spark.deploy.client.AppClient$ClientEndpoint - Connecting to master spark://10.0.100.120:6066...
16-07-10 13:22:58:648 DEBUG org.apache.spark.network.client.TransportClientFactory - Creating new connection to /10.0.100.120:6066
16-07-10 13:22:58:689 DEBUG io.netty.util.ResourceLeakDetector - -Dio.netty.leakDetectionLevel: simple
16-07-10 13:22:58:714 WARN org.apache.spark.deploy.client.AppClient$ClientEndpoint - Failed to connect to master 10.0.100.120:6066
java.io.IOException: Failed to connect to /10.0.100.120:6066
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:216)

If I try to telnet:

$ telnet 10.0.100.120 6066
Trying 10.0.100.120...
telnet: connect to address 10.0.100.120: Connection refused
telnet: Unable to connect to remote host

$ telnet 10.0.100.120 7077
Trying 10.0.100.120...
telnet: connect to address 10.0.100.120: Connection refused
telnet: Unable to connect to remote host

On the server, I checked with netstat:

jgp@micha:/opt/apache-spark$ netstat -a | grep 6066
tcp6       0      0 micha.nc.rr.com:6066    [::]:*                  LISTEN     
jgp@micha:/opt/apache-spark$ netstat -a | grep 7077
tcp6       0      0 micha.nc.rr.com:7077    [::]:*                  LISTEN 

If I am reading this correctly, it looks like the services are listening on IPv6 rather than IPv4...

Update #1:

I set:

_JAVA_OPTIONS=-Djava.net.preferIPv4Stack=true
SPARK_LOCAL_IP=10.0.100.120

I still get the warning in the log:

16/07/10 14:10:13 WARN Utils: Your hostname, micha resolves to a loopback address: 127.0.1.1; using 10.0.100.120 instead (on interface eno1)
16/07/10 14:10:13 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address

Still connection refused...
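One way to double-check where the master is actually bound is to ask netstat for numeric addresses; the resolved hostnames in the output above can mask the real bind address. A diagnostic sketch (not from the question), where seeing 127.0.1.1:7077 instead of 10.0.100.120:7077 would explain the refused connections:

```shell
# List listening TCP sockets numerically (-n) so reverse DNS does not
# hide the bind address; 7077 is the master port, 6066 the REST port.
netstat -tln | grep -E ':7077|:6066' || true
```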

Update #2:

There is an odd line in the system's /etc/hosts:

127.0.0.1      localhost
127.0.1.1      micha.nc.rr.com micha

I commented that line out, and now I get the following in Spark's log file:

Spark Command: /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -cp /opt/apache-spark-1.6.2/conf/:/opt/apache-spark-1.6.2/assembly/target/scala-2.10/spark-assembly-1.6.2-hadoop2.2.0.jar:/opt/apache-spark-1.6.2/lib_managed/jars/datanucleus-rdbms-3.2.9.jar:/opt/apache-spark-1.6.2/lib_managed/jars/datanucleus-core-3.2.10.jar:/opt/apache-spark-1.6.2/lib_managed/jars/datanucleus-api-jdo-3.2.6.jar -Xms1g -Xmx1g org.apache.spark.deploy.master.Master --ip micha --port 7077 --webui-port 8080
========================================
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/07/10 22:11:54 INFO Master: Registered signal handlers for [TERM, HUP, INT]
16/07/10 22:11:54 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/07/10 22:11:54 INFO SecurityManager: Changing view acls to: root
16/07/10 22:11:54 INFO SecurityManager: Changing modify acls to: root
16/07/10 22:11:54 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
16/07/10 22:11:55 WARN Utils: Service 'sparkMaster' could not bind on port 7077. Attempting port 7078.
16/07/10 22:11:55 WARN Utils: Service 'sparkMaster' could not bind on port 7078. Attempting port 7079.
16/07/10 22:11:55 WARN Utils: Service 'sparkMaster' could not bind on port 7079. Attempting port 7080.
16/07/10 22:11:55 WARN Utils: Service 'sparkMaster' could not bind on port 7080. Attempting port 7081.
16/07/10 22:11:55 WARN Utils: Service 'sparkMaster' could not bind on port 7081. Attempting port 7082.
16/07/10 22:11:55 WARN Utils: Service 'sparkMaster' could not bind on port 7082. Attempting port 7083.
16/07/10 22:11:55 WARN Utils: Service 'sparkMaster' could not bind on port 7083. Attempting port 7084.
16/07/10 22:11:55 WARN Utils: Service 'sparkMaster' could not bind on port 7084. Attempting port 7085.
16/07/10 22:11:55 WARN Utils: Service 'sparkMaster' could not bind on port 7085. Attempting port 7086.
16/07/10 22:11:55 WARN Utils: Service 'sparkMaster' could not bind on port 7086. Attempting port 7087.
16/07/10 22:11:55 WARN Utils: Service 'sparkMaster' could not bind on port 7087. Attempting port 7088.
16/07/10 22:11:55 WARN Utils: Service 'sparkMaster' could not bind on port 7088. Attempting port 7089.
16/07/10 22:11:55 WARN Utils: Service 'sparkMaster' could not bind on port 7089. Attempting port 7090.
16/07/10 22:11:55 WARN Utils: Service 'sparkMaster' could not bind on port 7090. Attempting port 7091.
16/07/10 22:11:55 WARN Utils: Service 'sparkMaster' could not bind on port 7091. Attempting port 7092.
16/07/10 22:11:55 WARN Utils: Service 'sparkMaster' could not bind on port 7092. Attempting port 7093.
Exception in thread "main" java.net.BindException: Cannot assign requested address: Service 'sparkMaster' failed after 16 retries! Consider explicitly setting the appropriate port for the service 'sparkMaster' (for example spark.ui.port for SparkUI) to an available port or increasing spark.port.maxRetries.
        at sun.nio.ch.Net.bind0(Native Method)
        at sun.nio.ch.Net.bind(Net.java:433)
        at sun.nio.ch.Net.bind(Net.java:425)
        at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
        at io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:125)
        at io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:485)
        at io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1089)
        at io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:430)
        at io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:415)
        at io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:903)
        at io.netty.channel.AbstractChannel.bind(AbstractChannel.java:198)
        at io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:348)
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
        at java.lang.Thread.run(Thread.java:745)

【Comments】:

  • At least this looks odd: micha resolves to a loopback address: 127.0.1.1, since the loopback address usually points to 127.0.0.1, so I would guess something is wrong with the loopback configuration on your system. Just something to consider; I don't know how to solve your problem, though. Upvoted.
  • Thanks Jorge - I set the two environment variables (see the update), but it still doesn't care.

Tags: java apache-spark ubuntu-server


【Solution 1】:

You have to configure the spark-env.sh file on the Spark server. Add SPARK_MASTER_IP to spark-env.sh:

export SPARK_MASTER_IP=10.0.100.120

If you want to connect to the master from a remote application, use port 7077. Port 6066 is the REST API port.

SparkConf conf = new SparkConf().setAppName("myapp").setMaster("spark://10.0.100.120:7077");
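For reference, the relevant part of conf/spark-env.sh on the master could then look like this sketch (the address is taken from the question; the SPARK_LOCAL_IP line is an optional addition that makes the bind address explicit):

```shell
# conf/spark-env.sh on the master (address taken from this setup)
export SPARK_MASTER_IP=10.0.100.120   # address the master binds to and advertises
export SPARK_LOCAL_IP=10.0.100.120    # optional: pin Spark services to this interface
```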

【Discussion】:

【Solution 2】:

It turns out my /etc/hosts pointed the hostname at 127.0.1.1 instead of the real IP, so for some reason the services were listening on 127.0.1.1 rather than 10.0.100.120.
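Rather than commenting the line out entirely (which, as Update #2 shows, left the master with nothing to bind when started with --ip micha), one fix is to point the hostname at the machine's real address. A sketch of a corrected /etc/hosts, using the names and addresses from the question:

```
127.0.0.1       localhost
10.0.100.120    micha.nc.rr.com micha
```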

【Discussion】:
