[Posted]: 2017-01-26 09:09:12
[Question]:
I have a Storm cluster with 1 Nimbus, 4 Supervisors, and 2 ZooKeeper nodes. My storm.yaml is as follows:
storm.zookeeper.servers:
- "storage14"
- "storage15"
nimbus.seeds: ["storage01"]
#storm.local.hostname: "storage05"
supervisor.supervisors:
- "storage02"
- "storage03"
- "storage04"
- "storage05"
storm.local.dir: "/tmp/storm"
worker.childopts: "-Xmx%HEAP-MEM%m -XX:+PrintGCDetails -Xloggc:artifacts/gc.log -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=artifacts/heapdump"
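This storm.yaml file is used by both Nimbus and the Supervisors. When starting Nimbus, I comment out storm.local.hostname, as shown above.
However, when starting a Supervisor on each node, I uncomment storm.local.hostname and set it to the hostname of the node on which that Supervisor is being started. For example, if I start the Supervisor on storage05, the storm.yaml file has the following additional configuration parameter: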
storm.local.hostname: "storage05"
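The manual per-node edit described above could be scripted. The following is a minimal Python sketch, not part of Storm itself: the helper name `render_storm_yaml` is hypothetical, and it assumes the shared file contains a `storm.local.hostname` line (commented out or not) that only needs its value swapped.

```python
import re

# Matches the storm.local.hostname line whether or not it is commented out.
_HOSTNAME_LINE = re.compile(
    r"^[ \t]*#?[ \t]*storm\.local\.hostname[ \t]*:.*$", re.MULTILINE
)


def render_storm_yaml(template: str, hostname: str) -> str:
    """Return the template with storm.local.hostname set to `hostname`."""
    line = f'storm.local.hostname: "{hostname}"'
    if _HOSTNAME_LINE.search(template):
        # Replace the existing (possibly commented-out) line in place.
        return _HOSTNAME_LINE.sub(lambda _m: line, template, count=1)
    # No such line yet: append it at the end of the file.
    return template.rstrip("\n") + "\n" + line + "\n"


if __name__ == "__main__":
    # Shortened stand-in for the shared storm.yaml shown above.
    template = 'nimbus.seeds: ["storage01"]\n#storm.local.hostname: "storage05"\n'
    for host in ["storage02", "storage03", "storage04", "storage05"]:
        print(f"--- {host} ---")
        print(render_storm_yaml(template, host), end="")
```

Generating the file per node this way avoids hand-editing mistakes, such as leaving another node's hostname in place.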
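The problem is that even though Nimbus starts successfully and I can see it in the Storm UI, some of the Supervisors seem unable to connect to Nimbus. For example, of the 4 nodes on which I start Supervisors, the Storm UI usually shows only 2 as connected. However, if I ssh into those nodes and run jps, I can see the supervisor process running on all of them.
The Supervisors that end up connected are not always on the same nodes, so the problem is definitely not tied to particular nodes.
Another thing to note is that if I try to submit a topology from any of the connected nodes, it is not registered by the cluster, and I cannot see the topology in the UI either.
What do you think is causing this erratic behavior?
Update: the tail of nimbus.log contains the following lines: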
2017-01-25 00:04:25.216 o.a.s.s.o.a.z.ClientCnxn [WARN] Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:701)
at org.apache.storm.shade.org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.storm.shade.org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2017-01-25 00:04:25.317 o.a.s.s.o.a.z.ClientCnxn [INFO] Opening socket connection to server storage15/192.168.140.195:2181. Will not attempt to authenticate using SASL (unknown error)
2017-01-25 00:04:25.317 o.a.s.s.o.a.z.ClientCnxn [WARN] Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:701)
at org.apache.storm.shade.org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.storm.shade.org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2017-01-25 00:04:25.686 o.a.s.s.o.a.z.ClientCnxn [INFO] Opening socket connection to server storage15/192.168.140.195:2181. Will not attempt to authenticate using SASL (unknown error)
2017-01-25 00:04:25.686 o.a.s.s.o.a.z.ClientCnxn [WARN] Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:701)
at org.apache.storm.shade.org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.storm.shade.org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2017-01-25 00:04:25.787 o.a.s.s.o.a.z.ClientCnxn [INFO] Opening socket connection to server storage14/192.168.140.194:2181. Will not attempt to authenticate using SASL (unknown error)
2017-01-25 00:04:25.787 o.a.s.s.o.a.z.ClientCnxn [WARN] Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:701)
at org.apache.storm.shade.org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.storm.shade.org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
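The repeated "Connection refused" entries above mean that the TCP connection to port 2181 on storage14 and storage15 was actively rejected, which typically happens when nothing is listening on that port. A minimal Python sketch for checking this from the Nimbus node follows. The helper name `can_connect` is hypothetical, and the host names and port are taken from the configuration and log lines above. It only tests whether a TCP connection can be opened, not whether ZooKeeper itself is healthy.

```python
import socket

# Hosts and port taken from storm.zookeeper.servers and the log lines above.
ZOOKEEPER_SERVERS = [("storage14", 2181), ("storage15", 2181)]


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    for host, port in ZOOKEEPER_SERVERS:
        status = "reachable" if can_connect(host, port) else "NOT reachable"
        print(f"{host}:{port} {status}")
```

Running the same check from each Supervisor node as well would show whether the failure is specific to Nimbus or affects the whole cluster.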
[Comments]:
-
1. Which version are you using? 2. Is "storage05" reachable from all nodes? 3. Is "storage05" recognized as the "Leader" by the UI? 4. Are there any error messages in the Nimbus log?
-
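1. I am on Storm 1.0.1. 2. Yes, I have passwordless ssh between all nodes, including storage05. 3. No, but I don't think it should be, since I am not starting Nimbus from storage05. storage01 is Nimbus, and storage02-05 are the Supervisors. But yes, storage01 is indeed recognized as the Leader by the UI.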
-
Sorry, I missed that. It is fine if "storage01" is recognized as the Leader.
Tags: apache-storm apache-zookeeper