【问题标题】:Storm topology deployment timeout风暴拓扑部署超时
【发布时间】:2017-11-27 07:27:27
【问题描述】:

我正在尝试在我的 Macbook Pro 上设置 Apache Storm (1.0.2),但如果我尝试部署拓扑显然会遇到超时问题。 UI 也会挂起并吐出相同的异常。

3491 [main] INFO  o.a.s.StormSubmitter - Generated ZooKeeper secret payload for MD5-digest: -8915636774701640550:-6510752657961785886
3580 [main] INFO  o.a.s.s.a.AuthUtils - Got AutoCreds []
Exception in thread "main" java.lang.RuntimeException: org.apache.storm.thrift.transport.TTransportException: java.net.ConnectException: Operation timed out (Connection timed out)
    at org.apache.storm.security.auth.TBackoffConnect.retryNext(TBackoffConnect.java:64)
    at org.apache.storm.security.auth.TBackoffConnect.doConnectWithRetry(TBackoffConnect.java:56)
    at org.apache.storm.security.auth.ThriftClient.reconnect(ThriftClient.java:99)
    at org.apache.storm.security.auth.ThriftClient.<init>(ThriftClient.java:69)
    at org.apache.storm.utils.NimbusClient.<init>(NimbusClient.java:106)
    at org.apache.storm.utils.NimbusClient.getConfiguredClientAs(NimbusClient.java:78)
    at org.apache.storm.StormSubmitter.topologyNameExists(StormSubmitter.java:371)
    at org.apache.storm.StormSubmitter.submitTopologyAs(StormSubmitter.java:233)
    at org.apache.storm.StormSubmitter.submitTopology(StormSubmitter.java:311)
    at org.apache.storm.StormSubmitter.submitTopology(StormSubmitter.java:157)
Caused by: org.apache.storm.thrift.transport.TTransportException: java.net.ConnectException: Operation timed out (Connection timed out)
    at org.apache.storm.thrift.transport.TSocket.open(TSocket.java:226)
    at org.apache.storm.thrift.transport.TFramedTransport.open(TFramedTransport.java:81)
    at org.apache.storm.security.auth.SimpleTransportPlugin.connect(SimpleTransportPlugin.java:103)
    at org.apache.storm.security.auth.TBackoffConnect.doConnectWithRetry(TBackoffConnect.java:53)
    ... 9 more
Caused by: java.net.ConnectException: Operation timed out (Connection timed out)
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:589)
    at org.apache.storm.thrift.transport.TSocket.open(TSocket.java:221)
    ... 12 more

我正在使用来自 github 存储库的默认 storm.yaml 配置; Zookeeper 也没有任何更改和默认 zoo.cfg 文件。

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=5
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=2
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/tmp/zookeeper
# the port at which the clients will connect
clientPort=2181
clientPortAddress=localhost
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1

我遇到了类似的问题,促使我检查我的主机文件;我已经发布如下

##
# Host Database
#
# localhost is used to configure the loopback interface
# when the system is booting.  Do not change this entry.
##
255.255.255.255 broadcasthost
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         ip6-localhost ip6-localhost.localdomain localhost6 localhost6.localdomain6

当我启动 zookeeper 服务器时;我相信它会像往常一样开始。

2017-11-27 16:05:14,314 [myid:] - INFO  [main:QuorumPeerConfig@103] - Reading configuration from: /Users/aniket.alhat/Tools/zookeeper/bin/../conf/zoo.cfg
2017-11-27 16:05:14,318 [myid:] - INFO  [main:DatadirCleanupManager@78] - autopurge.snapRetainCount set to 3
2017-11-27 16:05:14,318 [myid:] - INFO  [main:DatadirCleanupManager@79] - autopurge.purgeInterval set to 0
2017-11-27 16:05:14,318 [myid:] - INFO  [main:DatadirCleanupManager@101] - Purge task is not scheduled.
2017-11-27 16:05:14,318 [myid:] - WARN  [main:QuorumPeerMain@113] - Either no config or no quorum defined in config, running  in standalone mode
2017-11-27 16:05:14,329 [myid:] - INFO  [main:QuorumPeerConfig@103] - Reading configuration from: /Users/aniket.alhat/Tools/zookeeper/bin/../conf/zoo.cfg
2017-11-27 16:05:14,330 [myid:] - INFO  [main:ZooKeeperServerMain@95] - Starting server
2017-11-27 16:05:14,335 [myid:] - INFO  [main:Environment@100] - Server environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
2017-11-27 16:05:14,336 [myid:] - INFO  [main:Environment@100] - Server environment:host.name=10.9.157.77
2017-11-27 16:05:14,336 [myid:] - INFO  [main:Environment@100] - Server environment:java.version=1.8.0_131
2017-11-27 16:05:14,336 [myid:] - INFO  [main:Environment@100] - Server environment:java.vendor=Oracle Corporation
2017-11-27 16:05:14,336 [myid:] - INFO  [main:Environment@100] - Server environment:java.home=/Library/Java/JavaVirtualMachines/jdk1.8.0_131.jdk/Contents/Home/jre
2017-11-27 16:05:14,336 [myid:] - INFO  [main:Environment@100] - Server environment:java.class.path=/Users/aniket.alhat/Tools/zookeeper/bin/../build/classes:/Users/aniket.alhat/Tools/zookeeper/bin/../build/lib/*.jar:/Users/aniket.alhat/Tools/zookeeper/bin/../lib/slf4j-log4j12-1.6.1.jar:/Users/aniket.alhat/Tools/zookeeper/bin/../lib/slf4j-api-1.6.1.jar:/Users/aniket.alhat/Tools/zookeeper/bin/../lib/netty-3.7.0.Final.jar:/Users/aniket.alhat/Tools/zookeeper/bin/../lib/log4j-1.2.16.jar:/Users/aniket.alhat/Tools/zookeeper/bin/../lib/jline-0.9.94.jar:/Users/aniket.alhat/Tools/zookeeper/bin/../zookeeper-3.4.6.jar:/Users/aniket.alhat/Tools/zookeeper/bin/../src/java/lib/*.jar:/Users/aniket.alhat/Tools/zookeeper/bin/../conf:
2017-11-27 16:05:14,336 [myid:] - INFO  [main:Environment@100] - Server environment:java.library.path=/Users/aniket.alhat/Library/Java/Extensions:/Library/Java/Extensions:/Network/Library/Java/Extensions:/System/Library/Java/Extensions:/usr/lib/java:.
2017-11-27 16:05:14,336 [myid:] - INFO  [main:Environment@100] - Server environment:java.io.tmpdir=/var/folders/9c/g5cj60_j1x344r3zpd_hr99j5jwnk4/T/
2017-11-27 16:05:14,336 [myid:] - INFO  [main:Environment@100] - Server environment:java.compiler=<NA>
2017-11-27 16:05:14,337 [myid:] - INFO  [main:Environment@100] - Server environment:os.name=Mac OS X
2017-11-27 16:05:14,337 [myid:] - INFO  [main:Environment@100] - Server environment:os.arch=x86_64
2017-11-27 16:05:14,338 [myid:] - INFO  [main:Environment@100] - Server environment:os.version=10.12.6
2017-11-27 16:05:14,338 [myid:] - INFO  [main:Environment@100] - Server environment:user.name=aniket.alhat
2017-11-27 16:05:14,338 [myid:] - INFO  [main:Environment@100] - Server environment:user.home=/Users/aniket.alhat
2017-11-27 16:05:14,338 [myid:] - INFO  [main:Environment@100] - Server environment:user.dir=/Users/aniket.alhat/Tools/zookeeper-3.4.6
2017-11-27 16:05:14,344 [myid:] - INFO  [main:ZooKeeperServer@755] - tickTime set to 2000
2017-11-27 16:05:14,344 [myid:] - INFO  [main:ZooKeeperServer@764] - minSessionTimeout set to -1
2017-11-27 16:05:14,344 [myid:] - INFO  [main:ZooKeeperServer@773] - maxSessionTimeout set to -1
2017-11-27 16:05:14,361 [myid:] - INFO  [main:NIOServerCnxnFactory@94] - binding to port localhost/127.0.0.1:2181

而且我在 nimbus 日志中也没有看到任何错误

2017-11-27 16:05:35.365 o.a.s.s.o.a.c.f.i.CuratorFrameworkImpl [INFO] Starting
2017-11-27 16:05:35.373 o.a.s.s.o.a.z.ZooKeeper [INFO] Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
2017-11-27 16:05:35.373 o.a.s.s.o.a.z.ZooKeeper [INFO] Client environment:host.name=10.49.48.134
2017-11-27 16:05:35.373 o.a.s.s.o.a.z.ZooKeeper [INFO] Client environment:java.version=1.8.0_131
2017-11-27 16:05:35.373 o.a.s.s.o.a.z.ZooKeeper [INFO] Client environment:java.vendor=Oracle Corporation
2017-11-27 16:05:35.373 o.a.s.s.o.a.z.ZooKeeper [INFO] Client environment:java.home=/Library/Java/JavaVirtualMachines/jdk1.8.0_131.jdk/Contents/Home/jre
2017-11-27 16:05:35.373 o.a.s.s.o.a.z.ZooKeeper [INFO] Client environment:java.class.path=/Users/aniket.alhat/Tools/apache-storm-1.0.2/lib/asm-5.0.3.jar:/Users/aniket.alhat/Tools/apache-storm-1.0.2/lib/clojure-1.7.0.jar:/Users/aniket.alhat/Tools/apache-storm-1.0.2/lib/disruptor-3.3.2.jar:/Users/aniket.alhat/Tools/apache-storm-1.0.2/lib/kryo-3.0.3.jar:/Users/aniket.alhat/Tools/apache-storm-1.0.2/lib/log4j-api-2.1.jar:/Users/aniket.alhat/Tools/apache-storm-1.0.2/lib/log4j-core-2.1.jar:/Users/aniket.alhat/Tools/apache-storm-1.0.2/lib/log4j-over-slf4j-1.6.6.jar:/Users/aniket.alhat/Tools/apache-storm-1.0.2/lib/log4j-slf4j-impl-2.1.jar:/Users/aniket.alhat/Tools/apache-storm-1.0.2/lib/minlog-1.3.0.jar:/Users/aniket.alhat/Tools/apache-storm-1.0.2/lib/objenesis-2.1.jar:/Users/aniket.alhat/Tools/apache-storm-1.0.2/lib/reflectasm-1.10.1.jar:/Users/aniket.alhat/Tools/apache-storm-1.0.2/lib/servlet-api-2.5.jar:/Users/aniket.alhat/Tools/apache-storm-1.0.2/lib/slf4j-api-1.7.7.jar:/Users/aniket.alhat/Tools/apache-storm-1.0.2/lib/storm-core-1.0.2.jar:/Users/aniket.alhat/Tools/apache-storm-1.0.2/lib/storm-rename-hack-1.0.2.jar:/Users/aniket.alhat/Tools/storm/conf
2017-11-27 16:05:35.373 o.a.s.s.o.a.z.ZooKeeper [INFO] Client environment:java.library.path=/usr/local/lib:/opt/local/lib:/usr/lib
2017-11-27 16:05:35.373 o.a.s.s.o.a.z.ZooKeeper [INFO] Client environment:java.io.tmpdir=/var/folders/9c/g5cj60_j1x344r3zpd_hr99j5jwnk4/T/
2017-11-27 16:05:35.373 o.a.s.s.o.a.z.ZooKeeper [INFO] Client environment:java.compiler=<NA>
2017-11-27 16:05:35.373 o.a.s.s.o.a.z.ZooKeeper [INFO] Client environment:os.name=Mac OS X
2017-11-27 16:05:35.373 o.a.s.s.o.a.z.ZooKeeper [INFO] Client environment:os.arch=x86_64
2017-11-27 16:05:35.373 o.a.s.s.o.a.z.ZooKeeper [INFO] Client environment:os.version=10.12.6
2017-11-27 16:05:35.373 o.a.s.s.o.a.z.ZooKeeper [INFO] Client environment:user.name=aniket.alhat
2017-11-27 16:05:35.373 o.a.s.s.o.a.z.ZooKeeper [INFO] Client environment:user.home=/Users/aniket.alhat
2017-11-27 16:05:35.373 o.a.s.s.o.a.z.ZooKeeper [INFO] Client environment:user.dir=/Users/aniket.alhat/Tools/apache-storm-1.0.2
2017-11-27 16:05:35.374 o.a.s.s.o.a.z.ZooKeeper [INFO] Initiating client connection, connectString=localhost:2181/storm sessionTimeout=20000 watcher=org.apache.storm.shade.org.apache.curator.ConnectionState@eac3a26
2017-11-27 16:05:35.397 o.a.s.s.o.a.z.ClientCnxn [INFO] Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2017-11-27 16:05:35.400 o.a.s.b.FileBlobStoreImpl [INFO] Creating new blob store based in storm-local/blobs
2017-11-27 16:05:35.406 o.a.s.d.nimbus [INFO] Using default scheduler
2017-11-27 16:05:35.408 o.a.s.s.o.a.c.f.i.CuratorFrameworkImpl [INFO] Starting
2017-11-27 16:05:35.409 o.a.s.s.o.a.z.ZooKeeper [INFO] Initiating client connection, connectString=localhost:2181 sessionTimeout=20000 watcher=org.apache.storm.shade.org.apache.curator.ConnectionState@68868328
2017-11-27 16:05:35.411 o.a.s.s.o.a.z.ClientCnxn [INFO] Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2017-11-27 16:05:35.438 o.a.s.s.o.a.c.f.i.CuratorFrameworkImpl [INFO] Starting
2017-11-27 16:05:35.438 o.a.s.s.o.a.z.ZooKeeper [INFO] Initiating client connection, connectString=localhost:2181 sessionTimeout=20000 watcher=org.apache.storm.shade.org.apache.curator.ConnectionState@512d6e60
2017-11-27 16:05:35.440 o.a.s.s.o.a.z.ClientCnxn [INFO] Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2017-11-27 16:05:35.478 o.a.s.s.o.a.z.ClientCnxn [INFO] Socket connection established to localhost/127.0.0.1:2181, initiating session
2017-11-27 16:05:35.478 o.a.s.s.o.a.z.ClientCnxn [INFO] Socket connection established to localhost/127.0.0.1:2181, initiating session
2017-11-27 16:05:35.479 o.a.s.s.o.a.z.ClientCnxn [INFO] Socket connection established to localhost/127.0.0.1:2181, initiating session
2017-11-27 16:05:35.513 o.a.s.s.o.a.z.ClientCnxn [INFO] Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x15ffc4b4d950000, negotiated timeout = 20000
2017-11-27 16:05:35.513 o.a.s.s.o.a.z.ClientCnxn [INFO] Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x15ffc4b4d950002, negotiated timeout = 20000
2017-11-27 16:05:35.513 o.a.s.s.o.a.z.ClientCnxn [INFO] Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x15ffc4b4d950001, negotiated timeout = 20000
2017-11-27 16:05:35.517 o.a.s.s.o.a.c.f.s.ConnectionStateManager [INFO] State change: CONNECTED
2017-11-27 16:05:35.517 o.a.s.s.o.a.c.f.s.ConnectionStateManager [INFO] State change: CONNECTED
2017-11-27 16:05:35.517 o.a.s.s.o.a.c.f.s.ConnectionStateManager [INFO] State change: CONNECTED
2017-11-27 16:05:35.518 o.a.s.zookeeper [INFO] Zookeeper state update: :connected:none
2017-11-27 16:05:35.518 o.a.s.zookeeper [INFO] Zookeeper state update: :connected:none
2017-11-27 16:05:35.531 o.a.s.s.o.a.c.f.i.CuratorFrameworkImpl [INFO] backgroundOperationsLoop exiting
2017-11-27 16:05:35.534 o.a.s.s.o.a.z.ZooKeeper [INFO] Session: 0x15ffc4b4d950002 closed
2017-11-27 16:05:35.534 o.a.s.s.o.a.z.ClientCnxn [INFO] EventThread shut down
2017-11-27 16:05:35.536 o.a.s.s.o.a.c.f.i.CuratorFrameworkImpl [INFO] Starting
2017-11-27 16:05:35.536 o.a.s.s.o.a.z.ZooKeeper [INFO] Initiating client connection, connectString=localhost:2181/storm sessionTimeout=20000 watcher=org.apache.storm.shade.org.apache.curator.ConnectionState@3722c145

如果我能得到一些帮助来解决超时问题,我将不胜感激。

【问题讨论】:

    标签: apache-zookeeper apache-storm session-timeout


    【解决方案1】:

    检查 Nimbus 是否正确启动。当 Nimbus 的实例未正确终止时,我遇到了类似的问题。

    尝试终止进程并重新启动 Nimbus。

    【讨论】:

    • 对不起,你说的正确是什么意思?日志显示“正在启动 nimbus ..”,我可以看到端口 6627 的状态为 LISTEN
    【解决方案2】:

    经过多次反复试验,我发现我的 Nimbus 进程以 IP 地址 10.9.157.77 启动,而 ifconfig 给我 10.49.52.97 不知道为什么/这是怎么回事,如果有人能帮我弄清楚,我将不胜感激。

    nimbus.log

    2017-11-30 16:47:00.342 o.a.s.zookeeper [INFO] 10.9.157.77 gained leadership, checking if it has all the topology code locally.
    2017-11-30 16:47:00.350 o.a.s.zookeeper [INFO] active-topology-ids [] local-topology-ids [] diff-topology []
    2017-11-30 16:47:00.350 o.a.s.zookeeper [INFO] Accepting leadership, all active topology found localy.
    2017-11-30 16:47:00.352 o.a.s.d.m.MetricsUtils [INFO] Using statistics reporter plugin:org.apache.storm.daemon.metrics.reporters.JmxPreparableReporter
    

    ifconfig

    en0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        ether xx:xx:xx:xx:xx:xx
        inet6 fe80::427:8998:bb4d:b2bd%en0 prefixlen 64 secured scopeid 0x4
        inet 10.49.52.97 netmask 0xfffffc00 broadcast 10.49.55.255
    

    继续往前走,我发现每次启动 nimbus 进程的 IP 地址 10.9.157.77 都被神奇地获取并存储在 zookeeper 中。

    [zk: localhost:2181(CONNECTED) 15] ls /storm/nimbuses
    [10.9.157.77:6627]
    

    我用 rmr 清理了 /storm 目录并重新启动 nimbus 创建目录,但没有任何变化。

    我也试过刷新DNS缓存,使用的命令是sudo killall -HUP mDNSResponder

    我也观察到重启后IP魔法IP不一样了,变成了10.49.48.134

    2017-11-30 17:28:59.630 o.a.s.zookeeper [INFO] 10.49.48.134 gained leadership, checking if it has all the topology code locally.
    2017-11-30 17:28:59.646 o.a.s.zookeeper [INFO] active-topology-ids [] local-topology-ids [] diff-topology []
    2017-11-30 17:28:59.646 o.a.s.zookeeper [INFO] Accepting leadership, all active topology found localy. 
    

    后来我断开了 Wifi 的连接并再次启动了一切,我能够在本地启动 Storm UI 运行命令storm list deploy topology。

    【讨论】:

      【解决方案3】:

      你可以在你的storm/conf/storm.yaml中添加storm.local.hostname=,然后重启。也适用于 IPv4/FQDN 和 not IPv6。这对我有用(同样的 Storm 1.0.2)

      如果还有问题,也可以在 Nimbus 的主机上添加nimbus.seeds=

      【讨论】:

        猜你喜欢
        • 2013-08-06
        • 1970-01-01
        • 2016-07-22
        • 2016-08-21
        • 1970-01-01
        • 1970-01-01
        • 1970-01-01
        • 1970-01-01
        • 1970-01-01
        相关资源
        最近更新 更多