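【Posted】: 2016-05-17 09:39:34
【Question】:
- I have seen a similar question, "Zookeeper instances in Kafka", but it is still unanswered.
So here is an extended version of my question, with more details.
- Environment: the business application has 3 nodes. Each application node embeds its own ZooKeeper node and its own Kafka node.
To avoid confusion, I have to clarify one thing. My business application is built on top of elasticsearch with 3 nodes and minimumMasterNodes=2, so the fault tolerance of my application cluster is 1. I therefore assumed that, in the same way, I could put a ZooKeeper node and a Kafka node into each application instance. The overall goal is to build inter-datacenter data replication for the business application, using kafka mirrormaker on top of this stack with fault tolerance = 1.
In my experiments I did not use the full stack of my business application; I ran only zookeeper+kafka inside each application node. Each application prints its log to the console, so I can tell which application started ZooKeeper in LEADER mode.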
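The arithmetic behind the "fault tolerance = 1" assumption can be sketched as follows. This is a minimal illustration that assumes plain majority quorums (the default ZooKeeper behaviour, and the same rule that minimumMasterNodes=2 expresses for 3 elasticsearch nodes); the function names are hypothetical.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of the ensemble."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers may be lost while a majority is still alive."""
    return ensemble_size - quorum_size(ensemble_size)


for n in (3, 4, 5):
    print(f"{n} servers: quorum={quorum_size(n)}, tolerated failures={tolerated_failures(n)}")
```

Under this rule both a 3-node and a 4-node ensemble tolerate exactly one lost server, so the two remaining servers of a 3-node ensemble should still form a quorum.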
My ZooKeeper ensemble config is:
server.1=localhost:2668:3668
server.2=localhost:2669:3669
server.3=localhost:2670:3670
syncLimit=5
initLimit=10
clientPort=* #here each node has its own value of port number: 2182,2183,2184 for servers 1,2,3 accordingly
dataDir=D:\rtest\3-nodes\data\*\zoo # * is 1, 2, 3 accordingly to servers 1,2,3
dataLogDir=D:\rtest\3-nodes\data\*\zoo\log # * is 1, 2, 3 accordingly to servers 1,2,3
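To make the `*` placeholders concrete, here is a small sketch that expands the template above into the three per-server configurations. The ports and paths are the ones listed above; the helper name `render_config` and the dictionaries are hypothetical and exist only for illustration.

```python
# Quorum and election ports per server id, as in the server.N lines above.
PEERS = {1: (2668, 3668), 2: (2669, 3669), 3: (2670, 3670)}
# Client port per server id, as described in the clientPort comment above.
CLIENT_PORTS = {1: 2182, 2: 2183, 3: 2184}
# Data directory template; {n} stands for the "*" placeholder.
DATA_DIR = r"D:\rtest\3-nodes\data\{n}\zoo"


def render_config(server_id: int) -> str:
    """Return the config text for one server of the ensemble."""
    lines = [
        f"server.{sid}=localhost:{quorum}:{election}"
        for sid, (quorum, election) in sorted(PEERS.items())
    ]
    data_dir = DATA_DIR.format(n=server_id)
    lines += [
        "syncLimit=5",
        "initLimit=10",
        f"clientPort={CLIENT_PORTS[server_id]}",
        f"dataDir={data_dir}",
        f"dataLogDir={data_dir}\\log",
    ]
    return "\n".join(lines)


for sid in sorted(PEERS):
    print(f"# server {sid}")
    print(render_config(sid))
    print()
```

The `server.N` lines are identical on every node; only `clientPort`, `dataDir` and `dataLogDir` differ.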
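- My failure scenario is:
2.1. Start all three application nodes. Start a consumer (console output). Start an application that produces a sequence of messages. Make sure the consumer receives the messages through the kafka cluster.
2.2. Kill the application whose ZooKeeper instance is the leader (server #3 in my case).
2.3. Confirm that the consumer no longer outputs any new messages from the kafka topic.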
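Besides reading the console logs, the role of each ZooKeeper server can be checked before and after step 2.2 with the `srvr` four-letter command. The sketch below is an assumption-laden illustration: it assumes all servers listen on localhost with the client ports from the config above, and that four-letter commands are enabled (they are by default in 3.4.x; newer releases may require whitelisting them). The function names are hypothetical.

```python
import socket
from typing import Optional

# Client ports from the config above; "localhost" is an assumption.
CLIENT_PORTS = {1: 2182, 2: 2183, 3: 2184}


def four_letter_word(host: str, port: int, cmd: str = "srvr", timeout: float = 1.0) -> str:
    """Send a four-letter command to a ZooKeeper client port and return the raw reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mode(reply: str) -> Optional[str]:
    """Return the value of the 'Mode:' line, or None if the reply has no such line."""
    for line in reply.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


if __name__ == "__main__":
    for server_id, port in sorted(CLIENT_PORTS.items()):
        try:
            mode = parse_mode(four_letter_word("localhost", port))
        except OSError as exc:
            print(f"server {server_id} (port {port}): unreachable ({exc})")
        else:
            print(f"server {server_id} (port {port}): {mode or 'not serving requests'}")
```

If the survivors had re-elected a leader after the kill, one of them would be expected to report `leader` and the other `follower`; a reply without a `Mode:` line would match the "ZooKeeperServer not running" warnings in the logs.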
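In my opinion the problem is in ZooKeeper. Below are excerpts from the logs produced by the live nodes 1 and 2. It looks like the live ZooKeeper servers keep trying to reach the dropped server instead of agreeing on a quorum between themselves... By the way, in this state I cannot even work with ZooKeeper through the console client (to be precise, I can connect to it, but on the first command, say "ls /", the console client crashes with an exception).
Server 1: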
15459 [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2182] WARN org.apache.zookeeper.server.quorum.Learner - Exception when following the leader
java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(Unknown Source)
at java.net.SocketInputStream.read(Unknown Source)
at java.io.BufferedInputStream.fill(Unknown Source)
at java.io.BufferedInputStream.read(Unknown Source)
at java.io.DataInputStream.readInt(Unknown Source)
at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:103)
at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:153)
at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:786)
15460 [Thread-3-SendThread(127.0.0.1:2184)] WARN org.apache.zookeeper.ClientCnxn - Session 0x354b9dbe0b90001 for server 127.0.0.1/127.0.0.1:2184, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: An existing connection was forcibly closed by the remote host
at sun.nio.ch.SocketDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(Unknown Source)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source)
at sun.nio.ch.IOUtil.read(Unknown Source)
at sun.nio.ch.SocketChannelImpl.read(Unknown Source)
at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:68)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
15459 [Thread-3-SendThread(0:0:0:0:0:0:0:1:2184)] WARN org.apache.zookeeper.ClientCnxn - Session 0x354b9dbe0b90000 for server 0:0:0:0:0:0:0:1/0:0:0:0:0:0:0:1:2184, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: An existing connection was forcibly closed by the remote host
at sun.nio.ch.SocketDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(Unknown Source)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source)
at sun.nio.ch.IOUtil.read(Unknown Source)
at sun.nio.ch.SocketChannelImpl.read(Unknown Source)
at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:68)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
15459 [RecvWorker:3] WARN org.apache.zookeeper.server.quorum.QuorumCnxManager - Connection broken for id 3, my id = 1, error = java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(Unknown Source)
at java.net.SocketInputStream.read(Unknown Source)
at java.net.SocketInputStream.read(Unknown Source)
at java.io.DataInputStream.readInt(Unknown Source)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:765)
15462 [RecvWorker:3] WARN org.apache.zookeeper.server.quorum.QuorumCnxManager - Interrupting SendWorker
15462 [SendWorker:3] WARN org.apache.zookeeper.server.quorum.QuorumCnxManager - Interrupted while waiting for message on queue
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(Unknown Source)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(Unknown Source)
at java.util.concurrent.ArrayBlockingQueue.poll(Unknown Source)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:849)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$500(QuorumCnxManager.java:64)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:685)
15462 [SendWorker:3] WARN org.apache.zookeeper.server.quorum.QuorumCnxManager - Send worker leaving thread
15766 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2182] WARN org.apache.zookeeper.server.NIOServerCnxn - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
16481 [WorkerSender[myid=1]] WARN org.apache.zookeeper.server.quorum.QuorumCnxManager - Cannot open channel to 3 at election address localhost/127.0.0.1:3670
java.net.ConnectException: Connection refused: connect
at java.net.DualStackPlainSocketImpl.waitForConnect(Native Method)
at java.net.DualStackPlainSocketImpl.socketConnect(Unknown Source)
at java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
at java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source)
at java.net.AbstractPlainSocketImpl.connect(Unknown Source)
at java.net.PlainSocketImpl.connect(Unknown Source)
at java.net.SocksSocketImpl.connect(Unknown Source)
at java.net.Socket.connect(Unknown Source)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:341)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:449)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:430)
at java.lang.Thread.run(Unknown Source)
16596 [Thread-3-SendThread(127.0.0.1:2184)] WARN org.apache.zookeeper.ClientCnxn - Session 0x354b9dbe0b90000 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused: no further information
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
...
Server 2:
...
5118 [RecvWorker:3] WARN org.apache.zookeeper.server.quorum.QuorumCnxManager - Connection broken for id 3, my id = 2, error =
java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(Unknown Source)
at java.net.SocketInputStream.read(Unknown Source)
at java.net.SocketInputStream.read(Unknown Source)
at java.io.DataInputStream.readInt(Unknown Source)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:765)
5121 [RecvWorker:3] WARN org.apache.zookeeper.server.quorum.QuorumCnxManager - Interrupting SendWorker
5120 [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2183] WARN org.apache.zookeeper.server.quorum.Learner - Exception when following the leader
java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(Unknown Source)
at java.net.SocketInputStream.read(Unknown Source)
at java.io.BufferedInputStream.fill(Unknown Source)
at java.io.BufferedInputStream.read(Unknown Source)
at java.io.DataInputStream.readInt(Unknown Source)
at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:103)
at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:153)
at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:786)
5119 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2183] WARN org.apache.zookeeper.server.NIOServerCnxn - Exception causing close of session 0x254b9dbe0b20000 due to java.io.IOException: An existing connection was forcibly closed by the remote host
5122 [SendWorker:3] WARN org.apache.zookeeper.server.quorum.QuorumCnxManager - Interrupted while waiting for message on queue
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(Unknown Source)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(Unknown Source)
at java.util.concurrent.ArrayBlockingQueue.poll(Unknown Source)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:849)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$500(QuorumCnxManager.java:64)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:685)
5123 [SendWorker:3] WARN org.apache.zookeeper.server.quorum.QuorumCnxManager - Send worker leaving thread
5536 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2183] WARN org.apache.zookeeper.server.NIOServerCnxn - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
6143 [WorkerSender[myid=2]] WARN org.apache.zookeeper.server.quorum.QuorumCnxManager - Cannot open channel to 3 at election address localhost/127.0.0.1:3670
java.net.ConnectException: Connection refused: connect
at java.net.DualStackPlainSocketImpl.waitForConnect(Native Method)
at java.net.DualStackPlainSocketImpl.socketConnect(Unknown Source)
at java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
at java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source)
at java.net.AbstractPlainSocketImpl.connect(Unknown Source)
at java.net.PlainSocketImpl.connect(Unknown Source)
at java.net.SocksSocketImpl.connect(Unknown Source)
at java.net.Socket.connect(Unknown Source)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:341)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:449)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:430)
at java.lang.Thread.run(Unknown Source)
....
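By the way, an ensemble of 4 such nodes works perfectly for my requirements. So can anyone answer: can a 3-node ZooKeeper ensemble survive if one node dies? Or am I doing something wrong?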
【Discussion】:
Tags: apache-kafka apache-zookeeper fault-tolerance