【Question title】: HBase distributed mode
【Posted】: 2013-09-03 08:30:05
【Question】:

I am trying to run HBase (0.94.11) in distributed mode on a 3-node Hadoop (1.0.4) cluster, but I want to use only two of the nodes for HBase.

Master/NameNode: cldx-1230-1116 (IP: 172.25.38.245)
RegionServer/Slave: cldx-1229-1117 (IP: 172.25.39.7)

HBase starts, but no region server shows up. The logs show the following errors:

Master/NameNode log

2013-09-03 14:52:23,683 DEBUG org.apache.hadoop.hbase.master.HMaster: Started service threads
2013-09-03 14:52:23,684 INFO org.apache.hadoop.hbase.master.ServerManager: Waiting for region servers count to settle; currently checked in 0, slept for 0 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
2013-09-03 14:52:24,587 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=172.25.39.7:2222 sessionTimeout=180000 watcher=hconnection
2013-09-03 14:52:24,607 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: The identifier of this process is 31222@cldx-1230-1116
2013-09-03 14:52:24,610 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server slave/172.25.39.7:2222. Will not attempt to authenticate using SASL (unknown error)
2013-09-03 14:52:24,615 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to slave/172.25.39.7:2222, initiating session
2013-09-03 14:52:24,631 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server slave/172.25.39.7:2222, sessionid = 0x140e363f8090002, negotiated timeout = 180000
2013-09-03 14:52:25,230 INFO org.apache.hadoop.hbase.master.ServerManager: Waiting for region servers count to settle; currently checked in 0, slept for 1546 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
2013-09-03 14:52:26,753 INFO org.apache.hadoop.hbase.master.ServerManager: Waiting for region servers count to settle; currently checked in 0, slept for 3068 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
2013-09-03 14:52:28,266 INFO org.apache.hadoop.hbase.master.ServerManager: Waiting for region servers count to settle; currently checked in 0, slept for 4582 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.

Region server/slave log

2013-09-03 16:05:18,307 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=172.25.39.7:2222 sessionTimeout=180000 watcher=regionserver:60020
2013-09-03 16:05:18,333 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server localhost/172.25.39.7:2222. Will not attempt to authenticate using SASL (unknown error)
2013-09-03 16:05:18,336 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: The identifier of this process is 14384@cldx-1229-1117
2013-09-03 16:05:18,348 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to localhost/172.25.39.7:2222, initiating session
2013-09-03 16:05:18,426 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server localhost/172.25.39.7:2222, sessionid = 0x140e363f8090000, negotiated timeout = 180000
2013-09-03 16:05:18,452 DEBUG org.apache.hadoop.hbase.catalog.CatalogTracker: Starting catalog tracker org.apache.hadoop.hbase.catalog.CatalogTracker@3a9cfedf
2013-09-03 16:05:18,517 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Node /hbase/online-snapshot/acquired already exists and this is not a retry
2013-09-03 16:05:18,557 INFO org.apache.hadoop.hbase.regionserver.MemStoreFlusher: globalMemStoreLimit=393.4m, globalMemStoreLimitLowMark=344.2m, maxHeap=983.4m
2013-09-03 16:05:18,561 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Runs every 2hrs, 46mins, 40sec
2013-09-03 16:05:18,621 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Attempting connect to Master server at localhost,60000,1378199761324
2013-09-03 16:05:28,697 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: Unable to connect to master. Retrying. Error was:
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:692)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:489)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupConnection(HBaseClient.java:390)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:436)
    at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1127)
    at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:974)
    at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:86)
    at com.sun.proxy.$Proxy8.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:138)
    at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:208)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.getMaster(HRegionServer.java:2030)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:2076)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:744)
    at java.lang.Thread.run(Thread.java:722)

ZooKeeper log on the slave

2013-09-03 16:05:18,345 INFO org.apache.zookeeper.server.NIOServerCnxnFactory: Accepted socket connection from /172.25.39.7:48173
2013-09-03 16:05:18,392 INFO org.apache.zookeeper.server.ZooKeeperServer: Client attempting to establish new session at /172.25.39.7:48173
2013-09-03 16:05:18,395 INFO org.apache.zookeeper.server.persistence.FileTxnLog: Creating new log file: log.5a
2013-09-03 16:05:18,422 INFO org.apache.zookeeper.server.ZooKeeperServer: Established session 0x140e363f8090000 with negotiated timeout 180000 for client /172.25.39.7:48173
2013-09-03 16:05:18,508 INFO org.apache.zookeeper.server.PrepRequestProcessor: Got user-level KeeperException when processing sessionid:0x140e363f8090000 type:create cxid:0x8 zxid:0x5b txntype:-1 reqpath:n/a Error Path:/hbase/online-snapshot/acquired Error:KeeperErrorCode = NodeExists for /hbase/online-snapshot/acquired
2013-09-03 16:05:33,933 INFO org.apache.zookeeper.server.NIOServerCnxnFactory: Accepted socket connection from /172.25.38.245:50879
2013-09-03 16:05:33,972 INFO org.apache.zookeeper.server.ZooKeeperServer: Client attempting to establish new session at /172.25.38.245:50879
2013-09-03 16:05:33,975 INFO org.apache.zookeeper.server.ZooKeeperServer: Established session 0x140e363f8090001 with negotiated timeout 180000 for client /172.25.38.245:50879
2013-09-03 16:05:42,358 INFO org.apache.zookeeper.server.PrepRequestProcessor: Got user-level KeeperException when processing sessionid:0x140e363f8090001 type:create cxid:0xb zxid:0x5d txntype:-1 reqpath:n/a Error Path:/hbase/master Error:KeeperErrorCode = NodeExists for /hbase/master
2013-09-03 16:05:47,934 INFO org.apache.zookeeper.server.PrepRequestProcessor: Got user-level KeeperException when processing sessionid:0x140e363f8090001 type:create cxid:0x1f zxid:0x63 txntype:-1 reqpath:n/a Error Path:/hbase/online-snapshot/acquired Error:KeeperErrorCode = NodeExists for /hbase/online-snapshot/acquired
2013-09-03 16:05:49,037 INFO org.apache.zookeeper.server.NIOServerCnxnFactory: Accepted socket connection from /172.25.38.245:50889
2013-09-03 16:05:49,042 INFO org.apache.zookeeper.server.ZooKeeperServer: Client attempting to establish new session at /172.25.38.245:50889
2013-09-03 16:05:49,050 INFO org.apache.zookeeper.server.ZooKeeperServer: Established session 0x140e363f8090002 with negotiated timeout 180000 for client /172.25.38.245:50889
2013-09-03 16:08:15,001 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session 0x140e35e60460000, timeout of 180000ms exceeded
2013-09-03 16:08:15,001 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session 0x140d02920860000, timeout of 180000ms exceeded
2013-09-03 16:08:15,002 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session 0x140e35e60460001, timeout of 180000ms exceeded
2013-09-03 16:08:15,002 INFO org.apache.zookeeper.server.PrepRequestProcessor: Processed session termination for sessionid: 0x140e35e60460000
2013-09-03 16:08:15,002 INFO org.apache.zookeeper.server.PrepRequestProcessor: Processed session termination for sessionid: 0x140d02920860000
2013-09-03 16:08:15,002 INFO org.apache.zookeeper.server.PrepRequestProcessor: Processed session termination for sessionid: 0x140e35e60460001

The regionservers file has a single entry: 172.25.39.7

hbase-site.xml

<configuration>
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://172.25.38.245:9000/hbase</value>
  <description>The directory shared by RegionServers.</description>
</property>

<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
  <description>The mode the cluster will be in. Possible values are
      false: standalone and pseudo-distributed setups with managed Zookeeper
      true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh)
  </description>
</property>

<property>
  <name>hbase.zookeeper.property.clientPort</name>
  <value>2222</value>
</property>

<property>
  <name>hbase.zookeeper.quorum</name>
  <value>172.25.39.7</value>
</property>

<property>
  <name>hbase.zookeeper.property.dataDir</name>
  <value>/home/bigdata/hadoop_ecosystem_dir/zookeeper</value>
</property>

</configuration>
  1. The Hadoop masters file on the namenode (172.25.38.245) contains 172.25.38.245
  2. The Hadoop slaves file on the namenode (172.25.38.245) contains 172.25.38.245, 172.25.39.7 and 172.25.36.73
  3. The Hadoop masters file on the slave (172.25.39.7) contains 172.25.38.245
  4. The Hadoop slaves file on the slave (172.25.39.7) contains 172.25.39.7

Hosts file on the master:

#127.0.0.1      localhost
#172.25.38.245   localhost
172.25.38.245   cldx-1230-1116
172.17.88.75    cloudx
172.25.38.245 master
172.25.39.7   slave
# The following lines are desirable for IPv6 capable hosts
::1     localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

Hosts file on the slave:

#127.0.0.1      localhost
#172.25.39.7     localhost
172.25.39.7     cldx-1229-1117 cldx-1229-1117
172.25.38.245     cldx-1230-1116 cldx-1230-1116
172.17.88.75    cloudx
172.25.38.245 master
172.25.39.7   slave
# The following lines are desirable for IPv6 capable hosts
::1     localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

I don't understand why the region server/slave tries to connect to the master on localhost instead of 172.25.38.245!

【Discussion】:

    Tags: hadoop hbase


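    【Answer 1】:

    Add the HMaster's IP and hostname to the RS's /etc/hosts file and restart the HBase daemons. One possible cause is that your HMaster assumes the RS's IP is 127.0.0.1 (that is, localhost) and therefore resolves it to its own localhost.

    And yes, JD is absolutely right: hbase.master is now an obsolete property.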
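The hosts-file advice above can be checked mechanically. What follows is a minimal sketch, not part of the original answer: it parses the text of an /etc/hosts-style file and flags entries that bind a loopback name such as localhost to a real interface address, or a real hostname to a loopback address. The names parse_hosts, suspicious_entries and LOOPBACK_NAMES are illustrative assumptions, and the sample text is a deliberately broken variant of the master's hosts file from the question (with the "172.25.38.245 localhost" line left active).

```python
import ipaddress

# Names that are expected to point at a loopback address only (assumed list).
LOOPBACK_NAMES = {"localhost", "localhost.localdomain", "ip6-localhost", "ip6-loopback"}

def parse_hosts(text):
    """Return a list of (ip, [names]) pairs from /etc/hosts-style text."""
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        entries.append((ip, names))
    return entries

def suspicious_entries(text):
    """Flag (ip, name) pairs that mix loopback and non-loopback identities."""
    flagged = []
    for ip, names in parse_hosts(text):
        try:
            loopback_ip = ipaddress.ip_address(ip).is_loopback
        except ValueError:
            continue  # skip addresses this sketch cannot parse
        for name in names:
            if loopback_ip != (name in LOOPBACK_NAMES):
                flagged.append((ip, name))
    return flagged

# Hypothetical broken hosts file: the real IP is also named "localhost".
SAMPLE = """\
172.25.38.245   localhost
172.25.38.245   cldx-1230-1116
172.25.39.7     slave
::1     localhost ip6-localhost ip6-loopback
"""

print(suspicious_entries(SAMPLE))  # prints [('172.25.38.245', 'localhost')]
```

Running the same check against the real /etc/hosts on both nodes (for example by passing open("/etc/hosts").read()) should return an empty list once the localhost lines are commented out.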

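    【Comments】:

    • The problem is that the region server/slave tries to connect to localhost instead of the remote master. Both hosts files are included in my question.
    • Comment out these entries: "172.25.38.245 localhost" and "172.25.39.7 localhost".
    • It worked, but I had to add an entry with the master's hostname and IP to the slave's hosts file, because I was getting a hostname resolution error. I still wonder why the slave picks up the master's hostname even though IPs are specified in hbase-site.xml and elsewhere.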
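On the last comment: the region server log in the question prints the master's address as "localhost,60000,1378199761324". To my understanding, in this HBase version that string is a server name of the form hostname,port,startcode which the master publishes through ZooKeeper, so the region server connects to whatever hostname the master registered for itself, not to an IP taken from hbase-site.xml. The sketch below only illustrates how to read that log value; parse_server_name and is_loopback_name are hypothetical helpers, not HBase APIs.

```python
from typing import NamedTuple

class ServerName(NamedTuple):
    host: str
    port: int
    startcode: int

def parse_server_name(text):
    """Split a 'host,port,startcode' string as printed in the region server log."""
    host, port, startcode = text.strip().rsplit(",", 2)
    return ServerName(host, int(port), int(startcode))

def is_loopback_name(host):
    """Rough check: does the advertised host point back at the local machine?"""
    return host in {"localhost", "localhost.localdomain"} or host.startswith("127.")

# Value copied from the "Attempting connect to Master server at ..." log line.
master = parse_server_name("localhost,60000,1378199761324")
print(master.host, master.port, is_loopback_name(master.host))  # prints localhost 60000 True
```

If the host part is a loopback name, every region server will try to reach the master on its own machine, which matches the "Connection refused" error in the question. If it is a real hostname, each region server must be able to resolve that name, which is why the hosts-file entry for the master was needed on the slave.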