【Question title】: Installation failed. Failed to receive heartbeat from agent
【Posted】: 2014-08-04 10:55:49
【Question】:

I get this error when installing Cloudera on a single node:

Installation failed. Failed to receive heartbeat from agent.

Here is the content of my /etc/hosts file:

127.0.0.1   localhost
192.168.2.131   ubuntu

Here is the content of my /etc/hostname file:

ubuntu

These are the errors from my agent log under /var/log/cloudera-scm-agent:

[13/Jun/2014 12:31:58 +0000] 15366 MainThread agent        INFO     To override these variables, use /etc/cloudera-scm-agent/config.ini. Environment variables for CDH locations are not used when CDH is installed from parcels.
[13/Jun/2014 12:31:58 +0000] 15366 MainThread agent        INFO     Re-using pre-existing directory: /run/cloudera-scm-agent/process
[13/Jun/2014 12:31:58 +0000] 15366 MainThread agent        INFO     Re-using pre-existing directory: /run/cloudera-scm-agent/supervisor
[13/Jun/2014 12:31:58 +0000] 15366 MainThread agent        INFO     Re-using pre-existing directory: /run/cloudera-scm-agent/supervisor/include
[13/Jun/2014 12:31:58 +0000] 15366 MainThread agent        ERROR    Failed to connect to previous supervisor.
Traceback (most recent call last):
  File "/usr/lib/cmf/agent/src/cmf/agent.py", line 1236, in find_or_start_supervisor
    self.get_supervisor_process_info()
  File "/usr/lib/cmf/agent/src/cmf/agent.py", line 1423, in get_supervisor_process_info
    self.identifier = self.supervisor_client.supervisor.getIdentification()
  File "/usr/lib/python2.7/xmlrpclib.py", line 1224, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1578, in __request
    verbose=self.__verbose
  File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/supervisor-3.0-py2.7.egg/supervisor/xmlrpc.py", line 460, in request
    self.connection.request('POST', handler, request_body, self.headers)
  File "/usr/lib/python2.7/httplib.py", line 958, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib/python2.7/httplib.py", line 992, in _send_request
    self.endheaders(body)
  File "/usr/lib/python2.7/httplib.py", line 954, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python2.7/httplib.py", line 814, in _send_output
    self.send(msg)
  File "/usr/lib/python2.7/httplib.py", line 776, in send
    self.connect()
  File "/usr/lib/python2.7/httplib.py", line 757, in connect
    self.timeout, self.source_address)
  File "/usr/lib/python2.7/socket.py", line 571, in create_connection
    raise err
error: [Errno 111] Connection refused
[13/Jun/2014 12:31:58 +0000] 15366 MainThread tmpfs        INFO     Reusing mounted tmpfs at /run/cloudera-scm-agent/process
[13/Jun/2014 12:31:59 +0000] 15366 MainThread agent        INFO     Trying to connect to newly launched supervisor (Attempt 1)
[13/Jun/2014 12:31:59 +0000] 15366 MainThread agent        INFO     Successfully connected to supervisor
[13/Jun/2014 12:31:59 +0000] 15366 MainThread _cplogging   INFO     [13/Jun/2014:12:31:59] ENGINE Bus STARTING
[13/Jun/2014 12:31:59 +0000] 15366 MainThread _cplogging   INFO     [13/Jun/2014:12:31:59] ENGINE Started monitor thread '_TimeoutMonitor'.
[13/Jun/2014 12:31:59 +0000] 15366 MainThread _cplogging   INFO     [13/Jun/2014:12:31:59] ENGINE Serving on ubuntu:9000
[13/Jun/2014 12:31:59 +0000] 15366 MainThread _cplogging   INFO     [13/Jun/2014:12:31:59] ENGINE Bus STARTED
[13/Jun/2014 12:31:59 +0000] 15366 MainThread __init__     INFO     New monitor: (<cmf.monitor.host.HostMonitor object at 0x305b990>,)
[13/Jun/2014 12:31:59 +0000] 15366 MainThread agent        WARNING  Setting default socket timeout to 30!
[13/Jun/2014 12:31:59 +0000] 15366 MonitorDaemon-Scheduler __init__     INFO     Monitor ready to report: ('HostMonitor',)
[13/Jun/2014 12:31:59 +0000] 15366 MainThread agent        INFO     Using parcels directory from server provided value: /opt/cloudera/parcels
[13/Jun/2014 12:31:59 +0000] 15366 MainThread parcel       INFO     Agent does create users/groups and apply file permissions
[13/Jun/2014 12:31:59 +0000] 15366 MainThread downloader   INFO     Downloader path: /opt/cloudera/parcel-cache
[13/Jun/2014 12:31:59 +0000] 15366 MainThread parcel_cache INFO     Using /opt/cloudera/parcel-cache for parcel cache
[13/Jun/2014 12:31:59 +0000] 15366 MainThread agent        INFO     Active parcel list updated; recalculating component info.
[13/Jun/2014 12:32:04 +0000] 15366 Monitor-HostMonitor throttling_logger INFO     Using java location: '/usr/lib/jvm/java-7-oracle-cloudera/bin/java'.
[13/Jun/2014 12:32:04 +0000] 15366 Monitor-HostMonitor throttling_logger ERROR    Failed to collect NTP metrics
Traceback (most recent call last):
  File "/usr/lib/cmf/agent/src/cmf/monitor/host/ntp_monitor.py", line 39, in collect
    result, stdout, stderr = self._subprocess_with_timeout(args, self._timeout)
  File "/usr/lib/cmf/agent/src/cmf/monitor/host/ntp_monitor.py", line 32, in _subprocess_with_timeout
    return subprocess_with_timeout(args, timeout)
  File "/usr/lib/cmf/agent/src/cmf/monitor/host/subprocess_timeout.py", line 40, in subprocess_with_timeout
    close_fds=True)
  File "/usr/lib/python2.7/subprocess.py", line 679, in __init__
    errread, errwrite)
  File "/usr/lib/python2.7/subprocess.py", line 1249, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory
[13/Jun/2014 12:32:12 +0000] 15366 Monitor-HostMonitor throttling_logger ERROR    Timeout with args ['/usr/lib/jvm/java-7-oracle-cloudera/bin/java', '-classpath', '/usr/share/cmf/lib/agent-5.0.2.jar', 'com.cloudera.cmon.agent.DnsTest']
None
[13/Jun/2014 12:32:12 +0000] 15366 Monitor-HostMonitor throttling_logger ERROR    Failed to collect java-based DNS names
Traceback (most recent call last):
  File "/usr/lib/cmf/agent/src/cmf/monitor/host/dns_names.py", line 67, in collect
    result, stdout, stderr = self._subprocess_with_timeout(args, self._poll_timeout)
  File "/usr/lib/cmf/agent/src/cmf/monitor/host/dns_names.py", line 49, in _subprocess_with_timeout
    return subprocess_with_timeout(args, timeout)
  File "/usr/lib/cmf/agent/src/cmf/monitor/host/subprocess_timeout.py", line 81, in subprocess_with_timeout
    raise Exception("timeout with args %s" % args)
Exception: timeout with args ['/usr/lib/jvm/java-7-oracle-cloudera/bin/java', '-classpath', '/usr/share/cmf/lib/agent-5.0.2.jar', 'com.cloudera.cmon.agent.DnsTest']

【Discussion】:

    Tags: cloudera


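    【Solution 1】:

    I ran into the same error. Make sure your hostname resolves to your IP address. Run ifconfig -a to find the IP address of eth0, then run dig or host with the FQDN and check that the IP address it returns matches the one ifconfig shows.

    Follow this tutorial from Cloudera: http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/latest/CDH4-Installation-Guide/cdh4ig_topic_11_1.html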
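The same comparison can be scripted. The following is a minimal sketch, not something the agent itself runs: `check_hostname` is a hypothetical helper that resolves a name with `socket.gethostbyname` and compares the result with the address you read off ifconfig. It also flags loopback results, since a hostname that resolves to a 127.x.x.x address is a common way for this check to fail.

```python
import ipaddress
import socket


def check_hostname(hostname, expected_ip, resolve=socket.gethostbyname):
    """Return (status, resolved_ip) for a hostname.

    status is one of "ok", "loopback", "mismatch", "unresolved".
    `resolve` is injectable so the logic can be exercised without DNS.
    """
    try:
        resolved = resolve(hostname)
    except OSError:  # socket.gaierror and socket.herror are OSError subclasses
        return "unresolved", None
    if ipaddress.ip_address(resolved).is_loopback:
        # The name maps to 127.x.x.x, so other hosts cannot reach it.
        return "loopback", resolved
    if resolved != expected_ip:
        return "mismatch", resolved
    return "ok", resolved
```

With the values from the question, a call would look like `check_hostname(socket.getfqdn(), "192.168.2.131")`; anything other than `"ok"` suggests that /etc/hosts or DNS needs attention.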

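      【Solution 2】:

      Ensure that the host's hostname is configured properly. Ensure that port 7182 is accessible on the Cloudera Manager Server (check firewall rules). Ensure that ports 9000 and 9001 are not in use on the host being added. Check the agent logs in /var/log/cloudera-scm-agent/ on the host being added (some of the logs can be found in the installation details).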
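The two port checks above can be sketched with the standard `socket` module. The helper names `can_connect` and `port_is_free` are made up for this illustration, and the snippet assumes plain IPv4 TCP.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, name not resolved
        return False


def port_is_free(port, bind_addr="0.0.0.0"):
    """Return True if nothing is bound to the port on this machine."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((bind_addr, port))
        except OSError:  # typically EADDRINUSE, or EACCES for low ports
            return False
    return True
```

On the host being added you would call something like `can_connect("cm-server.example.com", 7182)` (the server name here is a placeholder), plus `port_is_free(9000)` and `port_is_free(9001)`. Run the free-port checks while the agent is stopped: the log in the question shows the agent itself serving on port 9000 once it starts.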

        【Solution 3】:

        I faced a similar problem and found a fix for this error:

        ERROR    Failed to collect NTP metrics
        

        This happens because the NTP service is not installed or not running. Try:

        sudo apt-get update && sudo apt-get install ntp
        sudo service ntp start
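The `OSError: [Errno 2] No such file or directory` in the NTP traceback is raised by `subprocess`, which points to a missing executable rather than a failing one. The log does not say which command the agent runs, so the sketch below only checks whether some likely candidates are on the PATH. The names `ntpq` and `ntpdc` are an assumption (both ship with the Ubuntu ntp package), and `missing_tools` is a hypothetical helper.

```python
import shutil


def missing_tools(names, which=shutil.which):
    """Return the subset of `names` that cannot be found on the PATH.

    `which` is injectable so the logic can be tested without touching PATH.
    """
    return [name for name in names if which(name) is None]


# Assumed candidates; the agent log does not name the binary it calls.
NTP_CANDIDATES = ["ntpq", "ntpdc"]
```

Calling `missing_tools(NTP_CANDIDATES)` before and after installing the package shows whether the binaries became available. An empty list means both were found.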
        

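          【Solution 4】:

          This error occurs when installing Cloudera 5.2 on AWS. It is a known issue, and Cloudera has published the workaround on their website (copied here):

          When installing on AWS, you must use private EC2 hostnames. If you install on an AWS instance and add hosts by their public names, the hosts fail to heartbeat and the installation fails.

          Workaround: Use the Back button in the wizard to return to the original screen, where it prompts for a license.

          Rerun the wizard, but choose "Use existing hosts" instead of searching for hosts. The hosts now show up with their internal EC2 names.

          Continue through the wizard, and the installation should succeed.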
