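[Posted]: 2011-08-28 11:26:00
[Question]:
How should heart be used to keep an application alive?
Suppose I have an application X. Will it be monitored if I simply run something like the following?
erl -boot X -heart -env HEART_BEAT_TIMEOUT 30 -detached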
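[Question discussion]:
Yes, this automatically starts a heart process that monitors your node. See the heart documentation.
Update: Yes, asymptote is right. You also need to set the HEART_COMMAND environment variable to tell heart what to run to restart the node.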
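A minimal sketch of the start-up described in this answer, assuming application X's boot file is found via `-boot X` exactly as in the question. The function name `start_x`, the `X_START_SCRIPT` path and the `ERL` override are hypothetical helpers, not part of Erlang/OTP; only the erl flags come from the question.

```shell
#!/bin/sh
# Hypothetical start function for application X.
# start_x, X_START_SCRIPT and ERL are illustrative names, not OTP features.

start_x() {
    # heart runs HEART_COMMAND when the VM stops sending heartbeats or
    # terminates; without it, heart only prints a warning and does not reboot.
    # Assumption: the command re-runs the script that starts the node.
    HEART_COMMAND="${X_START_SCRIPT:-/usr/local/bin/start_x.sh}"
    export HEART_COMMAND

    # Start the node with heart enabled and a 30 s heartbeat timeout.
    # ERL may name a specific erl binary; it defaults to erl on PATH.
    "${ERL:-erl}" -boot X -heart -env HEART_BEAT_TIMEOUT 30 -detached
}
```

Called at the end of a script such as the hypothetical `start_x.sh`, this points HEART_COMMAND back at that same script, so when heart gives up on the node it launches a fresh one. Because the command is invoked by path, the script must be executable.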
[Discussion]:
http://www.erlang.org/doc/man/heart.html
This module contains the interface to the heart process. heart
sends periodic heartbeats to an external port program, which is
also named `heart`. The purpose of the heart port program is to
check that the Erlang runtime system it is supervising is still
running. If the port program has not received any heartbeats within
`HEART_BEAT_TIMEOUT` seconds (default is 60 seconds), the system
can be rebooted. Also, if the system is equipped with a hardware
watchdog timer and is running Solaris, the watchdog can be used to
supervise the entire system.
<snip>
If the system should be rebooted because of missing heart-beats, or
a terminated Erlang runtime system, the environment variable
HEART_COMMAND has to be set before the system is started. If this
variable is not set, a warning text will be printed but the system
will not reboot.
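The timeout rule quoted above can be illustrated with a toy watchdog. This is only a simplified model, not how the real heart port program works: real heart receives heartbeats from the VM over a port, whereas this sketch assumes (hypothetically) that the monitored program writes the epoch second of each beat into a file. `watch_beats` and the beat file are illustrative names.

```shell
#!/bin/sh
# Toy heartbeat watchdog; NOT the real heart port program.
# Usage: watch_beats TIMEOUT BEAT_FILE [REBOOT_COMMAND]
# The monitored program is assumed to run: date +%s > BEAT_FILE

watch_beats() {
    timeout=$1 beat_file=$2 reboot_cmd=${3:-}
    while :; do
        now=$(date +%s)
        # A missing or empty beat file counts as "no heartbeat ever received",
        # so it triggers on the first check.
        last=$(cat "$beat_file" 2>/dev/null) || last=0
        last=${last:-0}
        if [ $((now - last)) -gt "$timeout" ]; then
            if [ -n "$reboot_cmd" ]; then
                # Analogue of heart running HEART_COMMAND.
                sh -c "$reboot_cmd"
                return 0
            fi
            # Analogue of heart warning but not rebooting without HEART_COMMAND.
            echo "no heartbeat for over ${timeout}s; would reboot, but no command is set" >&2
            return 1
        fi
        sleep 1
    done
}
```

Run as `watch_beats 30 /tmp/x.beat /usr/local/bin/start_x.sh` (hypothetical paths), it returns after running the command once more than 30 seconds pass without a beat. With no command it only warns, mirroring the documented behaviour when HEART_COMMAND is unset. Polling once a second keeps the sketch simple; real heart is driven by the messages it receives from the VM.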
I now have a Makefile with a target that runs erl -heart ... for me. When I run it, the process list looks like this:
ubuntu 3814 3579 3814 3579 0 22:03 pts/0 00:00:00 make webstart
ubuntu 3829 3814 3814 3579 25 22:03 pts/0 00:00:01 /usr/local/lib/erlang/erts-5.8.3/bin/beam.smp -K true -A 5
ubuntu 3848 3829 3848 3848 0 22:03 ? 00:00:00 heart -pid 3829
When I kill PID 3829 (the beam.smp emulator), the following output appears in the Erlang shell:
heart: Wed May 18 22:04:09 2011: Erlang has closed.
heart: Wed May 18 22:04:09 2011: Would reboot. Terminating.
make: *** [webstart] Terminated
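So clearly I need to set HEART_COMMAND to some kind of restart command, and heart will then run it as required. AFAIK, judging by the description in the documentation, heart is not meant simply to restart the Erlang VM after a crash; that sounds like something Erlang supervisors should do for you, but I may be wrong.
(Of course, you could simply use HEART_COMMAND to restart your Erlang program.)
[Discussion]: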
The purpose of the heart port program is to check that the Erlang runtime system it is supervising is still running. If the port program has not received any heartbeats within HEART_BEAT_TIMEOUT seconds (default is 60 seconds), the system can be rebooted.