Handle http server crashes: answers

【Question title】: Handle http server crashes
【Posted】: 2014-09-21 08:46:59
【Question】:

I have a very basic http server:

require("http").createServer(function (req, res) {
    res.end("Hello world!");                      
}).listen(8080);

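How can I listen for server crashes so that I can send a 500 status code as the response?

Listening with process.on("uncaughtException", handler) works at the process level, but there I don't have the request and response objects.

One possible solution I see is a try/catch statement inside the createServer callback, but I'm looking for a better one.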
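For reference, the try/catch approach mentioned above might look roughly like the following. This is only a minimal sketch: the helper name withErrorResponse is hypothetical (it is not part of Node's http API), and the approach only covers exceptions thrown synchronously while the handler runs.

```javascript
var http = require("http");

// Hypothetical helper (not a Node API): wraps a request handler so that
// exceptions thrown synchronously inside it become a 500 response.
function withErrorResponse(handler) {
    return function (req, res) {
        try {
            handler(req, res);
        } catch (err) {
            // Only synchronous throws land here. Errors thrown later from
            // timers or I/O callbacks bypass this catch entirely.
            if (!res.headersSent) {
                res.statusCode = 500;
                res.setHeader("content-type", "text/plain");
            }
            res.end("Server error: " + err.message);
        }
    };
}

var server = http.createServer(withErrorResponse(function (req, res) {
    undefined.foo; // test crash
    res.end("Hello world!");
}));

// Uncomment to serve on port 8080:
// server.listen(8080);
```

With the listen call enabled, a request to the server should receive a 500 response instead of crashing the process, as long as the failure happens before the handler returns.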

I tried listening for the error event on the server object, but nothing happens:

var s = require("http").createServer(function (req, res) {
    undefined.foo; // test crash
    res.end("Hello world!");                      
});
s.on("error", function () { console.log(arguments); });
s.listen(8080);

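【Comments】:

Use try/catch around the risky part, or on the line in an outer function that calls the risky part.
@dandavis Is there no other solution?
More good reading on the topic: stackoverflow.com/questions/10390658/… and joyent.com/developers/node/design/errors. You can also install an Express default error handler.
You are catching the exception in the right place (inside each request, where you have the information needed to handle it properly). That's how it works. End of story. Add proper error handling; there is no free lunch here. Oh, and by the way, you have to make sure your async callbacks don't throw, because even a request-level exception handler will not catch those. You have to catch them inside the callback.
Consider using the domain module.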
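The point made in the comments about asynchronous callbacks can be sketched as follows. The helper name guard is hypothetical, and this is just one way to catch an exception inside the callback itself, since a try/catch around the request handler has already returned by the time the callback runs.

```javascript
// Hypothetical helper (not a Node API): wraps an asynchronous callback so
// that an exception thrown inside it produces a 500 response.
function guard(res, callback) {
    return function () {
        try {
            return callback.apply(this, arguments);
        } catch (err) {
            if (!res.headersSent) {
                res.statusCode = 500;
                res.setHeader("content-type", "text/plain");
            }
            res.end("Server error: " + err.message);
        }
    };
}

function handleRequest(req, res) {
    // A try/catch around handleRequest() would have finished long before
    // this timer fires, so the callback has to protect itself.
    setTimeout(guard(res, function () {
        undefined.foo; // test crash
        res.end("Hello world!");
    }), 100);
}
```

Every callback that can throw needs the same treatment, which is the repetitive part that the domain module in the answer below tries to avoid.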

Tags: javascript node.js http

【Solution 1】:

Catching and handling the error

You can use Node's built-in domain module for this.

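Domains provide a way to handle multiple different IO operations as a single group. If any of the event emitters or callbacks registered to a domain emit an error event, or throw an error, then the domain object will be notified, rather than losing the context of the error in the process.on('uncaughtException') handler, or causing the program to exit immediately with an error code.

One very important thing to note is this:

Domain error handlers are not a substitute for closing down your process when an error occurs.

By the very nature of how throw works in JavaScript, there is almost never any way to safely "pick up where you left off" without leaking references or creating some other sort of undefined brittle state.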

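Since you are only asking how to respond with a 500 error, I won't go into restarting the server and so on the way the Node documentation does; however, I highly recommend taking a look at the example in the node docs. Their example shows how to capture the error, send an error response back to the client (if possible), and then restart the server. I'll just show creating the domain and sending back a 500 error response. (See the next section about restarting the process.)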

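Domains work similarly to putting a try/catch in your createServer callback. In your callback:

Create a new domain object
Listen for the domain's error event
Add req and res to the domain (since they were created before the domain existed)
run the domain and call your request handler (this is like the try part of a try/catch)

Something like this: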

var domain = require('domain');

function handleRequest(req, res) {
    // Just something to trigger an async error
    setTimeout(function() {
        throw Error("Some random async error");
        res.end("Hello world!");  
    }, 100);
}

var server = require("http").createServer(function (req, res) {
    var d = domain.create();

    d.on('error', function(err) {
        // We're in an unstable state, so shutdown the server.
        // This will only stop new connections, not close existing ones.
        server.close();

        // Send our 500 error
        res.statusCode = 500;
        res.setHeader("content-type", "text/plain");
        res.end("Server error: " + err.message);
    });

    // Since the domain was created after req and res, they
    // need to be explicitly added.
    d.add(req);
    d.add(res);

    // This is similar to a typical try/catch, but the "catch"
    // is now d's error event.
    d.run(function() {
        handleRequest(req, res);
    });
}).listen(8080);

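Restarting the process after an error

By using the cluster module you can restart the process cleanly after an error. I'm basically copying the example from the Node docs here, but the general idea is to start multiple worker processes from a master process. The workers are the processes that handle incoming connections. If one of them hits an unrecoverable error (i.e. the kind we catch in the previous section), it disconnects from the master process, sends a 500 response, and exits. When the master process sees the worker disconnect, it knows an error occurred and starts a new worker. Since multiple workers run at once, incoming connections shouldn't be lost if one of them goes down.

Example code, copied from the Node docs: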

var cluster = require('cluster');
var PORT = +process.env.PORT || 1337;

if (cluster.isMaster) {
  // In real life, you'd probably use more than just 2 workers,
  // and perhaps not put the master and worker in the same file.
  //
  // You can also of course get a bit fancier about logging, and
  // implement whatever custom logic you need to prevent DoS
  // attacks and other bad behavior.
  //
  // See the options in the cluster documentation.
  //
  // The important thing is that the master does very little,
  // increasing our resilience to unexpected errors.

  cluster.fork();
  cluster.fork();

  cluster.on('disconnect', function(worker) {
    console.error('disconnect!');
    cluster.fork();
  });

} else {
  // the worker
  //
  // This is where we put our bugs!

  var domain = require('domain');

  // See the cluster documentation for more details about using
  // worker processes to serve requests.  How it works, caveats, etc.

  var server = require('http').createServer(function(req, res) {
    var d = domain.create();
    d.on('error', function(er) {
      console.error('error', er.stack);

      // Note: we're in dangerous territory!
      // By definition, something unexpected occurred,
      // which we probably didn't want.
      // Anything can happen now!  Be very careful!

      try {
        // make sure we close down within 30 seconds
        var killtimer = setTimeout(function() {
          process.exit(1);
        }, 30000);
        // But don't keep the process open just for that!
        killtimer.unref();

        // stop taking new requests.
        server.close();

        // Let the master know we're dead.  This will trigger a
        // 'disconnect' in the cluster master, and then it will fork
        // a new worker.
        cluster.worker.disconnect();

        // try to send an error to the request that triggered the problem
        res.statusCode = 500;
        res.setHeader('content-type', 'text/plain');
        res.end('Oops, there was a problem!\n');
      } catch (er2) {
        // oh well, not much we can do at this point.
        console.error('Error sending 500!', er2.stack);
      }
    });

    // Because req and res were created before this domain existed,
    // we need to explicitly add them.
    // See the explanation of implicit vs explicit binding below.
    d.add(req);
    d.add(res);

    // Now run the handler function in the domain.
    d.run(function() {
      handleRequest(req, res);
    });
  });
  server.listen(PORT);
}

// This part isn't important.  Just an example routing thing.
// You'd put your fancy application logic here.
function handleRequest(req, res) {
  switch(req.url) {
    case '/error':
      // We do some async stuff, and then...
      setTimeout(function() {
        // Whoops!
        flerb.bark();
      });
      break;
    default:
      res.end('ok');
  }
}

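Note: I still want to stress that you should take a look at the domain module documentation and read the examples and explanations there. It explains most, if not all, of this, the reasoning behind it, and some other situations you might run into.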

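【Discussion】:

There is some interesting stuff in this answer, but the process gets killed when the timeout is used. "Their example shows how to capture the error (even if it happens in an async function)": the 500 response only appears when I remove the timeout (the async operation); otherwise the process is killed by the thrown error. Is there a way to catch errors like that? Imagine a big application where some requests hit bugs (foo.something, where foo is undefined). Sure, they are bugs, but how can we handle these exceptions nicely? Thanks!
I'm not sure I understand what you are asking. The example code I provided should return a 500 response even when the error happens inside the setTimeout (you can replace the throw with undefined.foo() or some other forced error and it should still work). Note that, as written, it will still kill the process, but the 500 response should be sent first.
Ah, I see. Amazing! But what goes wrong if the server/process is not shut down?
You can remove the server.close() line and the process should stay open, but because of how throw works in JavaScript the process won't be very stable. If you look at the example in the domain module's docs, it shows a way to use the cluster module to restart the process automatically after sending the 500 response.
I know about restarting the process; I want something stable that rarely gets restarted. Imagine what would happen if the folks at GitHub restarted things on every 500 response... :-)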