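【Posted】: 2012-12-02 16:54:55
【Question】:
I'm stress-testing nginx with a Node.js backend and noticed a delay when keepalive is used. I removed nginx from the test and still see the same problem.
I'm using:
- ApacheBench, version 2.3
- Node v0.8.14
- Ubuntu 12.04.1 LTS
- Express 3.0.3
The source code: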
var express = require('express');
var cluster = require('cluster');
var numCPUs = require('os').cpus().length;
if (cluster.isMaster) {
    // Fork one worker per CPU.
    for (var i = 0; i < numCPUs; i++) {
        cluster.fork();
    }
    cluster.on('exit', function(worker, code, signal) {
        console.log('worker ' + worker.process.pid + ' died');
    });
} else {
    // 1 MiB buffer filled with "a" (Node 0.8 API; newer Node uses Buffer.alloc(1048576, 'a')).
    var buffer = new Buffer(1048576);
    buffer.fill("a");
    var app = express();
    app.listen(8080);
    // GET /test?size=N&delay=MS: wait MS milliseconds, then send the first N bytes of the buffer.
    app.get('/test', function(req, res){
        setTimeout(function () {
            res.set('Content-Type', 'text/html');
            res.send(buffer.slice(0, req.query.size));
        }, req.query.delay);
    });
}
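In the handler above, req.query.size and req.query.delay arrive as strings (Express query values), and the code relies on Buffer#slice and setTimeout coercing them. A small sketch of a hypothetical helper, parseTestParams (not part of the original test), that parses and bounds them explicitly; it only makes the parameters' meaning explicit and is not expected to affect the keepalive behavior being measured.

```javascript
// Hypothetical helper: turn the ?size=...&delay=... query strings into
// bounded integers before they reach Buffer#slice and setTimeout.
function parseTestParams(query, maxSize) {
  var size = parseInt(query.size, 10);
  var delay = parseInt(query.delay, 10);
  return {
    // Missing/invalid size: send the whole buffer (what slice(0, undefined) does).
    size: isNaN(size) ? maxSize : Math.min(Math.max(size, 0), maxSize),
    // Missing/invalid or negative delay: respond without waiting.
    delay: isNaN(delay) ? 0 : Math.max(delay, 0)
  };
}

// Sketch of use inside the route handler from the question:
// var p = parseTestParams(req.query, buffer.length);
// setTimeout(function () { res.send(buffer.slice(0, p.size)); }, p.delay);
```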
tcpdump sample without keepalive (ab -c 1 -n 10 -r "172.16.76.253:8080/test?size=1024&delay=100"):
10:58:59.403876 IP 172.16.180.47.57380 > 172.16.76.253.http-alt: Flags [P.], seq 1:122, ack 1, win 15, options [nop,nop,TS val 65479762 ecr 362284218], length 121
10:58:59.403961 IP 172.16.76.253.http-alt > 172.16.180.47.57380: Flags [.], ack 122, win 29, options [nop,nop,TS val 362284218 ecr 65479762], length 0
10:58:59.504631 IP 172.16.76.253.http-alt > 172.16.180.47.57380: Flags [P.], seq 1:146, ack 122, win 29, options [nop,nop,TS val 362284243 ecr 65479762], length 145
10:58:59.504890 IP 172.16.76.253.http-alt > 172.16.180.47.57380: Flags [FP.], seq 146:1170, ack 122, win 29, options [nop,nop,TS val 362284243 ecr 65479762], length 1024
10:58:59.505727 IP 172.16.180.47.57380 > 172.16.76.253.http-alt: Flags [.], ack 146, win 17, options [nop,nop,TS val 65479787 ecr 362284243], length 0
10:58:59.505741 IP 172.16.180.47.57380 > 172.16.76.253.http-alt: Flags [.], ack 1171, win 21, options [nop,nop,TS val 65479787 ecr 362284243], length 0
ab results:
Server Hostname: 172.16.76.253
Server Port: 8080
Document Path: /test?size=1024&delay=100
Document Length: 1024 bytes
Concurrency Level: 1
Time taken for tests: 1.025 seconds
Complete requests: 10
Failed requests: 0
Write errors: 0
Total transferred: 11690 bytes
HTML transferred: 10240 bytes
Requests per second: 9.75 [#/sec] (mean)
Time per request: 102.530 [ms] (mean)
Time per request: 102.530 [ms] (mean, across all concurrent requests)
Transfer rate: 11.13 [Kbytes/sec] received
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        1    1   0.1      1       1
Processing:   101  102   0.5    102     102
Waiting:      101  102   0.5    102     102
Total:        102  102   0.6    103     103
WARNING: The median and mean for the total time are not within a normal deviation
These results are probably not that reliable.
Percentage of the requests served within a certain time (ms)
50% 103
66% 103
75% 103
80% 103
90% 103
95% 103
98% 103
99% 103
100% 103 (longest request)
tcpdump sample with keepalive (ab -c 1 -n 10 -k -r "172.16.76.253:8080/test?size=1024&delay=100"):
11:00:12.567741 IP 172.16.180.47.57385 > 172.16.76.253.http-alt: Flags [P.], seq 1306:1451, ack 10567, win 26, options [nop,nop,TS val 65498053 ecr 362302509], length 145
11:00:12.567761 IP 172.16.76.253.http-alt > 172.16.180.47.57385: Flags [.], ack 1451, win 50, options [nop,nop,TS val 362302509 ecr 65498053], length 0
11:00:12.668837 IP 172.16.76.253.http-alt > 172.16.180.47.57385: Flags [P.], seq 10567:10717, ack 1451, win 50, options [nop,nop,TS val 362302534 ecr 65498053], length 150
11:00:12.706745 IP 172.16.180.47.57385 > 172.16.76.253.http-alt: Flags [.], ack 10717, win 26, options [nop,nop,TS val 65498088 ecr 362302534], length 0
11:00:12.706765 IP 172.16.76.253.http-alt > 172.16.180.47.57385: Flags [P.], seq 10717:11741, ack 1451, win 50, options [nop,nop,TS val 362302544 ecr 65498088], length 1024
11:00:12.707901 IP 172.16.180.47.57385 > 172.16.76.253.http-alt: Flags [F.], seq 1451, ack 11741, win 26, options [nop,nop,TS val 65498088 ecr 362302544], length 0
11:00:12.708141 IP 172.16.76.253.http-alt > 172.16.180.47.57385: Flags [F.], seq 11741, ack 1452, win 50, options [nop,nop,TS val 362302544 ecr 65498088], length 0
ab results:
Server Hostname: 172.16.76.253
Server Port: 8080
Document Path: /test?size=1024&delay=100
Document Length: 1024 bytes
Concurrency Level: 1
Time taken for tests: 1.361 seconds
Complete requests: 10
Failed requests: 0
Write errors: 0
Keep-Alive requests: 10
Total transferred: 11740 bytes
HTML transferred: 10240 bytes
Requests per second: 7.35 [#/sec] (mean)
Time per request: 136.073 [ms] (mean)
Time per request: 136.073 [ms] (mean, across all concurrent requests)
Transfer rate: 8.43 [Kbytes/sec] received
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.3      0       1
Processing:   103  136  11.6    140     140
Waiting:      100  101   0.7    101     102
Total:        104  136  11.3    140     140
Percentage of the requests served within a certain time (ms)
50% 140
66% 140
75% 140
80% 140
90% 140
95% 140
98% 140
99% 140
100% 140 (longest request)
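As a rough cross-check of the two ab runs, the sketch below recomputes ab's derived figures from the raw values it printed. It assumes ab's usual formulas (requests per second = completed requests / total time; transfer rate = bytes received / 1024 / total time; time per request = total time × concurrency × 1000 / completed requests) and treats mean Processing minus mean Waiting as an approximate measure of time spent after the first response byte. abSummary and its field names are hypothetical.

```javascript
// Rough cross-check of the two ab runs above (10 requests, concurrency 1).
// abSummary is a hypothetical helper mirroring how ab derives its summary lines.
function abSummary(run) {
  // "Time per request (mean)" = totalSec * concurrency * 1000 / requests,
  // so the unrounded total test time can be recovered from it.
  var totalSec = run.msPerRequest * run.requests / (1000 * run.concurrency);
  return {
    requestsPerSec: (run.requests / totalSec).toFixed(2),
    transferKBps: (run.totalBytes / 1024 / totalSec).toFixed(2),
    // Mean Processing minus mean Waiting from "Connection Times":
    // roughly the time from the first response byte to the end of the response.
    afterFirstByteMs: run.processingMean - run.waitingMean
  };
}

var withoutKeepAlive = {
  requests: 10, concurrency: 1, msPerRequest: 102.530,
  totalBytes: 11690, processingMean: 102, waitingMean: 102
};
var withKeepAlive = {
  requests: 10, concurrency: 1, msPerRequest: 136.073,
  totalBytes: 11740, processingMean: 136, waitingMean: 101
};

var a = abSummary(withoutKeepAlive); // requestsPerSec '9.75', transferKBps '11.13', afterFirstByteMs 0
var b = abSummary(withKeepAlive);    // requestsPerSec '7.35', transferKBps '8.43', afterFirstByteMs 35
console.log(a, b);
// Extra mean time per request with keepalive, in ms.
console.log((withKeepAlive.msPerRequest - withoutKeepAlive.msPerRequest).toFixed(3)); // '33.543'
```

On these numbers the keepalive run spends about 35 ms per request after the first response byte versus about 0 ms without keepalive, which accounts for most of the 33.5 ms difference in mean time per request.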
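Node.js sends the response in two packets, one with the headers and one with the data, and in the keepalive capture it waits for the client's ACK of the first packet before sending the second.
I tried setting sysctl net.ipv4.tcp_slow_start_after_idle=0 && sysctl net.ipv4.route.flush=1, but it had no effect.
With keepalive there is an extra 40 ms of latency. The question is: where does the extra 40 ms come from? Or am I doing something wrong?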
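The packet gaps can be read directly off the tcpdump timestamps. A minimal sketch, with hypothetical helper names tcpdumpMicros and gapMs, using timestamps copied from the two captures (it assumes both timestamps in a pair fall on the same day, as they do here):

```javascript
// Parse the leading "HH:MM:SS.micro" timestamp of a tcpdump line into
// microseconds since midnight, so gaps between packets can be measured exactly.
function tcpdumpMicros(line) {
  var m = /^(\d{2}):(\d{2}):(\d{2})\.(\d{6})/.exec(line);
  if (!m) throw new Error('no tcpdump timestamp in: ' + line);
  return ((+m[1] * 60 + +m[2]) * 60 + +m[3]) * 1e6 + +m[4];
}

// Gap in milliseconds between two tcpdump lines (same day assumed).
function gapMs(earlier, later) {
  return (tcpdumpMicros(later) - tcpdumpMicros(earlier)) / 1000;
}

// Timestamps copied from the captures in the question (rest of each line omitted).
var plainCapture = {
  headers: '10:58:59.504631', // server [P.] length 145
  body:    '10:58:59.504890'  // server [FP.] length 1024
};
var keepAliveCapture = {
  headers:   '11:00:12.668837', // server [P.] length 150
  clientAck: '11:00:12.706745', // client [.] ack 10717
  body:      '11:00:12.706765'  // server [P.] length 1024
};

console.log(gapMs(plainCapture.headers, plainCapture.body));             // 0.259
console.log(gapMs(keepAliveCapture.headers, keepAliveCapture.clientAck)); // 37.908
console.log(gapMs(keepAliveCapture.clientAck, keepAliveCapture.body));    // 0.02
```

In the keepalive capture the 1024-byte data packet leaves 0.02 ms after the client's ACK, and that ACK arrives about 38 ms after the header packet; without keepalive the data packet follows the headers after about 0.26 ms, with no ACK in between.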
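【Comments】:
-
Your statistics are basically meaningless because the sample size is so small. You need hundreds or even thousands of data points. It's also unclear what your actual question is.
-
Why hard-code the number of CPUs? It's probably unrelated to the problem, but you should stick to a real-world scenario: var numCPUs = require('os').cpus().length;
-
@EJP A test at 8000 requests per second gives the same result. With keepalive there is an extra 40 ms of latency. The question is: where does the extra 40 ms come from?
-
@KevinReilly It's hard-coded because this is only a test, and because I want to leave one CPU free for other processes.
-
Most likely, the extra 40 ms is the time it takes to tear down the connection. With keepalive, the client ends up initiating the teardown once it has received the response. Without it, the server initiates the teardown when it sends the response. The response is sent before it is received, so there is extra time in the keepalive case. If you only send one request per connection, keepalive will hurt you a little.
Tags: node.js http tcp express tcpdump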