【发布时间】:2018-10-05 14:19:15
【问题描述】:
我正在编写一个负载很重的事件收集器 http 服务器。因此,在 http 处理程序中,我只是反序列化事件,然后在 goroutine 中的 http 请求-响应周期之外运行实际处理。
有了这个,我发现如果我以每秒 400 个请求的速度访问服务器,那么对于 99 个百分位,延迟低于 20 毫秒。但是,一旦我将请求率提高到每秒 500 次,延迟就会飙升至超过 800 毫秒。
任何人都可以帮助我了解原因可能是什么,以便我可以探索更多。
package controller
import (
"net/http"
"encoding/json"
"event-server/service"
"time"
)
func CollectEvent() http.Handler {
handleFunc := func(w http.ResponseWriter, r *http.Request) {
startTime := time.Now()
stats.Incr("TotalHttpRequests", nil, 1)
decoder := json.NewDecoder(r.Body)
var event service.Event
err := decoder.Decode(&event)
if err != nil {
http.Error(w, "Invalid json: " + err.Error(), http.StatusBadRequest)
return
}
go service.Collect(&event)
w.Write([]byte("Accepted"))
stats.Timing("HttpResponseDuration", time.Since(startTime), nil, 1)
}
return http.HandlerFunc(handleFunc)
}
我以每秒 1000 个请求运行测试并对其进行了分析。以下是结果。
(pprof) top20
Showing nodes accounting for 3.97s, 90.85% of 4.37s total
Dropped 89 nodes (cum <= 0.02s)
Showing top 20 nodes out of 162
flat flat% sum% cum cum%
0.72s 16.48% 16.48% 0.72s 16.48% runtime.mach_semaphore_signal
0.65s 14.87% 31.35% 0.66s 15.10% syscall.Syscall
0.54s 12.36% 43.71% 0.54s 12.36% runtime.usleep
0.46s 10.53% 54.23% 0.46s 10.53% runtime.cgocall
0.34s 7.78% 62.01% 0.34s 7.78% runtime.mach_semaphore_wait
0.33s 7.55% 69.57% 0.33s 7.55% runtime.kevent
0.30s 6.86% 76.43% 0.30s 6.86% syscall.RawSyscall
0.10s 2.29% 78.72% 0.10s 2.29% runtime.mach_semaphore_timedwait
0.07s 1.60% 80.32% 1.25s 28.60% net.dialSingle
0.06s 1.37% 81.69% 0.11s 2.52% runtime.notetsleep
0.06s 1.37% 83.07% 0.06s 1.37% runtime.scanobject
0.06s 1.37% 84.44% 0.06s 1.37% syscall.Syscall6
0.05s 1.14% 85.58% 0.05s 1.14% internal/poll.convertErr
0.05s 1.14% 86.73% 0.05s 1.14% runtime.memmove
0.05s 1.14% 87.87% 0.05s 1.14% runtime.step
0.04s 0.92% 88.79% 0.09s 2.06% runtime.mallocgc
0.03s 0.69% 89.47% 0.58s 13.27% net.(*netFD).connect
0.02s 0.46% 89.93% 0.40s 9.15% net.sysSocket
0.02s 0.46% 90.39% 0.03s 0.69% net/http.(*Transport).getIdleConn
0.02s 0.46% 90.85% 0.13s 2.97% runtime.gentraceback
(pprof) top --cum
Showing nodes accounting for 70ms, 1.60% of 4370ms total
Dropped 89 nodes (cum <= 21.85ms)
Showing top 10 nodes out of 162
flat flat% sum% cum cum%
0 0% 0% 1320ms 30.21% net/http.(*Transport).getConn.func4
0 0% 0% 1310ms 29.98% net.(*Dialer).Dial
0 0% 0% 1310ms 29.98% net.(*Dialer).Dial-fm
0 0% 0% 1310ms 29.98% net.(*Dialer).DialContext
0 0% 0% 1310ms 29.98% net/http.(*Transport).dial
0 0% 0% 1310ms 29.98% net/http.(*Transport).dialConn
0 0% 0% 1250ms 28.60% net.dialSerial
70ms 1.60% 1.60% 1250ms 28.60% net.dialSingle
0 0% 1.60% 1170ms 26.77% net.dialTCP
0 0% 1.60% 1170ms 26.77% net.doDialTCP
(pprof)
【问题讨论】:
-
显示的代码中没有任何内容表明为什么性能会在 400 或 500 个请求/秒时下降。问题出在您未显示的代码的其他部分。 Profile your code.
-
代码的其他部分针对 json 模式对事件数据进行 json 验证,然后将其推送到流中。
-
这不是什么你的代码的其余部分,而是如何。没有看到它,我们将无法为您提供帮助。如前所述,请分析您的代码。
-
附带说明,处理每个请求已经在单独的 goroutine 中完成,因此您从启动器中没有任何收获,但每个请求都有另一个 goroutine。这只是开销。
-
个人资料和基准测试,然后在此处发布实际数据,否则您只是要求人们为您猜测。分析将允许您自己识别代码中的慢速路径。
标签: performance http go server latency