[Question Title]: Identify Reason for application shutdown in Kubernetes
[Posted]: 2019-11-01 10:47:02
[Question]:

I have several .NET Core applications that shut down for no apparent reason. This seems to have been happening since we implemented health checks, but I cannot see any kill command in Kubernetes.

cmd

kubectl describe pod mypod

Output (the restart count is this high because the application shuts down every night; staging environment):

Name:               mypod
...
Status:             Running
...
Controlled By:      ReplicaSet/mypod-deployment-6dbb6bcb65
Containers:
  myservice:
    State:          Running
      Started:      Fri, 01 Nov 2019 09:59:40 +0100
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Fri, 01 Nov 2019 07:19:07 +0100
      Finished:     Fri, 01 Nov 2019 09:59:37 +0100
    Ready:          True
    Restart Count:  19
    Liveness:       http-get http://:80/liveness delay=10s timeout=1s period=5s #success=1 #failure=10
    Readiness:      http-get http://:80/hc delay=10s timeout=1s period=5s #success=1 #failure=10
    ...
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
...
Events:
  Type     Reason     Age                    From                               Message
  ----     ------     ----                   ----                               -------
  Warning  Unhealthy  18m (x103 over 3h29m)  kubelet, aks-agentpool-40946522-0  Readiness probe failed: Get http://10.244.0.146:80/hc: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy  18m (x29 over 122m)    kubelet, aks-agentpool-40946522-0  Liveness probe failed: Get http://10.244.0.146:80/liveness: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
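As a side note, with the probe settings shown above (period=5s, #failure=10) the kubelet only restarts the container after ten consecutive liveness failures; the arithmetic is simply:

```shell
# Probe settings copied from the describe output above
period_seconds=5       # period=5s
failure_threshold=10   # #failure=10
# Window of back-to-back failures needed before the kubelet kills the container:
echo $((period_seconds * failure_threshold))   # prints 50 (seconds)
```

The sporadic timeouts in the events (x103 over 3h29m) would only trigger a restart if ten of them happened consecutively.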

These are the pod logs:

cmd

kubectl logs mypod --previous

Output

Hosting environment: Production
Content root path: /app
Now listening on: http://[::]:80
Application started. Press Ctrl+C to shut down.
Application is shutting down...

The corresponding logs from Azure:

cmd

kubectl get events

Output (what I am missing here is a kill event. My assumption is that the pod is not being restarted because of the repeated failed health checks):

LAST SEEN   TYPE      REASON                    OBJECT                                                      MESSAGE
39m         Normal    NodeHasSufficientDisk     node/aks-agentpool-40946522-0                               Node aks-agentpool-40946522-0 status is now: NodeHasSufficientDisk
39m         Normal    NodeHasSufficientMemory   node/aks-agentpool-40946522-0                               Node aks-agentpool-40946522-0 status is now: NodeHasSufficientMemory
39m         Normal    NodeHasNoDiskPressure     node/aks-agentpool-40946522-0                               Node aks-agentpool-40946522-0 status is now: NodeHasNoDiskPressure
39m         Normal    NodeReady                 node/aks-agentpool-40946522-0                               Node aks-agentpool-40946522-0 status is now: NodeReady
39m         Normal    CREATE                    ingress/my-ingress                                          Ingress default/ebizsuite-ingress
39m         Normal    CREATE                    ingress/my-ingress                                          Ingress default/ebizsuite-ingress
7m2s        Warning   Unhealthy                 pod/otherpod2                                               Readiness probe failed: Get http://10.244.0.158:80/hc: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
7m1s        Warning   Unhealthy                 pod/otherpod2                                               Liveness probe failed: Get http://10.244.0.158:80/liveness: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
40m         Warning   Unhealthy                 pod/otherpod2                                               Liveness probe failed: Get http://10.244.0.158:80/liveness: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
44m         Warning   Unhealthy                 pod/otherpod1                                               Liveness probe failed: Get http://10.244.0.151:80/liveness: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
5m35s       Warning   Unhealthy                 pod/otherpod1                                               Readiness probe failed: Get http://10.244.0.151:80/hc: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
40m         Warning   Unhealthy                 pod/otherpod1                                               Readiness probe failed: Get http://10.244.0.151:80/hc: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
8m8s        Warning   Unhealthy                 pod/mypod                                                   Readiness probe failed: Get http://10.244.0.146:80/hc: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
8m7s        Warning   Unhealthy                 pod/mypod                                                   Liveness probe failed: Get http://10.244.0.146:80/liveness: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
0s          Warning   Unhealthy                 pod/otherpod1                                               Readiness probe failed: Get http://10.244.0.151:80/hc: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
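To hunt for the missing kill event, the event list can be narrowed to a single pod; `--field-selector` and `--sort-by` are standard kubectl flags, and a liveness-triggered restart shows up with reason `Killing` from the kubelet:

```shell
# Only events whose involved object is mypod, oldest first
kubectl get events --field-selector involvedObject.name=mypod --sort-by=.lastTimestamp
```

Keep in mind that events are retained only for a limited time (one hour by default on the API server), so a kill event from a nightly restart may already have aged out by morning.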

Curl from another pod (I run this every second in a long loop and have never received anything other than 200 OK):

kubectl exec -t otherpod1 -- curl --fail http://10.244.0.146:80/hc

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
{"status":"Healthy","totalDuration":"00:00:00.0647250","entries":{"self":{"data":{},"duration":"00:00:00.0000012","status":"Healthy"},"warmup":{"data":{},"duration":"00:00:00.0000007","status":"Healthy"},"TimeDB-check":{"data":{},"duration":"00:00:00.0341533","status":"Healthy"},"time-blob-storage-check":{"data":{},"duration":"00:00:00.0108192","status":"Healthy"},"time-rabbitmqbus-check":{"data":{},"duration":"00:00:00.0646841","status":"Healthy"}}}100   454    0   454    0     0   6579      0 --:--:-- --:--:-- --:--:--  6579

curl

kubectl exec -t otherpod1 -- curl --fail http://10.244.0.146:80/liveness

Healthy  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100     7    0     7    0     0   7000      0 --:--:-- --:--:-- --:--:--  7000

[Discussion]:

  • Have you found a solution?

Tags: c# asp.net docker asp.net-core kubernetes


[Solution 1]:

I think you can:

  1. Change the livenessProbe and readinessProbe to check http://:80 only, dropping the path from the URL

  2. Remove the livenessProbe and readinessProbe (enabled=false)

  3. Simply increase the initial delay to 5 or 10 minutes; in that window you can `kubectl exec -it <pod-name> -- sh` (or bash) into the pod and debug. You can use `netstat` to check whether the service you expect is actually listening on port 80. Finally, run the same request the probes make, `curl -v http://localhost`, against the readiness and liveness endpoints; if this command returns a code other than 200, that is why your pod keeps restarting.
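The debugging sequence from point 3 might look like this (a sketch; it assumes `curl` and `netstat` are available in the container image):

```shell
# Exec into the pod (note the -- separating kubectl's arguments from the command)
kubectl exec -it mypod -- sh

# Inside the container: is the app actually listening on port 80?
netstat -tln

# What status codes do the probe endpoints return within the probe's 1s timeout?
curl -s -o /dev/null -w '%{http_code}\n' --max-time 1 http://localhost:80/hc
curl -s -o /dev/null -w '%{http_code}\n' --max-time 1 http://localhost:80/liveness
```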

Hope this helps.

[Discussion]:

[Solution 2]:

From the logs, the problem appears to be with the liveness and readiness probes. These are failing, which is why the application is being restarted.

Remove the probes and check whether the application starts. Exec into the pod and try the liveness and readiness endpoints yourself to investigate why they fail.

[Discussion]:

  • The health checks work; they just fail from time to time. My question is how to identify the shutdown reason from within my pod, because it is not being killed by the failed health checks. Please see my latest changes to the question, where you can see a curl of /hc.
  • Try increasing the timeout values of the probes.
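The suggested timeout increase would go into the probe spec; a sketch based on the values from the describe output, with only timeoutSeconds changed:

```yaml
livenessProbe:
  httpGet:
    path: /liveness
    port: 80
  initialDelaySeconds: 10
  periodSeconds: 5
  timeoutSeconds: 5      # was 1s; the events show probe timeouts, not error responses
  failureThreshold: 10
```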