【Posted】: 2021-04-21 05:55:29
【Question】:
I have searched for "CoreDNS CrashLoopBackOff" but nothing I found helped.
My setup:
k8s - v1.20.2
CoreDNS - 1.7.0
Installed with kubespray: https://kubernetes.io/ko/docs/setup/production-environment/tools/kubespray
My problem:
The CoreDNS pod on the master node is in Running state, but on the worker nodes the CoreDNS pods are in CrashLoopBackOff.
kubectl logs -f coredns-847f564ccf-msbvp -n kube-system
.:53
[INFO] plugin/reload: Running configuration MD5 = 5b233a0166923d642fdbca0794b712ab
CoreDNS-1.7.0
linux/amd64, go1.14.4, f59c03d
[INFO] SIGTERM: Shutting down servers then terminating
[INFO] plugin/health: Going into lameduck mode for 5s
The CoreDNS container runs the command "/coredns -conf /etc/resolv.conf" for a while, and then it gets killed.
Here is the Corefile:
Corefile: |
  .:53 {
      errors
      health {
          lameduck 5s
      }
      ready
      kubernetes cluster.local in-addr.arpa ip6.arpa {
          pods insecure
          fallthrough in-addr.arpa ip6.arpa
      }
      prometheus :9153
      forward . /etc/resolv.conf {
          prefer_udp
      }
      cache 30
      loop
      reload
      loadbalance
  }
And here are the events from the crashing pod:
kubectl get event --namespace kube-system --field-selector involvedObject.name=coredns-847f564ccf-lqnxs
LAST SEEN TYPE REASON OBJECT MESSAGE
4m55s Warning Unhealthy pod/coredns-847f564ccf-lqnxs Liveness probe failed: Get "http://10.216.50.2:8080/health": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
9m59s Warning BackOff pod/coredns-847f564ccf-lqnxs Back-off restarting failed container
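For context on the "context deadline exceeded" message above: kubelet marks the liveness probe failed when the GET on :8080/health does not return within the configured 5 s timeout, and after 10 consecutive failures it restarts the container. Below is a minimal sketch of that probe behaviour; the throwaway local server and port numbers are illustrative, not part of the cluster:

```python
import http.server
import socket
import threading
import urllib.error
import urllib.request

def probe(url: str, timeout: float = 5.0) -> bool:
    """Approximate kubelet's HTTP liveness probe: success iff the GET
    returns a 2xx/3xx status within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, socket.timeout, OSError):
        return False

# Throwaway local server standing in for CoreDNS' :8080/health endpoint.
class Health(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"OK")
    def log_message(self, *args):  # silence per-request logging
        pass

srv = http.server.HTTPServer(("127.0.0.1", 0), Health)
threading.Thread(target=srv.serve_forever, daemon=True).start()
port = srv.server_address[1]

healthy = probe(f"http://127.0.0.1:{port}/health")      # reachable endpoint
unreachable = probe("http://127.0.0.1:1/health", 1.0)   # nothing listening
print(healthy, unreachable)  # True False
srv.shutdown()
```

The key point: the probe does not care whether the CoreDNS process is alive, only whether the node's kubelet can reach the pod's health port over the network within the timeout.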
Here is the CoreDNS pod description:
Containers:
coredns:
Container ID: docker://a174cb3a3800181d1c7b78831bfd37bbf69caf60a82051d6fb29b4b9deeacce9
Image: k8s.gcr.io/coredns:1.7.0
Image ID: docker-pullable://k8s.gcr.io/coredns@sha256:73ca82b4ce829766d4f1f10947c3a338888f876fbed0540dc849c89ff256e90c
Ports: 53/UDP, 53/TCP, 9153/TCP
Host Ports: 0/UDP, 0/TCP, 0/TCP
Args:
-conf
/etc/coredns/Corefile
State: Running
Started: Wed, 21 Apr 2021 21:51:44 +0900
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Wed, 21 Apr 2021 21:44:42 +0900
Finished: Wed, 21 Apr 2021 21:46:32 +0900
Ready: False
Restart Count: 9943
Limits:
memory: 170Mi
Requests:
cpu: 100m
memory: 70Mi
Liveness: http-get http://:8080/health delay=0s timeout=5s period=10s #success=1 #failure=10
Readiness: http-get http://:8181/ready delay=0s timeout=5s period=10s #success=1 #failure=10
Environment: <none>
Mounts:
/etc/coredns from config-volume (rw)
/var/run/secrets/kubernetes.io/serviceaccount from coredns-token-qqhn6 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: coredns
Optional: false
QoS Class: Burstable
Node-Selectors: kubernetes.io/os=linux
Tolerations: node-role.kubernetes.io/control-plane:NoSchedule
node-role.kubernetes.io/master:NoSchedule
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Pulled 18m (x9940 over 30d) kubelet Container image "k8s.gcr.io/coredns:1.7.0" already present on machine
Warning Unhealthy 8m37s (x99113 over 30d) kubelet Liveness probe failed: Get "http://10.216.50.2:8080/health": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Warning BackOff 3m35s (x121901 over 30d) kubelet Back-off restarting failed container
At this point, any suggestions would help.
I found something strange: testing from node1, I can reach the CoreDNS pod on node2, but not the CoreDNS pod on node1. I am using Calico as the CNI.
On node1: coredns1 - 1.1.1.1; on node2: coredns2 - 2.2.2.2
From node1:
- access 1.1.1.1:8080/health -> timeout
- access 2.2.2.2:8080/health -> ok
From node2:
- access 1.1.1.1:8080/health -> ok
- access 2.2.2.2:8080/health -> timeout
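One detail worth spelling out: kubelet runs the liveness probe from the node the pod is scheduled on, so the failing cells in the matrix above are exactly the same-node probes. A small sketch (node and pod names are the placeholders from above) that makes the pattern explicit:

```python
# Observed probe results from the test above; "coredns1@node1" means the
# pod runs on node1. Names and IPs are the placeholders from the question.
results = {
    ("node1", "coredns1@node1"): "timeout",
    ("node1", "coredns2@node2"): "ok",
    ("node2", "coredns1@node1"): "ok",
    ("node2", "coredns2@node2"): "timeout",
}

def same_node_failures(results):
    """Probes that fail precisely when the probing node hosts the target pod.

    kubelet probes from the pod's own node, so these are the probes that
    drive the container into CrashLoopBackOff."""
    return sorted(
        (src, tgt)
        for (src, tgt), state in results.items()
        if state != "ok" and tgt.endswith("@" + src)
    )

print(same_node_failures(results))
# [('node1', 'coredns1@node1'), ('node2', 'coredns2@node2')]
```

Every observed failure is node-local, which suggests a node-local routing or iptables problem (e.g. a Calico or kube-proxy rule on the node) rather than a fault in CoreDNS itself.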
【Comments】:
-
CoreDNS receiving a SIGTERM sounds like it was killed because a probe failed. Could you try describing one of the pods and look at the events to check whether a probe is failing?
-
The liveness probe is failing. But while the CoreDNS container is running the command "/coredns -conf /etc/resolv.conf", curl 10.216.50.2:8080/health works fine.
-
Hello @JovialCoding, welcome to StackOverflow! Could you edit the question to show us your full livenessProbe configuration?
-
@WytrzymałyWiktor Thanks for the warm welcome. I edited the question and added the CoreDNS pod description below.
-
Hi @WytrzymałyWiktor, thanks for the comment. I will try it and report back with the results.
Tags: kubernetes coredns