【Posted】: 2021-04-21 05:55:29
【Question】:
I have searched for "CoreDNS CrashLoopBackOff" but nothing I found helped.
My setup:
k8s - v1.20.2
CoreDNS - 1.7.0
Installed with kubespray: https://kubernetes.io/ko/docs/setup/production-environment/tools/kubespray
My problem:
The CoreDNS pod on the master node is in Running state, but on the worker nodes the CoreDNS pods are in CrashLoopBackOff.
kubectl logs -f coredns-847f564ccf-msbvp -n kube-system
.:53
[INFO] plugin/reload: Running configuration MD5 = 5b233a0166923d642fdbca0794b712ab
CoreDNS-1.7.0
linux/amd64, go1.14.4, f59c03d
[INFO] SIGTERM: Shutting down servers then terminating
[INFO] plugin/health: Going into lameduck mode for 5s
The CoreDNS container runs the command "/coredns -conf /etc/resolv.conf" for a while, and then it gets killed.
Here is the Corefile:
Corefile: |
  .:53 {
      errors
      health {
          lameduck 5s
      }
      ready
      kubernetes cluster.local in-addr.arpa ip6.arpa {
          pods insecure
          fallthrough in-addr.arpa ip6.arpa
      }
      prometheus :9153
      forward . /etc/resolv.conf {
          prefer_udp
      }
      cache 30
      loop
      reload
      loadbalance
  }
And here are the events from the crashing pod:
kubectl get event --namespace kube-system --field-selector involvedObject.name=coredns-847f564ccf-lqnxs
LAST SEEN TYPE REASON OBJECT MESSAGE
4m55s Warning Unhealthy pod/coredns-847f564ccf-lqnxs Liveness probe failed: Get "http://10.216.50.2:8080/health": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
9m59s Warning BackOff pod/coredns-847f564ccf-lqnxs Back-off restarting failed container
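For context on the "context deadline exceeded" message above: kubelet marks the liveness probe failed when the GET on :8080/health does not return within the configured 5 s timeout, and after 10 consecutive failures it restarts the container. Below is a minimal sketch of that probe behaviour; the throwaway local server and port numbers are illustrative, not part of the cluster:

```python
import http.server
import socket
import threading
import urllib.error
import urllib.request

def probe(url: str, timeout: float = 5.0) -> bool:
    """Approximate kubelet's HTTP liveness probe: success iff the GET
    returns a 2xx/3xx status within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, socket.timeout, OSError):
        return False

# Throwaway local server standing in for CoreDNS' :8080/health endpoint.
class Health(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"OK")
    def log_message(self, *args):  # silence per-request logging
        pass

srv = http.server.HTTPServer(("127.0.0.1", 0), Health)
threading.Thread(target=srv.serve_forever, daemon=True).start()
port = srv.server_address[1]

healthy = probe(f"http://127.0.0.1:{port}/health")      # reachable endpoint
unreachable = probe("http://127.0.0.1:1/health", 1.0)   # nothing listening
print(healthy, unreachable)  # True False
srv.shutdown()
```

The key point: the probe does not care whether the CoreDNS process is alive, only whether the node's kubelet can reach the pod's health port over the network within the timeout.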
Here is the CoreDNS pod description:
Containers:
coredns:
Container ID: docker://a174cb3a3800181d1c7b78831bfd37bbf69caf60a82051d6fb29b4b9deeacce9
Image: k8s.gcr.io/coredns:1.7.0
Image ID: docker-pullable://k8s.gcr.io/coredns@sha256:73ca82b4ce829766d4f1f10947c3a338888f876fbed0540dc849c89ff256e90c
Ports: 53/UDP, 53/TCP, 9153/TCP
Host Ports: 0/UDP, 0/TCP, 0/TCP
Args:
-conf
/etc/coredns/Corefile
State: Running
Started: Wed, 21 Apr 2021 21:51:44 +0900
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Wed, 21 Apr 2021 21:44:42 +0900
Finished: Wed, 21 Apr 2021 21:46:32 +0900
Ready: False
Restart Count: 9943
Limits:
memory: 170Mi
Requests:
cpu: 100m
memory: 70Mi
Liveness: http-get http://:8080/health delay=0s timeout=5s period=10s #success=1 #failure=10
Readiness: http-get http://:8181/ready delay=0s timeout=5s period=10s #success=1 #failure=10
Environment: <none>
Mounts:
/etc/coredns from config-volume (rw)
/var/run/secrets/kubernetes.io/serviceaccount from coredns-token-qqhn6 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: coredns
Optional: false
QoS Class: Burstable
Node-Selectors: kubernetes.io/os=linux
Tolerations: node-role.kubernetes.io/control-plane:NoSchedule
node-role.kubernetes.io/master:NoSchedule
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Pulled 18m (x9940 over 30d) kubelet Container image "k8s.gcr.io/coredns:1.7.0" already present on machine
Warning Unhealthy 8m37s (x99113 over 30d) kubelet Liveness probe failed: Get "http://10.216.50.2:8080/health": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Warning BackOff 3m35s (x121901 over 30d) kubelet Back-off restarting failed container
At this point, any suggestions would help.
I found something strange: testing from node1, I can reach the CoreDNS pod on node2, but not the CoreDNS pod on node1. I am using Calico as the CNI.
On node1: coredns1 - 1.1.1.1; on node2: coredns2 - 2.2.2.2
From node1:
- access 1.1.1.1:8080/health -> timeout
- access 2.2.2.2:8080/health -> ok
From node2:
- access 1.1.1.1:8080/health -> ok
- access 2.2.2.2:8080/health -> timeout
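One detail worth spelling out: kubelet runs the liveness probe from the node the pod is scheduled on, so the failing cells in the matrix above are exactly the same-node probes. A small sketch (node and pod names are the placeholders from above) that makes the pattern explicit:

```python
# Observed probe results from the test above; "coredns1@node1" means the
# pod runs on node1. Names and IPs are the placeholders from the question.
results = {
    ("node1", "coredns1@node1"): "timeout",
    ("node1", "coredns2@node2"): "ok",
    ("node2", "coredns1@node1"): "ok",
    ("node2", "coredns2@node2"): "timeout",
}

def same_node_failures(results):
    """Probes that fail precisely when the probing node hosts the target pod.

    kubelet probes from the pod's own node, so these are the probes that
    drive the container into CrashLoopBackOff."""
    return sorted(
        (src, tgt)
        for (src, tgt), state in results.items()
        if state != "ok" and tgt.endswith("@" + src)
    )

print(same_node_failures(results))
# [('node1', 'coredns1@node1'), ('node2', 'coredns2@node2')]
```

Every observed failure is node-local, which suggests a node-local routing or iptables problem (e.g. a Calico or kube-proxy rule on the node) rather than a fault in CoreDNS itself.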
【Comments】:
-
CoreDNS receiving a SIGTERM sounds like it was killed because a probe failed. Could you try describing one of the pods and look at the events to check whether a probe is failing?
-
The liveness probe is failing. But while the CoreDNS container is running the command "/coredns -conf /etc/resolv.conf", curl 10.216.50.2:8080/health works fine.
-
Hello @JovialCoding, welcome to StackOverflow! Could you edit the question to show us your full livenessProbe configuration?
-
@WytrzymałyWiktor Thanks for the warm welcome. I edited the question and added the CoreDNS pod description below.
-
Hi @WytrzymałyWiktor, thanks for the comment. I will try it and report back with the results.
Tags: kubernetes coredns