【Question title】: CoreDNS has problems getting Endpoints, Services, Namespaces
【Posted】: 2020-07-02 01:22:14
【Question description】:

I am seeing the following errors from the CoreDNS pod on the master (note also that its READY column shows 0/1 on the master):

E0321 22:54:45.590231       1 reflector.go:126] pkg/mod/k8s.io/client-go@v11.0.0+incompatible/tools/cache/reflector.go:94: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: connect: connection refused
E0321 22:54:46.528164       1 reflector.go:126] pkg/mod/k8s.io/client-go@v11.0.0+incompatible/tools/cache/reflector.go:94: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: connect: connection refused
E0321 22:54:46.531540       1 reflector.go:126] pkg/mod/k8s.io/client-go@v11.0.0+incompatible/tools/cache/reflector.go:94: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: connect: connection refused
E0321 22:54:46.591304       1 reflector.go:126] pkg/mod/k8s.io/client-go@v11.0.0+incompatible/tools/cache/reflector.go:94: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: connect: connection refused

Everything else seems to be running fine, and I can also reach the internet from the nodes/pods in the cluster:

kube-system           coredns-776474d56-46fnz                        1/1     Running   0          2d23h   10.32.0.3       raspberrypi4-node     <none>           <none>
kube-system           coredns-776474d56-7nlw4                        0/1     Running   0          32h     10.36.0.1       raspberrypi4-master   <none>           <none>
kube-system           etcd-raspberrypi4-master                       1/1     Running   6          3d22h   192.168.0.192   raspberrypi4-master   <none>           <none>
kube-system           kube-apiserver-raspberrypi4-master             1/1     Running   4          3d22h   192.168.0.192   raspberrypi4-master   <none>           <none>
kube-system           kube-controller-manager-raspberrypi4-master    1/1     Running   9          3d22h   192.168.0.192   raspberrypi4-master   <none>           <none>
kube-system           kube-proxy-6vgm9                               1/1     Running   0          3d13h   192.168.0.157   raspberrypi3-node     <none>           <none>
kube-system           kube-proxy-vqqv7                               1/1     Running   5          3d22h   192.168.0.192   raspberrypi4-master   <none>           <none>
kube-system           kube-proxy-wj784                               1/1     Running   0          3d21h   192.168.0.90    raspberrypi4-node     <none>           <none>
kube-system           kube-scheduler-raspberrypi4-master             1/1     Running   9          3d22h   192.168.0.192   raspberrypi4-master   <none>           <none>
kube-system           weave-net-6db56                                2/2     Running   0          3d9h    192.168.0.90    raspberrypi4-node     <none>           <none>
kube-system           weave-net-7t7t6                                2/2     Running   0          3d9h    192.168.0.192   raspberrypi4-master   <none>           <none>
kube-system           weave-net-mg79s                                2/2     Running   0          3d9h    192.168.0.157   raspberrypi3-node     <none>           <none>

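I checked the documentation, and some ports were not open. However, this is access to port 443, which is a privileged system port, so I wonder whether I need to give Kubernetes access to that port (and possibly forward it to 6443, which the documentation lists as the Kubernetes API server port). I will also be accessing this port from outside the cluster and want a Kubernetes service to handle it, so I hope there is a simple command to forward ports 80 and 443 to it.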

I just noticed that the service does list the correct IP/port, so I don't know why the connection is being refused.

$ kubectl get svc -A
NAMESPACE     NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                  AGE
default       kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP                  3d22h
kube-system   kube-dns     ClusterIP   10.96.0.10   <none>        53/UDP,53/TCP,9153/TCP   3d22h
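
One way to narrow this down is to compare a TCP connection to the Service's cluster IP with one to the API server's own address. As far as I understand the usual kube-proxy setup, a ClusterIP such as 10.96.0.1 is a virtual address: no process binds to it, and the packet rules that kube-proxy installs on each node rewrite the connection to the API server's real endpoint (port 6443 here). The sketch below assumes bash and the coreutils `timeout` command are available; `probe_tcp` is a hypothetical helper name, not part of any Kubernetes tooling.

```shell
# Try to open a TCP connection and report the result.
# Usage: probe_tcp <host> <port>
# Prints "open" if the connection succeeds within 3 seconds,
# "failed" if it is refused, times out, or cannot be attempted.
probe_tcp() {
  if timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null; then
    echo open
  else
    echo failed
  fi
}

# Example calls (addresses taken from the output in the question):
#   probe_tcp 10.96.0.1 443        # Service cluster IP
#   probe_tcp 192.168.0.192 6443   # kube-apiserver on the master
```

If the second probe reports open while the first reports failed on the same node, the API server itself is reachable and the fault more likely lies in the node's forwarding or NAT rules than in the port number.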

【Comments】:

  • Please add the logs from the coredns pod.
  • The logs are at the top; they all say the same thing: dial tcp 10.96.0.1:443: connect: connection refused

Tags: kubernetes iptables raspberry-pi4 coredns


【Solution 1】:

The problem is with iptables.

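  1. Make sure IP forwarding is enabled in the Linux kernel on every node. Run: $ sudo sysctl -w net.ipv4.conf.all.forwarding=1

  2. If your Docker version is >= 1.13, the default policy of the FORWARD chain was changed to DROP, so you have to set the FORWARD chain's default policy back to ACCEPT. Run: $ sudo iptables -P FORWARD ACCEPT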

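  3. Finally, pass the cluster-cidr flag to the kube-proxy configuration:
    --cluster-cidr=

    The --cluster-cidr flag means:

    CIDR range of the Pods in the cluster. Requires --allocate-node-cidrs to be true.

    If it is not provided, no off-cluster bridging is performed.

Similar question: kubernetes-coredns-issue.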

Let me know if this helps.
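
Before changing anything, the first two items in the list above can be checked read-only. This is a minimal sketch: `forwarding_state` and `forward_policy` are hypothetical helper names, the default path assumes a Linux /proc filesystem, and the second helper assumes the `-P FORWARD <policy>` line that `iptables -S FORWARD` prints.

```shell
# Print "enabled" if the given sysctl file contains 1, else "disabled".
# With no argument it reads the setting for net.ipv4.conf.all.forwarding.
forwarding_state() {
  f="${1:-/proc/sys/net/ipv4/conf/all/forwarding}"
  if [ "$(cat "$f" 2>/dev/null)" = "1" ]; then
    echo enabled
  else
    echo disabled
  fi
}

# Read `iptables -S FORWARD` output on stdin and print the chain's
# default policy (for example ACCEPT or DROP).
forward_policy() {
  awk '$1 == "-P" && $2 == "FORWARD" { print $3 }'
}

# Example calls on a node:
#   forwarding_state
#   sudo iptables -S FORWARD | forward_policy
```

If these report disabled or DROP on a node, that node is a candidate for the commands in the list. Note that, as far as I know, neither `sysctl -w` nor `iptables -P` persists across a reboot by itself.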

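【Comments】:

  • Hey, what actually worked was the iptables trick (item 2), but only in combination with sudo iptables --flush and sudo iptables -tnat --flush, and before that I had restarted kubelet (as suggested in that question).

【Solution 2】: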

The accepted answer did not solve my problem. In case anyone has a similar issue, restarting coredns fixed it for me:

kubectl rollout restart deployment coredns --namespace kube-system
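
To confirm that the restart actually brought every replica back, a small filter over the pod listing can flag pods whose READY column is still short, like the 0/1 coredns pod in the question. This is only a sketch: `not_ready_pods` is a hypothetical helper name, and it assumes the column layout of `kubectl get pods -A` (namespace, name, READY as the third column).

```shell
# Read `kubectl get pods -A` style output on stdin and print the name of
# every pod whose READY column (third field, e.g. 0/1) is not n/n.
# The header line is skipped because its third field is not "<num>/<num>".
not_ready_pods() {
  awk '$3 ~ /^[0-9]+\/[0-9]+$/ {
         split($3, r, "/")
         if (r[1] != r[2]) print $2
       }'
}

# Example calls after the restart:
#   kubectl rollout status deployment coredns --namespace kube-system
#   kubectl get pods -A -o wide | not_ready_pods
```

An empty result from the second command means every listed pod reports all of its containers ready.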

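【Comments】:

  • On AWS EKS running a nodejs server, I was getting getaddrinfo EAI_AGAIN errors caused by some kind of kube-system DNS timeout. Restarting coredns solved my problem!
  • On a bare-metal cluster, this did not work either.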