【Question Title】: kube-proxy does not update iptables
【Posted】: 2019-02-28 20:59:01
【Question】:

My k8s cluster had been running for 2 days when it started behaving strangely.

My specific problem is with kube-proxy: kube-proxy is not updating iptables.

From the kube-proxy logs, I can see that it fails to connect to the kubernetes-apiserver (in my setup the path is kube-proxy --> HAProxy --> k8s API server). Yet the pod still shows as RUNNING.

Question: if the kube-proxy pod cannot register watches with the apiserver, I would expect it to shut down.

How can I achieve this behavior with liveness probes?

Note: after killing the pod, kube-proxy works fine again.
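For reference, a minimal sketch of that workaround. The label selector and pod name below are illustrative placeholders; since kube-proxy typically runs as a DaemonSet, deleting the stuck pod makes the controller recreate it:

```shell
# Find the stuck kube-proxy pod on the affected node
# (the k8s-app=kube-proxy label is the kubeadm convention; yours may differ).
kubectl -n kube-system get pods -l k8s-app=kube-proxy -o wide

# Delete it; the DaemonSet controller immediately schedules a replacement.
# "kube-proxy-xxxxx" is a placeholder for the actual pod name.
kubectl -n kube-system delete pod kube-proxy-xxxxx
```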

kube-proxy logs:

sudo docker logs 1de375c94fd4 -f
W0910 15:18:22.091902       1 server.go:195] WARNING: all flags other than --config, --write-config-to, and --cleanup are deprecated. Please begin using a config file ASAP.
I0910 15:18:22.091962       1 feature_gate.go:226] feature gates: &{{} map[]}
time="2018-09-10T15:18:22Z" level=warning msg="Running modprobe ip_vs failed with message: `modprobe: ERROR: ../libkmod/libkmod.c:586 kmod_search_moddep() could not open moddep file '/lib/modules/4.15.0-33-generic/modules.dep.bin'\nmodprobe: WARNING: Module ip_vs not found in directory /lib/modules/4.15.0-33-generic`, error: exit status 1"
time="2018-09-10T15:18:22Z" level=error msg="Could not get ipvs family information from the kernel. It is possible that ipvs is not enabled in your kernel. Native loadbalancing will not work until this is fixed."
I0910 15:18:22.185086       1 server.go:409] Neither kubeconfig file nor master URL was specified. Falling back to in-cluster config.
I0910 15:18:22.186885       1 server_others.go:140] Using iptables Proxier.
W0910 15:18:22.438408       1 server.go:601] Failed to retrieve node info: nodes "$(node_name)" not found
W0910 15:18:22.438494       1 proxier.go:306] invalid nodeIP, initializing kube-proxy with 127.0.0.1 as nodeIP
I0910 15:18:22.438595       1 server_others.go:174] Tearing down inactive rules.
I0910 15:18:22.861478       1 server.go:444] Version: v1.10.2
I0910 15:18:22.867003       1 conntrack.go:98] Set sysctl 'net/netfilter/nf_conntrack_max' to 2883584
I0910 15:18:22.867046       1 conntrack.go:52] Setting nf_conntrack_max to 2883584
I0910 15:18:22.867267       1 conntrack.go:83] Setting conntrack hashsize to 720896
I0910 15:18:22.893396       1 conntrack.go:98] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_established' to 86400
I0910 15:18:22.893505       1 conntrack.go:98] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_close_wait' to 3600
I0910 15:18:22.893737       1 config.go:102] Starting endpoints config controller
I0910 15:18:22.893749       1 controller_utils.go:1019] Waiting for caches to sync for endpoints config controller
I0910 15:18:22.893742       1 config.go:202] Starting service config controller
I0910 15:18:22.893765       1 controller_utils.go:1019] Waiting for caches to sync for service config controller
I0910 15:18:22.993904       1 controller_utils.go:1026] Caches are synced for endpoints config controller
I0910 15:18:22.993921       1 controller_utils.go:1026] Caches are synced for service config controller
W0910 16:13:28.276082       1 reflector.go:341] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:86: watch of *core.Endpoints ended with: very short watch: k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:86: Unexpected watch close - watch lasted less than a second and no items received
W0910 16:13:28.276083       1 reflector.go:341] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:86: watch of *core.Service ended with: very short watch: k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:86: Unexpected watch close - watch lasted less than a second and no items received
E0910 16:13:29.276678       1 reflector.go:205] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:86: Failed to list *core.Endpoints: Get https://127.0.0.1:6553/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 127.0.0.1:6553: getsockopt: connection refused
E0910 16:13:29.276677       1 reflector.go:205] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:86: Failed to list *core.Service: Get https://127.0.0.1:6553/api/v1/services?limit=500&resourceVersion=0: dial tcp 127.0.0.1:6553: getsockopt: connection refused
E0910 16:13:30.277201       1 reflector.go:205] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:86: Failed to list *core.Endpoints: Get https://127.0.0.1:6553/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 127.0.0.1:6553: getsockopt: connection refused
E0910 16:13:30.278009       1 reflector.go:205] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:86: Failed to list *core.Service: Get https://127.0.0.1:6553/api/v1/services?limit=500&resourceVersion=0: dial tcp 127.0.0.1:6553: getsockopt: connection refused
E0910 16:13:31.277723       1 reflector.go:205] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:86: Failed to list *core.Endpoints: Get https://127.0.0.1:6553/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 127.0.0.1:6553: getsockopt: connection refused
E0910 16:13:31.278574       1 reflector.go:205] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:86: Failed to list *core.Service: Get https://127.0.0.1:6553/api/v1/services?limit=500&resourceVersion=0: dial tcp 127.0.0.1:6553: getsockopt: connection refused
E0910 16:13:32.278197       1 reflector.go:205] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:86: Failed to list *core.Endpoints: Get https://127.0.0.1:6553/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 127.0.0.1:6553: getsockopt: connection refused
E0910 16:13:32.279134       1 reflector.go:205] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:86: Failed to list *core.Service: Get https://127.0.0.1:6553/api/v1/services?limit=500&resourceVersion=0: dial tcp 127.0.0.1:6553: getsockopt: connection refused
E0910 16:13:33.278684       1 reflector.go:205] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:86: Failed to list *core.Endpoints: Get https://127.0.0.1:6553/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 127.0.0.1:6553: getsockopt: connection refused
E0910 16:13:33.279587       1 reflector.go:205] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:86: Failed to list *core.Service: Get https://127.0.0.1:6553/api/v1/services?limit=500&resourceVersion=0: dial tcp 127.0.0.1:6553: getsockopt: connection refused

【Discussion】:

  • Has this happened more than once? Was there a power outage or a network problem?

Tags: kubernetes iptables kube-proxy


【Solution 1】:

Question: if the kube-proxy pod cannot register watches with the apiserver, I would expect it to shut down.

kube-proxy is not supposed to go down. It watches for events on the kube-apiserver and applies whatever changes are needed whenever a Service or Endpoints object changes or a deployment happens. The rationale, as far as I can tell, is that it caches the last-known state so that the iptables rules on the node stay consistent. Kubernetes is designed so that even if your master / kube-apiserver / other control-plane components go down, traffic should keep flowing to the nodes without downtime.
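As a rough diagnostic for that cached state, you can dump the rules kube-proxy has programmed. This is a sketch assuming the iptables proxy mode (which the logs above confirm: "Using iptables Proxier"); the KUBE-* chain names are kube-proxy's own conventions:

```shell
# Dump the NAT table and filter for kube-proxy's chains.
# KUBE-SERVICES is the entry chain; KUBE-SVC-* and KUBE-SEP-* chains
# map Services to their endpoints. If these rules stop changing while
# Services/Endpoints in the cluster do change, kube-proxy has stopped syncing.
sudo iptables-save -t nat | grep -E 'KUBE-(SERVICES|SVC|SEP)'
```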

How can I achieve this behavior with liveness probes?

You can always add a liveness probe to the kube-proxy DaemonSet, although it is not a recommended practice:

spec:
  containers:
  - command:
    - /usr/local/bin/kube-proxy
    - --config=/var/lib/kube-proxy/config.conf
    image: k8s.gcr.io/kube-proxy-amd64:v1.11.2
    imagePullPolicy: IfNotPresent
    name: kube-proxy
    resources: {}
    securityContext:
      privileged: true
    livenessProbe:
      httpGet:
        host: 127.0.0.1
        path: /healthz
        port: 10256
      initialDelaySeconds: 5
      periodSeconds: 5

Make sure kube-proxy's health endpoint is enabled. Note that `--healthz-port` (superseded by `--healthz-bind-address` in newer releases) is a flag on kube-proxy itself, not on kube-apiserver; by default kube-proxy serves `/healthz` on port 10256.
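To check the endpoint by hand from the node (a sketch assuming the default port 10256 and host networking, which is how kube-proxy normally runs):

```shell
# Print only the HTTP status code of kube-proxy's health endpoint.
# It returns 200 while rule syncs are recent; in recent kube-proxy
# versions it returns a non-200 status once syncs go stale.
curl -s -o /dev/null -w '%{http_code}\n' http://127.0.0.1:10256/healthz
```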

【Discussion】:

  • I don't see kube-proxy retrying the watch registration in the logs, so there are no more iptables updates on that node. My understanding is that kube-proxy's only job is to register watches and then update iptables based on the events it receives. Since the watch registration itself failed during kube-proxy's startup phase, I expected the pod to shut down.