Title: kubernetes HA setup with kubeadm - fails to start scheduler and controller
Posted: 2018-11-08 16:45:21
Question:

I am trying to build an HA cluster with kubeadm. This is my configuration:

kind: MasterConfiguration
kubernetesVersion: v1.11.4
apiServerCertSANs:
- "aaa.xxx.yyy.zzz"
api:
    controlPlaneEndpoint: "my.domain.de:6443"
    apiServerExtraArgs:
      apiserver-count: 3
etcd:
  local:
    image: quay.io/coreos/etcd:v3.3.10
    extraArgs:
      listen-client-urls: "https://127.0.0.1:2379,https://$(curl -s http://169.254.169.254/latest/meta-data/local-ipv4):2379"
      advertise-client-urls: "https://$(curl -s http://169.254.169.254/latest/meta-data/local-ipv4):2379"
      listen-peer-urls: "https://$(curl -s http://169.254.169.254/latest/meta-data/local-ipv4):2380"
      initial-advertise-peer-urls: "https://$(curl -s http://169.254.169.254/latest/meta-data/local-ipv4):2380"
      initial-cluster-state: "new"
      initial-cluster-token: "kubernetes-cluster"
      initial-cluster: ${CLUSTER}
      name: $(hostname -s)
  localEtcd:
    serverCertSANs:
      - "$(hostname -s)"
      - "$(curl -s http://169.254.169.254/latest/meta-data/local-ipv4)"
    peerCertSANs:
      - "$(hostname -s)"
      - "$(curl -s http://169.254.169.254/latest/meta-data/local-ipv4)"
networking:
    podSubnet: "${POD_SUBNET}/${POD_SUBNETMASK}"
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: foobar.fedcba9876543210
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
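One thing worth noting (an assumption on my part, not confirmed by the thread): kubeadm reads this YAML literally and does not expand `$(...)` or `${...}` itself, so a per-node config like the one above presumably has to be rendered through the shell first, e.g. with an unquoted heredoc. A minimal sketch, where `PRIVATE_IP` stands in for the EC2 metadata call used in the question:

```shell
# Hedged sketch: render the per-node kubeadm config through the shell so
# that command and variable substitutions are expanded before kubeadm
# ever sees the file. PRIVATE_IP is a stand-in for the metadata lookup.
PRIVATE_IP="10.0.0.7"   # e.g. $(curl -s http://169.254.169.254/latest/meta-data/local-ipv4)
cat > /tmp/kubeadm-config.yaml <<EOF
kind: MasterConfiguration
kubernetesVersion: v1.11.4
etcd:
  local:
    extraArgs:
      advertise-client-urls: "https://${PRIVATE_IP}:2379"
      name: "$(hostname -s)"
EOF
# sanity check: no unexpanded substitutions should remain in the output
grep -q '\$(' /tmp/kubeadm-config.yaml || echo "all substitutions expanded"
```

If the literal strings `$(curl ...)` or `${CLUSTER}` show up in the running pod manifests, the file was passed to kubeadm unexpanded, which would explain broken etcd/apiserver flags.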

I run this on all three nodes and the nodes come up. After adding Calico, everything seems fine, and I also successfully joined a worker:

ubuntu@master-2-test2:~$ kubectl get nodes
NAME             STATUS    ROLES     AGE       VERSION
master-1-test2   Ready     master    1h        v1.11.4
master-2-test2   Ready     master    1h        v1.11.4
master-3-test2   Ready     master    1h        v1.11.4
node-1-test2     Ready     <none>    1h        v1.11.4

Looking at the control plane, everything appears to work.

curl https://192.168.0.125:6443/api/v1/nodes works on both the masters and the worker node. All pods are running:

ubuntu@master-2-test2:~$ sudo kubectl get pods -n kube-system
NAME                                     READY     STATUS    RESTARTS   AGE
calico-node-9lnk8                        2/2       Running   0          1h
calico-node-f7dkk                        2/2       Running   1          1h
calico-node-k7hw5                        2/2       Running   17         1h
calico-node-rtrvb                        2/2       Running   3          1h
coredns-78fcdf6894-6xgqc                 1/1       Running   0          1h
coredns-78fcdf6894-kcm4f                 1/1       Running   0          1h
etcd-master-1-test2                      1/1       Running   0          1h
etcd-master-2-test2                      1/1       Running   1          1h
etcd-master-3-test2                      1/1       Running   0          1h
kube-apiserver-master-1-test2            1/1       Running   0          40m
kube-apiserver-master-2-test2            1/1       Running   0          58m
kube-apiserver-master-3-test2            1/1       Running   0          36m
kube-controller-manager-master-1-test2   1/1       Running   0          17m
kube-controller-manager-master-2-test2   1/1       Running   1          17m
kube-controller-manager-master-3-test2   1/1       Running   0          17m
kube-proxy-5clt4                         1/1       Running   0          1h
kube-proxy-d2tpz                         1/1       Running   0          1h
kube-proxy-q6kjw                         1/1       Running   0          1h
kube-proxy-vn6l7                         1/1       Running   0          1h
kube-scheduler-master-1-test2            1/1       Running   1          24m
kube-scheduler-master-2-test2            1/1       Running   0          24m
kube-scheduler-master-3-test2            1/1       Running   0          24m

But when I try to start a pod, nothing happens:

~$ kubectl get deployments
NAME      DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
nginx     1         0         0            0           32m

So I looked into the scheduler and the controller, and to my dismay there are lots of errors. The controller is flooded with:

E1108 00:40:36.638832       1 reflector.go:205] k8s.io/kubernetes/pkg/controller/garbagecollector/graph_builder.go:124: Failed to list <nil>: Unauthorized
E1108 00:40:36.639161       1 reflector.go:205] k8s.io/kubernetes/pkg/controller/garbagecollector/graph_builder.go:124: Failed to list <nil>: Unauthorized

and sometimes also:

 garbagecollector.go:649] failed to discover preferred resources: Unauthorized

E1108 00:40:36.639356       1 reflector.go:205] k8s.io/kubernetes/pkg/controller/garbagecollector/graph_builder.go:124: Failed to list <nil>: Unauthorized
E1108 00:40:36.640568       1 reflector.go:205] k8s.io/kubernetes/pkg/controller/garbagecollector/graph_builder.go:124: Failed to list <nil>: Unauthorized
E1108 00:40:36.642129       1 reflector.go:205] k8s.io/kubernetes/pkg/controller/garbagecollector/graph_builder.go:124: Failed to list <nil>: Unauthorized

The scheduler shows similar errors:

E1107 23:25:43.026465       1 reflector.go:205] k8s.io/kubernetes/vendor/k8s.io/client-go/informers/factory.go:130: Failed to list *v1beta1.ReplicaSet: Get https://mydomain.de:6443/apis/extensions/v1beta1/replicasets?limit=500&resourceVersion=0: EOF
E1107 23:25:43.026614       1 reflector.go:205] k8s.io/kubernetes/vendor/k8s.io/client-go/informers/factory.go:130: Failed to list *v1.Node: Get https://mydomain.de:6443/api/v1/nodes?limit=500&resourceVersion=0: EOF

So far I have no idea how to fix these errors. Any help would be greatly appreciated.

More information:

The kubeconfig for kube-proxy is:

----
apiVersion: v1
kind: Config
clusters:
- cluster:
    certificate-authority: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    server: https://my.domain.de:6443
  name: default
contexts:
- context:
    cluster: default
    namespace: default
    user: default
  name: default
current-context: default
users:
- name: default
  user:
    tokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
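A side note on that kubeconfig (my assumption, not stated in the thread): the controller-manager's `Unauthorized` errors mean the credentials it presents are being rejected. In a kubeadm HA setup a classic cause is that `sa.key` differs between the masters, so a service-account token signed via one apiserver is rejected by the others behind the load balancer. A service-account token is a JWT, and its payload can be inspected with nothing but `cut` and `base64`. A sketch using a made-up sample token; on a node you would instead read the mounted file `/var/run/secrets/kubernetes.io/serviceaccount/token`:

```shell
# Hedged sketch: decode the payload of a JWT service-account token.
# SAMPLE_TOKEN is fabricated for illustration; real tokens have three
# base64url segments separated by dots: header.payload.signature.
SAMPLE_TOKEN='eyJhbGciOiJSUzI1NiJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50In0.sig'
payload=$(printf '%s' "$SAMPLE_TOKEN" | cut -d. -f2)
# base64url drops the '=' padding; restore it before decoding
while [ $(( ${#payload} % 4 )) -ne 0 ]; do payload="${payload}="; done
printf '%s' "$payload" | tr '_-' '/+' | base64 -d
# → {"iss":"kubernetes/serviceaccount"}
```

The decode tells you who issued the token; whether the receiving apiserver accepts it depends on it holding the matching `sa.pub`/`sa.key` pair.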

Comments:

    Tags: kubernetes kubeadm kubernetes-ha


    Answer 1:

    Looks fine to me, but could you describe the worker node and check the resources available for pods? Also describe the pod and see what errors it shows.

    Comments:

    • There are no errors, because the pod never gets scheduled. For the same reason, describe shows nothing.
    Answer 2:

    There seems to be an authentication (certificate) problem when talking to the active kube-apiserver at this endpoint: https://mydomain.de:6443/apis/extensions/v1beta1/replicasets?limit=500&resourceVersion=0
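One concrete thing to verify here (my suggestion, beyond what the answer states): the certificate the apiserver presents on :6443 must list the load-balancer name (here `my.domain.de`) in its Subject Alternative Names — that is what `apiServerCertSANs` in the question's config is for. On a master you could inspect the real certificate; the sketch below simulates this with a freshly generated self-signed cert (`-addext` needs OpenSSL 1.1.1+):

```shell
# Hedged sketch: check which names a serving certificate is valid for.
# On a real master the file to inspect would be:
#   openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -text \
#     | grep -A1 'Subject Alternative Name'
# Simulated here with a throwaway self-signed certificate:
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -keyout /tmp/apiserver.key -out /tmp/apiserver.crt \
  -subj "/CN=kube-apiserver" \
  -addext "subjectAltName=DNS:my.domain.de,IP:192.168.0.125" 2>/dev/null
openssl x509 -in /tmp/apiserver.crt -noout -text | grep -A1 'Subject Alternative Name'
```

If the load-balancer DNS name is missing from the SAN list, clients going through the LB will fail TLS validation even though direct node access works.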

    Some hints:

    Is the load balancer in front of your kube-apiservers pointing at the right backends? Are you using an L4 (TCP) load balancer rather than an L7 (HTTP) one?

    Did you copy the same certificates to every control-plane node and verify that they are identical?

    USER=ubuntu # customizable
    CONTROL_PLANE_IPS="10.0.0.7 10.0.0.8"
    for host in ${CONTROL_PLANE_IPS}; do
        scp /etc/kubernetes/pki/ca.crt "${USER}"@$host:
        scp /etc/kubernetes/pki/ca.key "${USER}"@$host:
        scp /etc/kubernetes/pki/sa.key "${USER}"@$host:
        scp /etc/kubernetes/pki/sa.pub "${USER}"@$host:
        scp /etc/kubernetes/pki/front-proxy-ca.crt "${USER}"@$host:
        scp /etc/kubernetes/pki/front-proxy-ca.key "${USER}"@$host:
        scp /etc/kubernetes/pki/etcd/ca.crt "${USER}"@$host:etcd-ca.crt
        scp /etc/kubernetes/pki/etcd/ca.key "${USER}"@$host:etcd-ca.key
        scp /etc/kubernetes/admin.conf "${USER}"@$host:
    done
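As a follow-up to the scp loop (my addition, in the same spirit as the answer): it is worth confirming the copied files really are byte-identical, since `sa.key` in particular must match on every master — service-account tokens signed with one key are rejected by apiservers holding a different one, which matches the `Unauthorized` symptom. The sketch simulates the comparison with dummy local files; on real nodes you would run the checksum over ssh, e.g. `ssh "${USER}@${host}" sha256sum /etc/kubernetes/pki/sa.key`:

```shell
# Hedged sketch: compare checksums of the distributed key material.
# Dummy files stand in for the per-master copies of sa.key.
printf 'dummy-sa-key\n' > /tmp/sa-master1.key
printf 'dummy-sa-key\n' > /tmp/sa-master2.key
h1=$(sha256sum /tmp/sa-master1.key | cut -d' ' -f1)
h2=$(sha256sum /tmp/sa-master2.key | cut -d' ' -f1)
[ "$h1" = "$h2" ] && echo "sa.key identical on both masters"
```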
    

    Have you checked that the kube-apiserver and kube-controller-manager configurations under /etc/kubernetes/manifests are equivalent on all masters?

    Comments:

    • I am using an L4 TCP OpenStack load balancer
    • Could you please answer the remaining questions? The certificate one is important. Did you follow any particular tutorial for setting up the HA cluster?