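【Question title】: Kubernetes cluster does not run after reboot
【Posted】: 2020-01-08 23:48:28
【Question】:

If I run a kubectl command after a reboot, I get an error: The connection to the server x.x.x.x:6443 was refused - did you specify the right host or port?

If I check my containers with docker ps, kube-apiserver and kube-scheduler keep starting and stopping.

Why is this happening?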

root@taeil-linux:/etc/systemd/system/kubelet.service.d# cd
root@taeil-linux:~# kubectl get nodes
The connection to the server 10.0.0.152:6443 was refused - did you specify the right host or port?
root@taeil-linux:~# docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED                 STATUS              PORTS               NAMES
root@taeil-linux:~# docker images
REPOSITORY                           TAG                 IMAGE ID                CREATED             SIZE
k8s.gcr.io/kube-proxy                v1.15.3                 232b5c793146        2 weeks ago         82.4MB
k8s.gcr.io/kube-apiserver            v1.15.3                 5eb2d3fc7a44        2 weeks ago         207MB
k8s.gcr.io/kube-scheduler            v1.15.3                 703f9c69a5d5        2 weeks ago         81.1MB
k8s.gcr.io/kube-controller-manager   v1.15.3                 e77c31de5547        2 weeks ago         159MB
node                                 carbon                  c83f74dcf58e        3 weeks ago         895MB
kubernetesui/dashboard               v2.0.0-beta1            4640949a39e6        2 months ago        64.6MB
weaveworks/weave-kube                2.5.2                   f04a043bb67a        3 months ago        148MB
weaveworks/weave-npc                 2.5.2                   5ce48e0d813c        3 months ago        49.6MB
kubernetesui/metrics-scraper         v1.0.0                  44390ebe2b73        4 months ago        36.8MB
k8s.gcr.io/coredns                   1.3.1                   eb516548c180        7 months ago        40.3MB
k8s.gcr.io/etcd                      3.3.10                  2c4adeb21b4f        9 months ago        258MB
quay.io/coreos/flannel               v0.10.0-amd64           f0fad859c909        19 months ago       44.6MB
k8s.gcr.io/pause                     3.1                     da86e6ba6ca1        20 months ago       742kB

root@taeil-linux:~# systemctl status kubelet

● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: active (running) since Fri 2019-09-06 14:29:25 KST; 4min 19s ago
     Docs: https://kubernetes.io/docs/home/
 Main PID: 14470 (kubelet)
    Tasks: 19 (limit: 4512)
   CGroup: /system.slice/kubelet.service
           └─14470 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --cgroup-driver=cgroupfs --network-plugin=cni --pod-infra-container-image=k8s.gcr.io/pause:3.1 --resolv-con

 9월 06 14:33:44 taeil-linux kubelet[14470]: E0906 14:33:44.800330       14470 pod_workers.go:190] Error syncing pod 9a745ac0a776afabd0d387fd0fcb2f54 ("kube-apiserver-taeil-linux_kube-system(9a745ac0a776afabd0d387fd0fcb2f54)"), skipping: failed to "CreatePodSandbox" for "kube-apiserver-ta
 9월 06 14:33:44 taeil-linux kubelet[14470]: E0906 14:33:44.897945       14470 kubelet.go:2248] node "taeil-linux" not found
 9월 06 14:33:44 taeil-linux kubelet[14470]: E0906 14:33:44.916566       14470 reflector.go:125] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://10.0.0.152:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dtaeil-linux&limit=500&resourceVersion=0: dia
 9월 06 14:33:44 taeil-linux kubelet[14470]: E0906 14:33:44.998190       14470 kubelet.go:2248] node "taeil-linux" not found
 9월 06 14:33:45 taeil-linux kubelet[14470]: E0906 14:33:45.098439       14470 kubelet.go:2248] node "taeil-linux" not found
 9월 06 14:33:45 taeil-linux kubelet[14470]: E0906 14:33:45.198732       14470 kubelet.go:2248] node "taeil-linux" not found
 9월 06 14:33:45 taeil-linux kubelet[14470]: E0906 14:33:45.299052       14470 kubelet.go:2248] node "taeil-linux" not found
 9월 06 14:33:45 taeil-linux kubelet[14470]: E0906 14:33:45.399343       14470 kubelet.go:2248] node "taeil-linux" not found
 9월 06 14:33:45 taeil-linux kubelet[14470]: E0906 14:33:45.499561       14470 kubelet.go:2248] node "taeil-linux" not found
 9월 06 14:33:45 taeil-linux kubelet[14470]: E0906 14:33:45.599723       14470 kubelet.go:2248] node "taeil-linux" not found

root@taeil-linux:~# systemctl status kube-apiserver

Unit kube-apiserver.service could not be found.

If I try docker logs:

Flag --insecure-port has been deprecated, This flag will be removed in a future version.
I0906 10:54:19.636649       1 server.go:560] external host was not specified, using 10.0.0.152
I0906 10:54:19.636954       1 server.go:147] Version: v1.15.3
I0906 10:54:21.753962       1 plugins.go:158] Loaded 10 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,TaintNodesByCondition,Priority,DefaultTolerationSeconds,DefaultStorageClass,StorageObjectInUseProtection,MutatingAdmissionWebhook.
I0906 10:54:21.753988       1 plugins.go:161] Loaded 6 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,Priority,PersistentVolumeClaimResize,ValidatingAdmissionWebhook,ResourceQuota.
E0906 10:54:21.754660       1 prometheus.go:55] failed to register depth metric admission_quota_controller: duplicate metrics collector registration attempted
E0906 10:54:21.754701       1 prometheus.go:68] failed to register adds metric admission_quota_controller: duplicate metrics collector registration attempted
E0906 10:54:21.754787       1 prometheus.go:82] failed to register latency metric admission_quota_controller: duplicate metrics collector registration attempted
E0906 10:54:21.754842       1 prometheus.go:96] failed to register workDuration metric admission_quota_controller: duplicate metrics collector registration attempted
E0906 10:54:21.754883       1 prometheus.go:112] failed to register unfinished metric admission_quota_controller: duplicate metrics collector registration attempted
E0906 10:54:21.754918       1 prometheus.go:126] failed to register unfinished metric admission_quota_controller: duplicate metrics collector registration attempted
E0906 10:54:21.754952       1 prometheus.go:152] failed to register depth metric admission_quota_controller: duplicate metrics collector registration attempted
E0906 10:54:21.754986       1 prometheus.go:164] failed to register adds metric admission_quota_controller: duplicate metrics collector registration attempted
E0906 10:54:21.755047       1 prometheus.go:176] failed to register latency metric admission_quota_controller: duplicate metrics collector registration attempted
E0906 10:54:21.755104       1 prometheus.go:188] failed to register work_duration metric admission_quota_controller: duplicate metrics collector registration attempted
E0906 10:54:21.755152       1 prometheus.go:203] failed to register unfinished_work_seconds metric admission_quota_controller: duplicate metrics collector registration attempted
E0906 10:54:21.755188       1 prometheus.go:216] failed to register longest_running_processor_microseconds metric admission_quota_controller: duplicate metrics collector registration attempted
I0906 10:54:21.755215       1 plugins.go:158] Loaded 10 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,TaintNodesByCondition,Priority,DefaultTolerationSeconds,DefaultStorageClass,StorageObjectInUseProtection,MutatingAdmissionWebhook.
I0906 10:54:21.755226       1 plugins.go:161] Loaded 6 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,Priority,PersistentVolumeClaimResize,ValidatingAdmissionWebhook,ResourceQuota.
I0906 10:54:21.757263       1 client.go:354] parsed scheme: ""
I0906 10:54:21.757280       1 client.go:354] scheme "" not registered, fallback to default scheme
I0906 10:54:21.757335       1 asm_amd64.s:1337] ccResolverWrapper: sending new addresses to cc: [{127.0.0.1:2379 0  <nil>}]
I0906 10:54:21.757402       1 asm_amd64.s:1337] balancerWrapper: got update addr from Notify: [{127.0.0.1:2379 <nil>}]
W0906 10:54:21.757666       1 clientconn.go:1251] grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379 0  <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
I0906 10:54:22.753069       1 client.go:354] parsed scheme: ""
I0906 10:54:22.753118       1 client.go:354] scheme "" not registered, fallback to default scheme
I0906 10:54:22.753204       1 asm_amd64.s:1337] ccResolverWrapper: sending new addresses to cc: [{127.0.0.1:2379 0  <nil>}]
I0906 10:54:22.753354       1 asm_amd64.s:1337] balancerWrapper: got update addr from Notify: [{127.0.0.1:2379 <nil>}]
W0906 10:54:22.753855       1 clientconn.go:1251] grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379 0  <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0906 10:54:22.757983       1 clientconn.go:1251] grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379 0  <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0906 10:54:23.754019       1 clientconn.go:1251] grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379 0  <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0906 10:54:24.430000       1 clientconn.go:1251] grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379 0  <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0906 10:54:25.279869       1 clientconn.go:1251] grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379 0  <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0906 10:54:26.931974       1 clientconn.go:1251] grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379 0  <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0906 10:54:28.198719       1 clientconn.go:1251] grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379 0  <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0906 10:54:30.825660       1 clientconn.go:1251] grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379 0  <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0906 10:54:32.850511       1 clientconn.go:1251] grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379 0  <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0906 10:54:36.294749       1 clientconn.go:1251] grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379 0  <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0906 10:54:38.737408       1 clientconn.go:1251] grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379 0  <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
F0906 10:54:41.757603       1 storage_decorator.go:57] Unable to create storage backend: config (&{ /registry {[https://127.0.0.1:2379] /etc/kubernetes/pki/apiserver-etcd-client.key /etc/kubernetes/pki/apiserver-etcd-client.crt /etc/kubernetes/pki/etcd/ca.crt} true 0xc00063dd40 apiextensions.k8s.io/v1beta1 <nil> 5m0s 1m0s}), err (dial tcp 127.0.0.1:2379: connect: connection refused)
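
The last line of this log is the fatal one: the API server gives up because nothing accepts connections on 127.0.0.1:2379, the address it was configured to use for etcd. In a long log that detail is easy to miss, so here is a minimal sketch that pulls every refused endpoint out of log text read from stdin. The function name refused_endpoints is made up for illustration, and the pattern assumes IPv4 addresses in the "dial tcp HOST:PORT: connect: connection refused" form seen above.

```shell
# Print each distinct host:port that the log reports as "connection refused".
# Reads log text on stdin. The function name is illustrative only.
refused_endpoints() {
  grep -oE 'dial tcp [0-9.]+:[0-9]+: connect: connection refused' \
    | awk '{ sub(/:$/, "", $3); print $3 }' \
    | sort -u
}
```

It could be used as docker logs CONTAINER_ID 2>&1 | refused_endpoints, where CONTAINER_ID is the exited kube-apiserver container shown by docker ps -a. The 2>&1 matters because docker logs replays the container's stderr on stderr.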

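【Comments】:

  • How did you install Kubernetes: from a standard distribution, or with some other standard installation method (kubeadm or similar)?
  • I installed it with kubeadm.
  • OK, that helps to understand what your installation might look like. As for the other master components, they are most likely run by the kubelet, so they won't have systemd units of their own; only the kubelet itself does. You can check the flags the kubelet systemd unit runs with, and look in /etc/kubernetes/manifests for the manifests of the components it will run. I'm not sure why you don't see the other components listed in docker ps, but most likely they aren't running, since both your kubelet and docker logs show errors.
  • Could you paste the output of journalctl --all --no-pager -u kubelet.service, along with whatever you ran earlier to get the docker logs? What was that command?
  • I'm really sorry. I didn't know there was a reply. After I left it for a day, the Kubernetes environment came back. I don't know why. Thank you for your kind reply :)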
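
The manifest check suggested in the comments can be sketched as a small helper that lists which components the kubelet is expected to run as static pods. This is only an illustration: the function name static_pod_components is invented here, and the default directory is the /etc/kubernetes/manifests path mentioned in the comment, which may differ if the kubelet configuration points somewhere else.

```shell
# List the components the kubelet should run as static pods, one per line.
# Default directory is the one named in the comment above; pass another
# directory as the first argument if your setup differs.
static_pod_components() {
  local dir="${1:-/etc/kubernetes/manifests}"
  local f
  for f in "$dir"/*.yaml; do
    [ -e "$f" ] || continue   # directory missing or contains no manifests
    basename "$f" .yaml
  done
}
```

On a kubeadm control-plane node one would expect names such as etcd and kube-apiserver here. Any listed component that is missing from docker ps is a candidate to look up in the kubelet journal with the journalctl command from the comment.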

Tags: kube-apiserver


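【Solution 1】:

The answer is in @cewood's comment:

OK, that helps to understand what your installation might look like. As for the other master components, these are probably run by the kubelet, so they won't have any systemd units; only the kubelet itself does.

With a kubeadm install you won't see the control-plane components as services.

As root: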

systemctl start docker
systemctl start kubelet

Switch to a non-root user (su - nonrootuser), then:

kubectl get pods

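【Comments】:

    【Solution 2】:

    It's been a while.

    I've finally figured out how to solve this!

    If you get an error like this for no apparent reason, you can fix it as follows: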

    docker rm $(docker ps -a -q)
    

    The error probably occurred when the existing Kubernetes containers were restarted, and the newly started containers then crashed.

    watch docker ps
    

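    If you watch the containers with watch, you can see kube-apiserver and the others shut down within a minute.

    So I decided to delete all the containers shown by docker ps -a, and that fixed it!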
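
A slightly more targeted variant of the cleanup above, offered as a sketch rather than the author's method: docker rm without -f refuses to remove running containers, so the original command in effect deletes the stopped ones and prints errors for the rest. Filtering on status=exited makes that intent explicit. The function name remove_exited_containers and the DOCKER override variable are assumptions added here so the logic can be dry-run with a stub.

```shell
# Remove only containers that have already exited; running ones are untouched.
# DOCKER is a hypothetical override used for dry runs; it defaults to docker.
remove_exited_containers() {
  local docker="${DOCKER:-docker}"
  local ids
  ids=$("$docker" ps -a -q --filter status=exited) || return 1
  [ -n "$ids" ] || { echo "no exited containers"; return 0; }
  # $ids is intentionally unquoted: one argument per container ID.
  "$docker" rm $ids
}
```

After the stale containers are gone, the kubelet should recreate the static pods from its manifests. Watching docker ps as described above shows whether they stay up this time.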

    【Comments】:
