【Question title】: calico-node pods don't start after GKE cluster upgrade from 1.10.x to 1.11.x
【Posted】: 2018-12-04 22:26:47
【Question】:

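We upgraded our GKE cluster to 1.11.x. Although the upgrade completed successfully, the cluster is not working properly: several pods are crashing or stuck in Pending, and nothing on the Calico network works: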

calico-node-2hhfz       1/2       CrashLoopBackOff    5          6m

Its logs show the following:

kubectl -n kube-system logs -f calico-node-2hhfz calico-node

Note the error at the end (could not find the requested resource (post BGPConfigurations.crd.projectcalico.org)):

2018-12-04 11:22:39.617 [INFO][10] startup.go 252: Early log level set to info
2018-12-04 11:22:39.618 [INFO][10] startup.go 268: Using NODENAME environment for node name
2018-12-04 11:22:39.618 [INFO][10] startup.go 280: Determined node name: gke-apps-internas-apps-internas-4c-6r-ecf8b140-9p8x
2018-12-04 11:22:39.619 [INFO][10] startup.go 303: Checking datastore connection
2018-12-04 11:22:39.626 [INFO][10] startup.go 327: Datastore connection verified
2018-12-04 11:22:39.626 [INFO][10] startup.go 100: Datastore is ready
2018-12-04 11:22:39.632 [INFO][10] startup.go 1052: Running migration
2018-12-04 11:22:39.632 [INFO][10] migrate.go 866: Querying current v1 snapshot and converting to v3
2018-12-04 11:22:39.632 [INFO][10] migrate.go 875: handling FelixConfiguration (global) resource
2018-12-04 11:22:39.637 [INFO][10] migrate.go 875: handling ClusterInformation (global) resource
2018-12-04 11:22:39.637 [INFO][10] migrate.go 875: skipping FelixConfiguration (per-node) resources - not supported
2018-12-04 11:22:39.637 [INFO][10] migrate.go 875: handling BGPConfiguration (global) resource
2018-12-04 11:22:39.637 [INFO][10] migrate.go 600: Converting BGP config -> BGPConfiguration(default)
2018-12-04 11:22:39.644 [INFO][10] migrate.go 875: skipping Node resources - these do not need migrating
2018-12-04 11:22:39.644 [INFO][10] migrate.go 875: skipping BGPPeer (global) resources - these do not need migrating
2018-12-04 11:22:39.644 [INFO][10] migrate.go 875: handling BGPPeer (node) resources
2018-12-04 11:22:39.651 [INFO][10] migrate.go 875: skipping HostEndpoint resources - not supported
2018-12-04 11:22:39.651 [INFO][10] migrate.go 875: skipping IPPool resources - these do not need migrating
2018-12-04 11:22:39.651 [INFO][10] migrate.go 875: skipping GlobalNetworkPolicy resources - these do not need migrating
2018-12-04 11:22:39.651 [INFO][10] migrate.go 875: skipping Profile resources - these do not need migrating
2018-12-04 11:22:39.652 [INFO][10] migrate.go 875: skipping WorkloadEndpoint resources - these do not need migrating
2018-12-04 11:22:39.652 [INFO][10] migrate.go 875: data converted successfully
2018-12-04 11:22:39.652 [INFO][10] migrate.go 866: Storing v3 data
2018-12-04 11:22:39.652 [INFO][10] migrate.go 875: Storing resources in v3 format
2018-12-04 11:22:39.673 [INFO][10] migrate.go 1151: Failed to create resource Key=BGPConfiguration(default) error=resource does not exist: BGPConfiguration(default) with error: the server could not find the requested resource (post BGPConfigurations.crd.projectcalico.org)
2018-12-04 11:22:39.673 [ERROR][10] migrate.go 884: Unable to store the v3 resources
2018-12-04 11:22:39.673 [INFO][10] migrate.go 875: cause: resource does not exist: BGPConfiguration(default) with error: the server could not find the requested resource (post BGPConfigurations.crd.projectcalico.org)
2018-12-04 11:22:39.673 [ERROR][10] startup.go 107: Unable to ensure datastore is migrated. error=Migration failed: error storing converted data: resource does not exist: BGPConfiguration(default) with error: the server could not find the requested resource (post BGPConfigurations.crd.projectcalico.org)
2018-12-04 11:22:39.673 [WARNING][10] startup.go 1066: Terminating
Calico node failed to start
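The key line is the failed POST: during its startup migration, calico-node tries to create BGPConfiguration(default), and the API server replies that the resource type does not exist, which suggests the corresponding CRD is not registered. A minimal Python sketch for pulling the missing CRD name out of a log like this one; the function name `missing_crd_name` and the regex are illustrative assumptions, not part of Calico or kubectl.

```python
import re
from typing import Optional

# Hypothetical pattern for the API-server error calico-node logs when a POST fails, e.g.
# "could not find the requested resource (post BGPConfigurations.crd.projectcalico.org)"
MISSING_RESOURCE = re.compile(
    r"could not find the requested resource \(post ([A-Za-z]+)\.([a-z0-9.-]+)\)"
)


def missing_crd_name(log_text: str) -> Optional[str]:
    """Return the CRD name (<plural>.<group>, lowercase) the log reports as missing, or None."""
    for line in log_text.splitlines():
        match = MISSING_RESOURCE.search(line)
        if match:
            plural, group = match.groups()
            # CRD object names are the lowercase plural followed by the API group.
            return f"{plural.lower()}.{group}"
    return None


if __name__ == "__main__":
    sample = (
        "2018-12-04 11:22:39.673 [ERROR][10] startup.go 107: Unable to ensure datastore is "
        "migrated. error=Migration failed: error storing converted data: resource does not "
        "exist: BGPConfiguration(default) with error: the server could not find the requested "
        "resource (post BGPConfigurations.crd.projectcalico.org)"
    )
    print(missing_crd_name(sample))  # bgpconfigurations.crd.projectcalico.org
```

Feeding it the output of `kubectl -n kube-system logs calico-node-2hhfz calico-node` should yield `bgpconfigurations.crd.projectcalico.org`, the same name used as `metadata.name` in the CRD manifest in the answer.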

Any idea how to fix the cluster?

【Discussion】:

    Tags: kubernetes google-kubernetes-engine project-calico


    【Answer 1】:

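    Something went wrong in the GKE upgrade process: the custom resource definition (CRD) for BGPConfiguration was missing, so calico-node's startup migration could not store BGPConfiguration(default) and the calico pods failed to start.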

    Applying the corresponding CRD to the cluster solved the problem:

    apiVersion: apiextensions.k8s.io/v1beta1
    kind: CustomResourceDefinition
    metadata:
      name: bgpconfigurations.crd.projectcalico.org
    spec:
      scope: Cluster
      group: crd.projectcalico.org
      version: v1
      names:
        kind: BGPConfiguration
        plural: bgpconfigurations
        singular: bgpconfiguration
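One detail in this manifest is easy to get wrong when adapting it: the API server only accepts a CRD whose `metadata.name` is exactly `<spec.names.plural>.<spec.group>`, with lowercase plural and singular names. A minimal Python sketch, assuming you want to build or sanity-check the same manifest programmatically; `bgp_configuration_crd` and `crd_name_problems` are hypothetical helpers, not part of any Kubernetes or Calico client library.

```python
import json


def bgp_configuration_crd() -> dict:
    """Same content as the YAML manifest above, as a dict (apiextensions.k8s.io/v1beta1)."""
    group = "crd.projectcalico.org"
    plural = "bgpconfigurations"
    return {
        "apiVersion": "apiextensions.k8s.io/v1beta1",
        "kind": "CustomResourceDefinition",
        "metadata": {"name": f"{plural}.{group}"},
        "spec": {
            "scope": "Cluster",
            "group": group,
            "version": "v1",
            "names": {
                "kind": "BGPConfiguration",
                "plural": plural,
                "singular": "bgpconfiguration",
            },
        },
    }


def crd_name_problems(crd: dict) -> list:
    """Return naming problems found in a CRD dict; an empty list means the names look consistent."""
    problems = []
    names = crd["spec"]["names"]
    # metadata.name must be "<plural>.<group>".
    expected = f"{names['plural']}.{crd['spec']['group']}"
    if crd["metadata"]["name"] != expected:
        problems.append(f"metadata.name should be {expected!r}")
    # Plural and singular names must be lowercase.
    for key in ("plural", "singular"):
        if names[key] != names[key].lower():
            problems.append(f"spec.names.{key} must be lowercase")
    return problems


if __name__ == "__main__":
    crd = bgp_configuration_crd()
    assert crd_name_problems(crd) == []
    # kubectl apply -f also accepts JSON manifests, so this output can be saved and applied.
    print(json.dumps(crd, indent=2))
```

Writing the output to a file and running `kubectl apply -f` on it should be equivalent to applying the YAML version.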
    

    【Comments】:

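    • Google should fix this, since many people will be upgrading today because of the security vulnerability announced yesterday.
    • Some of these issues have been reported through the public issue tracker and are being worked on.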