【Title】: Nginx-ingress-controller fails to start after AKS upgrade to v1.22
【Posted】: 2022-02-07 20:45:01
【Question】:

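We upgraded our Kubernetes cluster from v1.21 to v1.22. After the upgrade, the pods of our nginx-ingress-controller deployment fail to start with the following error message:

pkg/mod/k8s.io/client-go@v0.18.5/tools/cache/reflector.go:125: Failed to list *v1beta1.Ingress: the server could not find the requested resource

We found that this issue is tracked here: https://github.com/bitnami/charts/issues/7264

Because Azure does not allow downgrading the cluster back to 1.21, could you please help us fix the nginx-ingress-controller deployment? Please be specific about what should be done and from where (local machine, Azure CLI, etc.), as we are not very familiar with Helm.

This is the current YAML of our deployment: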

kind: Deployment
apiVersion: apps/v1
metadata:
  name: nginx-ingress-controller
  namespace: ingress
  uid: 575c7699-1fd5-413e-a81d-b183f8822324
  resourceVersion: '166482672'
  generation: 16
  creationTimestamp: '2020-10-10T10:20:07Z'
  labels:
    app: nginx-ingress
    app.kubernetes.io/component: controller
    app.kubernetes.io/managed-by: Helm
    chart: nginx-ingress-1.41.1
    heritage: Helm
    release: nginx-ingress
  annotations:
    deployment.kubernetes.io/revision: '2'
    meta.helm.sh/release-name: nginx-ingress
    meta.helm.sh/release-namespace: ingress
  managedFields:
    - manager: kube-controller-manager
      operation: Update
      apiVersion: apps/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:spec:
          f:replicas: {}
      subresource: scale
    - manager: Go-http-client
      operation: Update
      apiVersion: apps/v1
      time: '2020-10-10T10:20:07Z'
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            .: {}
            f:meta.helm.sh/release-name: {}
            f:meta.helm.sh/release-namespace: {}
          f:labels:
            .: {}
            f:app: {}
            f:app.kubernetes.io/component: {}
            f:app.kubernetes.io/managed-by: {}
            f:chart: {}
            f:heritage: {}
            f:release: {}
        f:spec:
          f:progressDeadlineSeconds: {}
          f:revisionHistoryLimit: {}
          f:selector: {}
          f:strategy:
            f:rollingUpdate:
              .: {}
              f:maxSurge: {}
              f:maxUnavailable: {}
            f:type: {}
          f:template:
            f:metadata:
              f:labels:
                .: {}
                f:app: {}
                f:app.kubernetes.io/component: {}
                f:component: {}
                f:release: {}
            f:spec:
              f:containers:
                k:{"name":"nginx-ingress-controller"}:
                  .: {}
                  f:args: {}
                  f:env:
                    .: {}
                    k:{"name":"POD_NAME"}:
                      .: {}
                      f:name: {}
                      f:valueFrom:
                        .: {}
                        f:fieldRef: {}
                    k:{"name":"POD_NAMESPACE"}:
                      .: {}
                      f:name: {}
                      f:valueFrom:
                        .: {}
                        f:fieldRef: {}
                  f:image: {}
                  f:imagePullPolicy: {}
                  f:livenessProbe:
                    .: {}
                    f:failureThreshold: {}
                    f:httpGet:
                      .: {}
                      f:path: {}
                      f:port: {}
                      f:scheme: {}
                    f:initialDelaySeconds: {}
                    f:periodSeconds: {}
                    f:successThreshold: {}
                    f:timeoutSeconds: {}
                  f:name: {}
                  f:ports:
                    .: {}
                    k:{"containerPort":80,"protocol":"TCP"}:
                      .: {}
                      f:containerPort: {}
                      f:name: {}
                      f:protocol: {}
                    k:{"containerPort":443,"protocol":"TCP"}:
                      .: {}
                      f:containerPort: {}
                      f:name: {}
                      f:protocol: {}
                  f:readinessProbe:
                    .: {}
                    f:failureThreshold: {}
                    f:httpGet:
                      .: {}
                      f:path: {}
                      f:port: {}
                      f:scheme: {}
                    f:initialDelaySeconds: {}
                    f:periodSeconds: {}
                    f:successThreshold: {}
                    f:timeoutSeconds: {}
                  f:resources:
                    .: {}
                    f:limits: {}
                    f:requests: {}
                  f:securityContext:
                    .: {}
                    f:allowPrivilegeEscalation: {}
                    f:capabilities:
                      .: {}
                      f:add: {}
                      f:drop: {}
                    f:runAsUser: {}
                  f:terminationMessagePath: {}
                  f:terminationMessagePolicy: {}
              f:dnsPolicy: {}
              f:restartPolicy: {}
              f:schedulerName: {}
              f:securityContext: {}
              f:serviceAccount: {}
              f:serviceAccountName: {}
              f:terminationGracePeriodSeconds: {}
    - manager: kube-controller-manager
      operation: Update
      apiVersion: apps/v1
      time: '2022-01-24T01:23:22Z'
      fieldsType: FieldsV1
      fieldsV1:
        f:status:
          f:conditions:
            .: {}
            k:{"type":"Available"}:
              .: {}
              f:type: {}
            k:{"type":"Progressing"}:
              .: {}
              f:type: {}
    - manager: Mozilla
      operation: Update
      apiVersion: apps/v1
      time: '2022-01-28T23:18:41Z'
      fieldsType: FieldsV1
      fieldsV1:
        f:spec:
          f:template:
            f:spec:
              f:containers:
                k:{"name":"nginx-ingress-controller"}:
                  f:resources:
                    f:limits:
                      f:cpu: {}
                      f:memory: {}
                    f:requests:
                      f:cpu: {}
                      f:memory: {}
    - manager: kube-controller-manager
      operation: Update
      apiVersion: apps/v1
      time: '2022-01-28T23:29:49Z'
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            f:deployment.kubernetes.io/revision: {}
        f:status:
          f:conditions:
            k:{"type":"Available"}:
              f:lastTransitionTime: {}
              f:lastUpdateTime: {}
              f:message: {}
              f:reason: {}
              f:status: {}
            k:{"type":"Progressing"}:
              f:lastTransitionTime: {}
              f:lastUpdateTime: {}
              f:message: {}
              f:reason: {}
              f:status: {}
          f:observedGeneration: {}
          f:replicas: {}
          f:unavailableReplicas: {}
          f:updatedReplicas: {}
      subresource: status
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx-ingress
      app.kubernetes.io/component: controller
      release: nginx-ingress
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: nginx-ingress
        app.kubernetes.io/component: controller
        component: controller
        release: nginx-ingress
    spec:
      containers:
        - name: nginx-ingress-controller
          image: us.gcr.io/k8s-artifacts-prod/ingress-nginx/controller:v0.34.1
          args:
            - /nginx-ingress-controller
            - '--default-backend-service=ingress/nginx-ingress-default-backend'
            - '--election-id=ingress-controller-leader'
            - '--ingress-class=nginx'
            - '--configmap=ingress/nginx-ingress-controller'
          ports:
            - name: http
              containerPort: 80
              protocol: TCP
            - name: https
              containerPort: 443
              protocol: TCP
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.name
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.namespace
          resources:
            limits:
              cpu: 300m
              memory: 512Mi
            requests:
              cpu: 200m
              memory: 256Mi
          livenessProbe:
            httpGet:
              path: /healthz
              port: 10254
              scheme: HTTP
            initialDelaySeconds: 10
            timeoutSeconds: 1
            periodSeconds: 10
            successThreshold: 1
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /healthz
              port: 10254
              scheme: HTTP
            initialDelaySeconds: 10
            timeoutSeconds: 1
            periodSeconds: 10
            successThreshold: 1
            failureThreshold: 3
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          imagePullPolicy: IfNotPresent
          securityContext:
            capabilities:
              add:
                - NET_BIND_SERVICE
              drop:
                - ALL
            runAsUser: 101
            allowPrivilegeEscalation: true
      restartPolicy: Always
      terminationGracePeriodSeconds: 60
      dnsPolicy: ClusterFirst
      serviceAccountName: nginx-ingress
      serviceAccount: nginx-ingress
      securityContext: {}
      schedulerName: default-scheduler
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 25%
      maxSurge: 25%
  revisionHistoryLimit: 10
  progressDeadlineSeconds: 600
status:
  observedGeneration: 16
  replicas: 3
  updatedReplicas: 2
  unavailableReplicas: 3
  conditions:
    - type: Available
      status: 'False'
      lastUpdateTime: '2022-01-28T22:58:07Z'
      lastTransitionTime: '2022-01-28T22:58:07Z'
      reason: MinimumReplicasUnavailable
      message: Deployment does not have minimum availability.
    - type: Progressing
      status: 'False'
      lastUpdateTime: '2022-01-28T23:29:49Z'
      lastTransitionTime: '2022-01-28T23:29:49Z'
      reason: ProgressDeadlineExceeded
      message: >-
        ReplicaSet "nginx-ingress-controller-59d9f94677" has timed out
        progressing.

【Comments】:

  • Could you provide the output of "helm list" from the Azure CLI?
  • For more detail, please also include the output of "helm repo list" and "helm list --all-namespaces".

Tags: nginx kubernetes kubernetes-helm kubernetes-ingress azure-aks


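【Solution 1】:

@Philip Welz's answer is of course correct. The ingress controller has to be upgraded because the v1beta1 Ingress API version was removed in Kubernetes v1.22. But that was not the only problem we faced, so I decided to write a "very, very short" guide to how we finally ended up with a healthy, running cluster (5 days later), in the hope that it spares others the struggle.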
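
Before changing anything, it can help to confirm that the cluster really no longer serves the old Ingress API. The following is a minimal sketch and not part of the original answer: the helper name ingress_api_versions is made up for illustration, and it does nothing more than filter the output of kubectl api-versions.

```shell
#!/bin/sh
# Hypothetical helper (not from the original answer): reads the output of
# "kubectl api-versions" on stdin and keeps only the group/versions that
# could serve Ingress objects.
ingress_api_versions() {
  # "|| true" keeps the exit status at 0 when nothing matches
  grep -E '^(networking\.k8s\.io|extensions)/' || true
}

# Usage, from any machine whose kubectl points at the AKS cluster:
#   kubectl api-versions | ingress_api_versions
```

On v1.22 the filtered list is expected to contain networking.k8s.io/v1 and no v1beta1 entry, which matches the "Failed to list *v1beta1.Ingress" error from the old controller.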

1. Upgrade the nginx-ingress-controller version in the YAML file.

Here we simply changed the version in the YAML file from:

image: us.gcr.io/k8s-artifacts-prod/ingress-nginx/controller:v0.34.1

to:

image: us.gcr.io/k8s-artifacts-prod/ingress-nginx/controller:v1.1.1

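After this change, a new pod running v1.1.1 was created. It started fine and ran healthy. Unfortunately, this did not bring our microservices back online. I now know this was probably because the existing ingress YAML files needed some changes to be compatible with the new version of the ingress controller. So go straight to step 2 now (two headings below).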
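
The image change in step 1 can also be made from the command line rather than by editing the YAML by hand. This is only a sketch under assumptions: the namespace, deployment, and container names are taken from the manifest in the question, the function name set_controller_image is invented here, and because the release is managed by Helm, an in-place change like this will diverge from what Helm has recorded for the release.

```shell
#!/bin/sh
# Hypothetical wrapper: points the controller container at a new image tag
# and then waits for the rollout to finish.
# Namespace "ingress" and the deployment/container name
# "nginx-ingress-controller" come from the deployment shown in the question.
set_controller_image() {
  tag="$1"
  image="us.gcr.io/k8s-artifacts-prod/ingress-nginx/controller:${tag}"
  kubectl -n ingress set image deployment/nginx-ingress-controller \
    "nginx-ingress-controller=${image}" &&
  kubectl -n ingress rollout status deployment/nginx-ingress-controller
}

# Usage:
#   set_controller_image v1.1.1
```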

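Do not perform this step yet; do it only if step 2 fails: reinstall nginx-ingress-controller

We decided that in this case we would reinstall the controller from scratch, following Microsoft's official documentation: https://docs.microsoft.com/en-us/azure/aks/ingress-basic?tabs=azure-cli. Note that this may change the external IP address of the ingress controller. In our case, the easiest way was to delete the whole ingress namespace: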

kubectl delete namespace ingress

Unfortunately, this does not delete the ingress class, which is cluster-scoped rather than namespaced, so one more command is needed:

kubectl delete ingressclass nginx

Then install the new controller:

helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
helm install ingress-nginx ingress-nginx/ingress-nginx --create-namespace --namespace ingress 

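If you reinstalled nginx-ingress-controller, or the IP address changed after the upgrade in step 1: update your network security groups, load balancer, and domain DNS

In your AKS resource group there should be a resource of type Network security group. It contains inbound and outbound security rules (as I understand it, it acts as a firewall). There should be a default network security group that is managed automatically by Kubernetes, and the IP address there should be refreshed automatically.

Unfortunately, we also had an additional custom one, and we had to update its rules manually.

In the same resource group there should be a resource of type Load balancer. In the Frontend IP configuration tab, double-check that the IP address reflects your new IP address. As a bonus, you can check in the Backend pools tab that the addresses there match your internal node IPs.

Finally, do not forget to adjust your domain DNS records.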
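
To see which external IP address the controller ended up with, the LoadBalancer service can be queried directly. A minimal sketch with assumptions: the function name external_ip is invented, and the service name depends on how the controller was installed (ingress-nginx-controller is what the ingress-nginx chart creates by default for a release named ingress-nginx; list the services with kubectl get service -n ingress if yours differs).

```shell
#!/bin/sh
# Hypothetical helper: prints the external IP of a LoadBalancer service.
# $1 = namespace, $2 = service name.
external_ip() {
  kubectl -n "$1" get service "$2" \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
}

# Usage:
#   external_ip ingress ingress-nginx-controller
```

The printed address is the one to compare against the load balancer frontend, any custom network security group rules, and the DNS records.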

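2. Upgrade your ingress YAML configuration files to match the syntax changes

It took us a while to arrive at a working template, and installing the helloworld application from the Microsoft tutorial mentioned above helped us a lot. We started from this: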

kind: Ingress
apiVersion: networking.k8s.io/v1
metadata:
  name: hello-world-ingress
  namespace: services
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/rewrite-target: /$1
    nginx.ingress.kubernetes.io/ssl-redirect: 'false'
    nginx.ingress.kubernetes.io/use-regex: 'true'
spec:
  rules:
    - http:
        paths:
          - path: /hello-world-one(/|$)(.*)
            pathType: Prefix
            backend:
              service:
                name: aks-helloworld-one
                port:
                  number: 80

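After introducing changes step by step, we finally arrived at the following. I am fairly sure, though, that the actual problem was the missing nginx.ingress.kubernetes.io/use-regex: 'true' entry: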

kind: Ingress
apiVersion: networking.k8s.io/v1
metadata:
  name: example-api
  namespace: services
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/configuration-snippet: |
      more_set_headers "X-Forwarded-By: example-api";
    nginx.ingress.kubernetes.io/rewrite-target: /example-api
    nginx.ingress.kubernetes.io/ssl-redirect: 'true'
    nginx.ingress.kubernetes.io/use-regex: 'true'
spec:
  tls:
    - hosts:
        - services.example.com
      secretName: tls-secret
  rules:
    - host: services.example.com
      http:
        paths:
          - path: /example-api
            pathType: ImplementationSpecific
            backend:
              service:
                name: example-api
                port:
                  number: 80

In case anyone installs the helloworld application for testing purposes, its YAML files look like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: aks-helloworld-one  
spec:
  replicas: 1
  selector:
    matchLabels:
      app: aks-helloworld-one
  template:
    metadata:
      labels:
        app: aks-helloworld-one
    spec:
      containers:
      - name: aks-helloworld-one
        image: mcr.microsoft.com/azuredocs/aks-helloworld:v1
        ports:
        - containerPort: 80
        env:
        - name: TITLE
          value: "Welcome to Azure Kubernetes Service (AKS)"
---
apiVersion: v1
kind: Service
metadata:
  name: aks-helloworld-one  
spec:
  type: ClusterIP
  ports:
  - port: 80
  selector:
    app: aks-helloworld-one

3. Deal with other crashing applications...

Another application that crashed in our cluster was cert-manager. It was at version 1.0.1, so we first upgraded it to version 1.1.1:

helm repo add jetstack https://charts.jetstack.io
helm repo update
helm upgrade --namespace cert-manager --version 1.1 cert-manager jetstack/cert-manager

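This created a brand-new, healthy pod. We were happy and decided to stay on v1.1, because we were a bit afraid of the extra steps required when upgrading to later versions (see the bottom of this page: https://cert-manager.io/docs/installation/upgrading/).

The cluster is finally fixed now. Or is it?

4. ...but be sure to check the compatibility charts!

Well... now we know that cert-manager is compatible with Kubernetes v1.22 only from version 1.5 onward. As bad luck would have it, that very night our SSL certificate crossed the threshold of 30 days before its expiry date, so cert-manager decided to renew it! The operation failed and cert-manager crashed. Kubernetes fell back to the "Kubernetes Fake Certificate". Because the certificate was invalid, browsers blocked the traffic and the web pages went down again. The fix was to upgrade to 1.5 and upgrade the CRDs at the same time: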

kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.5.4/cert-manager.crds.yaml
helm upgrade --namespace cert-manager --version 1.5 cert-manager jetstack/cert-manager

After this, the new cert-manager instance renewed our certificates successfully. The cluster was saved once again.
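
The version rule from this step (cert-manager 1.5 or newer for Kubernetes v1.22) can be turned into a quick check. This is a sketch, not something from the original answer: the function name is invented, and the usage lines assume a deployment called cert-manager in the cert-manager namespace, which is an assumption about a release installed under that name.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when a cert-manager version is 1.5 or newer,
# the minimum this answer reports as compatible with Kubernetes v1.22.
# Accepts "1.5", "1.5.4" or "v1.5.4".
cert_manager_ok_for_k8s_122() {
  version="${1#v}"          # drop an optional leading "v"
  major="${version%%.*}"    # text before the first dot
  rest="${version#*.}"      # text after the first dot
  minor="${rest%%.*}"       # text up to the next dot
  [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 5 ]; }
}

# Usage: read the tag of the running controller image and check it.
#   image="$(kubectl -n cert-manager get deployment cert-manager \
#     -o jsonpath='{.spec.template.spec.containers[0].image}')"
#   cert_manager_ok_for_k8s_122 "${image##*:}" && echo "compatible"
```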

If you need to force a renewal, have a look at this issue: https://github.com/jetstack/cert-manager/issues/2641

@ajcann suggests adding a renewBefore attribute to the certificates:

kubectl get certs --no-headers=true | awk '{print $1}' | xargs -n 1 kubectl patch certificate --patch '
- op: replace
  path: /spec/renewBefore
  value: 1440h
' --type=json

Then wait for the certificates to be renewed, and afterwards remove the attribute:

kubectl get certs --no-headers=true | awk '{print $1}' | xargs -n 1 kubectl patch certificate --patch '
- op: remove
  path: /spec/renewBefore
' --type=json


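    【Solution 2】:

    Kubernetes 1.22 is supported only by NGINX Ingress Controller 1.0.0 and later: https://github.com/kubernetes/ingress-nginx#support-versions-table

    You need to upgrade your nginx-ingress-controller Bitnami Helm chart to version 9.0.0 in Chart.yaml, then run helm upgrade nginx-ingress-controller bitnami/nginx-ingress-controller

    You should also update your ingress controller regularly: version v0.34.1 is very old, and the ingress is usually the only entry point into the cluster from outside.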

