【Question title】: CrashLoopBackOff error when deploying a Django app on GKE (Kubernetes)
【Posted】: 2020-03-31 03:09:22
【Question】:

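Folks,

What is still a problem now: I have moved past the CrashLoopBackOff by fixing the Dockerfile run command as Emil Gi suggested, but the external IP is not forwarding to the library app server in my pod.

Status

  • Fixed the port to 8080 in the Dockerfile and made sure it is consistent throughout.
  • Made sure the Dockerfile has a proper command so that the container does not terminate right after starting, which is what caused the CrashLoopBackOff.
  • The remaining problem is that the load balancer's external IP returns this error when I open it: "This site can't be reached. 34.93.141.11 refused to connect."

Original question:

How do I resolve this CrashLoopBackOff? I have gone through a lot of documentation and tried debugging, but I am not sure what is causing it. The app runs perfectly in local mode and even deploys smoothly to App Engine standard, but not to GKE. Any pointers for debugging this further are most appreciated.

Problem: the cloudsql-proxy container is running, but the library-app container is in CrashLoopBackOff. The pod is assigned to a node, starts pulling the image, starts the image, and then goes into this BackOff state.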

$ kubectl get pods
NAME                       READY   STATUS             RESTARTS   AGE
library-7699b84747-9skst   1/2     CrashLoopBackOff   28         121m

$ kubectl logs library-7699b84747-9skst
Error from server (BadRequest): a container name must be specified for pod library-7699b84747-9skst, choose one of: [library-app cloudsql-proxy]

$ kubectl describe pods library-7699b84747-9skst
Name:               library-7699b84747-9skst
Namespace:          default
Priority:           0
PriorityClassName:  <none>
Node:               gke-library-default-pool-35b5943a-ps5v/10.160.0.13
Start Time:         Fri, 06 Dec 2019 09:34:11 +0530
Labels:             app=library
                    pod-template-hash=7699b84747
Annotations:        kubernetes.io/limit-ranger: LimitRanger plugin set: cpu request for container library-app; cpu request for container cloudsql-proxy
Status:             Running
IP:                 10.16.0.10
Controlled By:      ReplicaSet/library-7699b84747
Containers:
  library-app:
    Container ID:   docker://e7d8aac3dff318de34f750c3f1856cd754aa96a7203772de748b3e397441a609
    Image:          gcr.io/library-259506/library
    Image ID:       docker-pullable://gcr.io/library-259506/library@sha256:07f54e055621ab6ddcbb49666984501cf98c95133bcf7405ca076322fb0e4108
    Port:           8080/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Fri, 06 Dec 2019 09:35:07 +0530
      Finished:     Fri, 06 Dec 2019 09:35:07 +0530
    Ready:          False
    Restart Count:  2
    Requests:
      cpu:  100m
    Environment:
      DATABASE_USER:      <set to the key 'username' in secret 'cloudsql'>  Optional: false
      DATABASE_PASSWORD:  <set to the key 'password' in secret 'cloudsql'>  Optional: false
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-kj497 (ro)
  cloudsql-proxy:
    Container ID:  docker://352284231e7f02011dd1ab6999bf9a283b334590435278442e9a04d4d0684405
    Image:         gcr.io/cloudsql-docker/gce-proxy:1.16
    Image ID:      docker-pullable://gcr.io/cloudsql-docker/gce-proxy@sha256:7d302c849bebee8a3fc90a2705c02409c44c91c813991d6e8072f092769645cf
    Port:          <none>
    Host Port:     <none>
    Command:
      /cloud_sql_proxy
      --dir=/cloudsql
      -instances=library-259506:asia-south1:library=tcp:3306
      -credential_file=/secrets/cloudsql/credentials.json
    State:          Running
      Started:      Fri, 06 Dec 2019 09:34:51 +0530
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:        100m
    Environment:  <none>
    Mounts:
      /cloudsql from cloudsql (rw)
      /etc/ssl/certs from ssl-certs (rw)
      /secrets/cloudsql from cloudsql-oauth-credentials (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-kj497 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  cloudsql-oauth-credentials:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  cloudsql-oauth-credentials
    Optional:    false
  ssl-certs:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/ssl/certs
    HostPathType:  
  cloudsql:
    Type:    EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:  
  default-token-kj497:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-kj497
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age               From                                             Message
  ----     ------     ----              ----                                             -------
  Normal   Scheduled  86s               default-scheduler                                Successfully assigned default/library-7699b84747-9skst to gke-library-default-pool-35b5943a-ps5v
  Normal   Pulling    50s               kubelet, gke-library-default-pool-35b5943a-ps5v  pulling image "gcr.io/cloudsql-docker/gce-proxy:1.16"
  Normal   Pulled     47s               kubelet, gke-library-default-pool-35b5943a-ps5v  Successfully pulled image "gcr.io/cloudsql-docker/gce-proxy:1.16"
  Normal   Created    46s               kubelet, gke-library-default-pool-35b5943a-ps5v  Created container
  Normal   Started    46s               kubelet, gke-library-default-pool-35b5943a-ps5v  Started container
  Normal   Pulling    2s (x4 over 85s)  kubelet, gke-library-default-pool-35b5943a-ps5v  pulling image "gcr.io/library-259506/library"
  Normal   Created    1s (x4 over 50s)  kubelet, gke-library-default-pool-35b5943a-ps5v  Created container
  Normal   Started    1s (x4 over 50s)  kubelet, gke-library-default-pool-35b5943a-ps5v  Started container
  Normal   Pulled     1s (x4 over 52s)  kubelet, gke-library-default-pool-35b5943a-ps5v  Successfully pulled image "gcr.io/library-259506/library"
  Warning  BackOff    1s (x5 over 43s)  kubelet, gke-library-default-pool-35b5943a-ps5v  Back-off restarting failed container

This is the library.yaml file I have to use.

# [START kubernetes_deployment]
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: library
  labels:
    app: library
spec:
  replicas: 2
  template:
    metadata:
      labels:
        app: library
    spec:
      containers:
      - name: library-app
        # Replace  with your project ID or use `make template`
        image: gcr.io/library-259506/library
        # This setting makes nodes pull the docker image every time before
        # starting the pod. This is useful when debugging, but should be turned
        # off in production.
        imagePullPolicy: Always
        env:
            # [START cloudsql_secrets]
            - name: DATABASE_USER
              valueFrom:
                secretKeyRef:
                  name: cloudsql
                  key: username
            - name: DATABASE_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: cloudsql
                  key: password
            # [END cloudsql_secrets]
        ports:
        - containerPort: 8080

      # [START proxy_container]
      - image: gcr.io/cloudsql-docker/gce-proxy:1.16
        name: cloudsql-proxy
        command: ["/cloud_sql_proxy", "--dir=/cloudsql", 
                  "-instances=library-259506:asia-south1:library=tcp:3306",
                  "-credential_file=/secrets/cloudsql/credentials.json"]
        volumeMounts:
          - name: cloudsql-oauth-credentials
            mountPath: /secrets/cloudsql
            readOnly: true
          - name: ssl-certs
            mountPath: /etc/ssl/certs
          - name: cloudsql
            mountPath: /cloudsql
      # [END proxy_container] 
      # [START volumes]
      volumes:
        - name: cloudsql-oauth-credentials
          secret:
            secretName: cloudsql-oauth-credentials
        - name: ssl-certs
          hostPath:
            path: /etc/ssl/certs
        - name: cloudsql
          emptyDir:
      # [END volumes]        
# [END kubernetes_deployment]

---
    # [START service]
    # The library-svc service provides a load-balancing proxy over the polls app
    # pods. By specifying the type as a 'LoadBalancer', Container Engine will
    # create an external HTTP load balancer.
    # The service directs traffic to the deployment by matching the service's selector to the deployment's label
    #
    # For more information about external HTTP load balancing see:
    # https://cloud.google.com/container-engine/docs/load-balancer
    apiVersion: v1
    kind: Service
    metadata:
      name: library-svc
    spec:
      type: LoadBalancer
      ports:
      - port: 80
        targetPort: 8080
      selector:
        app: library

    # [END service]
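
The Deployment above injects `DATABASE_USER` and `DATABASE_PASSWORD` from the `cloudsql` secret and runs the proxy sidecar on `tcp:3306`. A minimal sketch of how a Django settings module might consume them is shown below. The question does not include `settings.py`, so the function name `build_databases`, the MySQL engine (guessed from port 3306) and the database name `library` are all assumptions.

```python
# Hypothetical settings.py fragment; not the asker's actual code.
from typing import Mapping


def build_databases(env: Mapping[str, str]) -> dict:
    """Build a Django-style DATABASES dict from environment variables."""
    return {
        "default": {
            # Assumption: MySQL, inferred from the proxy's tcp:3306 listener.
            "ENGINE": "django.db.backends.mysql",
            # Assumption: DATABASE_NAME and its default are hypothetical.
            "NAME": env.get("DATABASE_NAME", "library"),
            # Injected by the Deployment from the 'cloudsql' secret.
            "USER": env["DATABASE_USER"],
            "PASSWORD": env["DATABASE_PASSWORD"],
            # Containers in one pod share a network namespace, so the
            # proxy sidecar is reachable on localhost.
            "HOST": "127.0.0.1",
            "PORT": "3306",
        }
    }
```

In a real settings module this would be called as `DATABASES = build_databases(os.environ)`. Because the password is read as a plain string, special characters survive unchanged as long as the secret itself was created with the intended value.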

More error status

Container 'library-app' keeps crashing.
CrashLoopBackOff
Reason  
Container 'library-app' keeps crashing.
Check Pod's logs to see more details. Learn more
Source  
library-7699b84747-9skst

Conditions  
Initialized: True Ready: False ContainersReady: False PodScheduled: True

- lastProbeTime: null
  lastTransitionTime: "2019-12-06T06:03:43Z"
  message: 'containers with unready status: [library-app]'
  reason: ContainersNotReady
  status: "False"
  type: ContainersReady

Key events

Back-off restarting failed container  BackOff  Dec 6, 2019, 9:34:54 AM  Dec 6, 2019, 12:24:26 PM  779
pulling image "gcr.io/library-259506/library"  Pulling  Dec 6, 2019, 9:34:12 AM  Dec 6, 2019, 11:59:26 AM  34

The Dockerfile is as follows (by the way, this fixed the CrashLoop):

FROM python:3
ENV PYTHONUNBUFFERED 1
RUN mkdir /code
WORKDIR /code
COPY requirements.txt /code/
RUN pip install -r requirements.txt
COPY . /code/

# Server
EXPOSE 8080
STOPSIGNAL SIGINT
ENTRYPOINT ["python", "manage.py"]
CMD ["runserver", "0.0.0.0:8080"]
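
The `describe` output in the question shows `Reason: Completed` with `Exit Code: 0`, meaning the container's main process finished cleanly instead of staying up. A rough sketch of that difference, using local Python subprocesses as hypothetical stand-ins for a container's main process (nothing here talks to Kubernetes):

```python
import subprocess
import sys

# Stand-in for a main process that does its work and returns at once.
short_lived = subprocess.run(
    [sys.executable, "-c", "print('done')"],
    capture_output=True,
    text=True,
)

# The process ended cleanly, so the exit code is 0. Pods managed by a
# Deployment use restartPolicy: Always, so such a container is restarted
# again and again even though nothing "failed".
print("exit code:", short_lived.returncode)

# A server blocks until it is stopped. This stand-in sleeps for 30 s,
# and we check that it is still alive one second later.
long_running = subprocess.Popen(
    [sys.executable, "-c", "import time; time.sleep(30)"]
)
try:
    long_running.wait(timeout=1)
    still_running = False
except subprocess.TimeoutExpired:
    still_running = True
finally:
    # Clean up the stand-in process.
    long_running.kill()
    long_running.wait()

print("still running after 1 s:", still_running)
```

With `ENTRYPOINT ["python", "manage.py"]` and `CMD ["runserver", "0.0.0.0:8080"]`, the container's main process is the development server, which behaves like the second case.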

【Comments】:

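  • BTW: I did read this post, stackoverflow.com/questions/41604499/…, before posting this question, and even tried adding a command in case it was a pid issue, but that did not help.
  • Try pulling the logs from both containers; they may shed some light on this.
  • The exit code is 0. According to the GCP docs, cloud.google.com/kubernetes-engine/docs/…, if the exit code is 0 you should verify how long your app was running. Containers exit when your application's main process exits. If your app finishes execution very quickly, the container might keep restarting.
  • $ kubectl logs library-7699b84747-9skst Error from server (BadRequest): a container name must be specified for pod library-7699b84747-9skst, choose one of: [library-app cloudsql-proxy]. The cloudsql-proxy container is running.
  • Exit code 0 means your code ran to completion. Can you post the command you are using in the Docker image?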
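
The `BadRequest` error quoted in these comments appears because the pod has two containers, so `kubectl logs` needs `-c <container>`. For a container that keeps restarting, `--previous` prints the log of the last terminated instance. A small sketch that only builds and prints the commands (the helper name `logs_command` is made up for illustration, and nothing is executed against a cluster):

```python
import shlex
from typing import List


def logs_command(pod: str, container: str, previous: bool = False) -> List[str]:
    """Build the argv for `kubectl logs` against one container of a pod."""
    argv = ["kubectl", "logs", pod, "-c", container]
    if previous:
        # --previous shows the log of the last terminated instance, which
        # is the useful one for a container stuck in CrashLoopBackOff.
        argv.append("--previous")
    return argv


pod = "library-7699b84747-9skst"
for container in ("library-app", "cloudsql-proxy"):
    # Only the crashing app container needs the previous instance's log.
    argv = logs_command(pod, container, previous=(container == "library-app"))
    print(shlex.join(argv))
```

The printed lines can be pasted into a shell that has access to the cluster.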

Tags: django kubernetes google-kubernetes-engine


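【Solution 1】:

I think it was a combination of several things:

  • I found that the database password contained a special character and needed to be wrapped in quotes. I then made sure the port number was exactly the same in the Dockerfile and in library.yaml. With that, the secret actually worked; I had spotted a password mismatch in the logs.
  • Important: the command-line fix from Emil G makes sure the container built from my Dockerfile does not exit right away, so make sure CMD actually works and runs your server.
  • Important: finally, I found the fix for the external IP not connecting to my server, which I explain in a separate thread. Basically, I needed a security context and had to fix runAs so that the container does not run as root.
  • I also documented all of the deployment steps (1-15).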
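
The port check mentioned in this answer can be done by eye, but a quick script makes it harder to miss a mismatch. A minimal sketch, assuming the relevant lines are pasted in as strings (the fragments below are copied from the Dockerfile and library.yaml in the question; the helper `find_ports` is made up for illustration):

```python
import re

# Fragments copied from the Dockerfile and library.yaml in the question.
DOCKERFILE = """\
EXPOSE 8080
ENTRYPOINT ["python", "manage.py"]
CMD ["runserver", "0.0.0.0:8080"]
"""

MANIFEST = """\
        ports:
        - containerPort: 8080
      ports:
      - port: 80
        targetPort: 8080
"""


def find_ports(pattern: str, text: str) -> set:
    """Return every port number captured by `pattern` in `text`."""
    return {int(m) for m in re.findall(pattern, text, flags=re.MULTILINE)}


ports = {
    "EXPOSE": find_ports(r"^EXPOSE\s+(\d+)", DOCKERFILE),
    "runserver": find_ports(r"0\.0\.0\.0:(\d+)", DOCKERFILE),
    "containerPort": find_ports(r"containerPort:\s*(\d+)", MANIFEST),
    "targetPort": find_ports(r"targetPort:\s*(\d+)", MANIFEST),
}

# Every place that names the application port should agree.
distinct = set().union(*ports.values())
consistent = len(distinct) == 1
print(ports)
print("consistent:", consistent)
```

The Service's `port: 80` is deliberately left out of the comparison: it is the port the load balancer exposes, and only `targetPort` has to match the port the server listens on.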

【Discussion】:
