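【Posted】: 2021-11-12 13:45:45
【Question】:
I have a multi-container pod running on AWS EKS: a web app container listening on port 80 and a Redis container listening on port 6379.
Once the deployment is up, manual curl probes against the pod's IP address:port from inside the cluster always get a good response.
Ingress to the service works fine as well.
However, the kubelet's probes fail, which causes restarts, and I'm not sure how to reproduce that probe failure or how to fix it.
Thanks for reading!
Here are the events: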
0s Warning Unhealthy pod/app-7cddfb865b-gsvbg Readiness probe failed: Get http://10.10.14.199:80/: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
0s Warning Unhealthy pod/app-7cddfb865b-gsvbg Liveness probe failed: Get http://10.10.14.199:80/: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
0s Warning Unhealthy pod/app-7cddfb865b-gsvbg Readiness probe failed: Get http://10.10.14.199:80/: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
0s Warning Unhealthy pod/app-7cddfb865b-gsvbg Readiness probe failed: Get http://10.10.14.199:80/: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
0s Warning Unhealthy pod/app-7cddfb865b-gsvbg Readiness probe failed: Get http://10.10.14.199:80/: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
0s Warning Unhealthy pod/app-7cddfb865b-gsvbg Liveness probe failed: Get http://10.10.14.199:80/: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
0s Warning Unhealthy pod/app-7cddfb865b-gsvbg Readiness probe failed: Get http://10.10.14.199:80/: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
0s Warning Unhealthy pod/app-7cddfb865b-gsvbg Liveness probe failed: Get http://10.10.14.199:80/: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
0s Warning Unhealthy pod/app-7cddfb865b-gsvbg Liveness probe failed: Get http://10.10.14.199:80/: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
0s Normal Killing pod/app-7cddfb865b-gsvbg Container app failed liveness probe, will be restarted
0s Normal Pulling pod/app-7cddfb865b-gsvbg Pulling image "registry/app:latest"
0s Normal Pulled pod/app-7cddfb865b-gsvbg Successfully pulled image "registry/app:latest"
0s Normal Created pod/app-7cddfb865b-gsvbg Created container app
With the names made generic, here is my deployment YAML:
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "16"
  creationTimestamp: "2021-05-26T22:01:19Z"
  generation: 19
  labels:
    app: app
    chart: app-1.0.0
    environment: production
    heritage: Helm
    owner: acme
    release: app
  name: app
  namespace: default
  resourceVersion: "234691173"
  selfLink: /apis/apps/v1/namespaces/default/deployments/app
  uid: 3149acc2-031e-4719-89e6-abafb0bcdc3c
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: app
      release: app
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 100%
    type: RollingUpdate
  template:
    metadata:
      annotations:
        kubectl.kubernetes.io/restartedAt: "2021-09-17T09:04:49-07:00"
      creationTimestamp: null
      labels:
        app: app
        environment: production
        owner: acme
        release: app
    spec:
      containers:
      - image: redis:5.0.6-alpine
        imagePullPolicy: IfNotPresent
        name: redis
        ports:
        - containerPort: 6379
          hostPort: 6379
          name: redis
          protocol: TCP
        resources:
          limits:
            cpu: 500m
            memory: 500Mi
          requests:
            cpu: 500m
            memory: 500Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      - env:
        - name: SYSTEM_ENVIRONMENT
          value: production
        envFrom:
        - configMapRef:
            name: app-production
        - secretRef:
            name: app-production
        image: registry/app:latest
        imagePullPolicy: Always
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /
            port: 80
            scheme: HTTP
          initialDelaySeconds: 90
          periodSeconds: 20
          successThreshold: 1
          timeoutSeconds: 1
        name: app
        ports:
        - containerPort: 80
          hostPort: 80
          name: app
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /
            port: 80
            scheme: HTTP
          initialDelaySeconds: 90
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          limits:
            cpu: "1"
            memory: 500Mi
          requests:
            cpu: "1"
            memory: 500Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      priorityClassName: critical-app
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: "2021-08-10T17:34:18Z"
    lastUpdateTime: "2021-08-10T17:34:18Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: "2021-05-26T22:01:19Z"
    lastUpdateTime: "2021-09-17T16:48:54Z"
    message: ReplicaSet "app-7f7cb8fd4" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  observedGeneration: 19
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1
Here is my service YAML:
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: "2021-05-05T20:11:33Z"
  labels:
    app: app
    chart: app-1.0.0
    environment: production
    heritage: Helm
    owner: acme
    release: app
  name: app
  namespace: default
  resourceVersion: "163989104"
  selfLink: /api/v1/namespaces/default/services/app
  uid: 1f54cd2f-b978-485e-a1af-984ffeeb7db0
spec:
  clusterIP: 172.20.184.161
  externalTrafficPolicy: Cluster
  ports:
  - name: http
    nodePort: 32648
    port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: app
    release: app
  sessionAffinity: None
  type: NodePort
status:
  loadBalancer: {}
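A plain curl has no timeout unless one is passed, whereas both probes in the deployment above give up after timeoutSeconds: 1. One way to get closer to what the kubelet sees might be to send the same GET with the same short timeout and count any status from 200 to 399 as a success, which is how httpGet probes are judged. The sketch below is only an approximation: `probe_once` is a hypothetical helper name, and the `urllib` timeout applies to each socket operation rather than to the whole request, so it is not identical to the kubelet's client timeout.

```python
import time
import urllib.error
import urllib.request


def probe_once(url, timeout_seconds=1.0):
    """Send one GET and judge it roughly the way an httpGet probe would.

    Returns a tuple (ok, elapsed_seconds, detail).
    Note: urllib follows redirects on its own, and its timeout is a
    per-socket-operation limit, so this only approximates the kubelet.
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_seconds) as resp:
            status = resp.status
            ok = 200 <= status < 400
            detail = f"HTTP {status}"
    except urllib.error.HTTPError as exc:
        # urllib raises for 4xx/5xx responses; a probe would count these as failures.
        ok, detail = False, f"HTTP {exc.code}"
    except Exception as exc:
        # Timeouts, refused connections and similar transport errors.
        ok, detail = False, f"{type(exc).__name__}: {exc}"
    return ok, time.monotonic() - start, detail


# Example call, using the pod address from the events above:
# print(probe_once("http://10.10.14.199:80/", timeout_seconds=1.0))
```

Running this from another pod, or from the node that hosts the pod, and repeating it every 10 or 20 seconds to match periodSeconds, should show whether the root page sometimes needs more than one second to start answering.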
Update, October 20, 2021:
So I took the advice and patched the readiness probe with these generous settings:
readinessProbe:
  failureThreshold: 3
  httpGet:
    path: /
    port: 80
    scheme: HTTP
  initialDelaySeconds: 300
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 10
These are the events:
5m21s Normal Scheduled pod/app-686494b58b-6cjsq Successfully assigned default/app-686494b58b-6cjsq to ip-10-10-14-127.compute.internal
5m20s Normal Created pod/app-686494b58b-6cjsq Created container redis
5m20s Normal Started pod/app-686494b58b-6cjsq Started container redis
5m20s Normal Pulling pod/app-686494b58b-6cjsq Pulling image "registry/app:latest"
5m20s Normal Pulled pod/app-686494b58b-6cjsq Successfully pulled image "registry/app:latest"
5m20s Normal Created pod/app-686494b58b-6cjsq Created container app
5m20s Normal Pulled pod/app-686494b58b-6cjsq Container image "redis:5.0.6-alpine" already present on machine
5m19s Normal Started pod/app-686494b58b-6cjsq Started container app
0s Warning Unhealthy pod/app-686494b58b-6cjsq Readiness probe failed: Get http://10.10.14.117:80/: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
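Oddly, the readiness probe starts succeeding once I manually request the health check page (the root page) myself. Either way, the probes are not failing because the containers are unhealthy (they are running fine); the cause lies somewhere else.
【Comments】:
Tags: kubernetes amazon-eks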
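Because manual requests succeed while the probes time out, it may help to record how long the root page takes to answer over a series of requests and then check those timings against the probe settings. The sketch below assumes such timings have already been collected by hand; `summarize_probe_run` is a hypothetical helper, and the sample numbers in the comment are made up purely to illustrate the call.

```python
def summarize_probe_run(latencies, timeout_seconds, failure_threshold):
    """Classify sampled response times the way a probe with a timeout would.

    latencies: response times in seconds, in the order they were measured;
               use None for a request that never answered.
    Returns a dict with the number of failed samples, the longest run of
    consecutive failures, and whether that run reaches failure_threshold.
    """
    failed = 0
    longest_run = 0
    current_run = 0
    for latency in latencies:
        timed_out = latency is None or latency > timeout_seconds
        if timed_out:
            failed += 1
            current_run += 1
            longest_run = max(longest_run, current_run)
        else:
            # With successThreshold: 1, a single success resets the failure count.
            current_run = 0
    return {
        "failed": failed,
        "longest_run": longest_run,
        "reaches_threshold": longest_run >= failure_threshold,
    }


# Illustrative call with invented timings (seconds), not measured data:
# summarize_probe_run([0.2, 1.4, 1.8, 2.5, 0.3], timeout_seconds=1, failure_threshold=3)
```

Checking the same timings against both timeoutSeconds: 1 (still set on the liveness probe) and timeoutSeconds: 10 (the patched readiness probe) would show whether the restarts can be explained by slow responses alone or whether something else is cutting the requests off.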