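Configuring Health Checks for Kubernetes Containers

This article explains how to configure liveness, readiness, and startup probes for containers.

The kubelet uses liveness probes to know when to restart a container. For example, a liveness probe can catch a deadlock, where the application is running but unable to make progress. Restarting the container in that state helps keep the application available despite the bug.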


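The kubelet uses readiness probes to know when a container is ready to start accepting traffic. A Pod is considered ready only when all of its containers are ready. One use of this signal is to control which Pods serve as backends for a Service: a Pod that is not yet ready is removed from the Service's load balancers.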
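The "all containers ready" rule can be illustrated with a small Go sketch. The containerStatus type and podReady function are hypothetical names made up for this example; they are not part of any Kubernetes API and only model the rule described above.

```go
package main

import "fmt"

// containerStatus is a hypothetical, simplified stand-in for the
// per-container readiness state that the kubelet tracks.
type containerStatus struct {
	Name  string
	Ready bool
}

// podReady reports whether every container in the Pod is ready.
// Assumption of this sketch: a Pod with no containers is not ready.
func podReady(containers []containerStatus) bool {
	if len(containers) == 0 {
		return false
	}
	for _, c := range containers {
		if !c.Ready {
			return false
		}
	}
	return true
}

func main() {
	pod := []containerStatus{
		{Name: "app", Ready: true},
		{Name: "sidecar", Ready: false},
	}
	fmt.Println("pod ready:", podReady(pod))

	// Once the last container reports ready, the Pod becomes ready too.
	pod[1].Ready = true
	fmt.Println("pod ready:", podReady(pod))
}
```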

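The kubelet uses startup probes to know when a container application has started. If such a probe is configured, liveness and readiness checks do not begin until it succeeds, so those probes cannot interfere with application startup. This makes it possible to use liveness checks on slow-starting containers without them being killed before they are up and running.

Defining a liveness probe

Many long-running applications eventually transition to a broken state and cannot recover unless they are restarted. Kubernetes provides liveness probes to detect and remedy this situation.

Create a Pod that runs a container based on the k8s.gcr.io/busybox image. The configuration file is shown below. File name: exec-liveness.yaml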

    apiVersion: v1
    kind: Pod
    metadata:
      labels:
        test: liveness
      name: liveness-exec
    spec:
      containers:
      - name: liveness
        image: k8s.gcr.io/busybox
        args:
        - /bin/sh
        - -c
        - touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600
        livenessProbe:
          exec:
            command:
            - cat
            - /tmp/healthy
          initialDelaySeconds: 5
          periodSeconds: 5

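In the configuration file, the Pod has a single container. The periodSeconds field specifies that the kubelet should perform a liveness probe every 5 seconds. The initialDelaySeconds field tells the kubelet to wait 5 seconds before performing the first probe. To perform a probe, the kubelet executes the command cat /tmp/healthy in the container. If the command succeeds and returns 0, the kubelet considers the container alive and healthy. If the command returns a non-zero value, the kubelet kills the container and restarts it. The container itself runs the following command when it starts:

    /bin/sh -c "touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600"

For the first 30 seconds of the container's life, the file /tmp/healthy exists, so cat /tmp/healthy returns a success code. After 30 seconds, cat /tmp/healthy returns a failure code.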
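The timing described above can be sketched as a small simulation in Go. This is an idealized model, not kubelet code: the probeConfig type and restartTime function are hypothetical names, probes are assumed to fire at exact multiples of the period with no jitter, and failureThreshold is assumed to be 3 (the Kubernetes default, since the manifest does not set it).

```go
package main

import "fmt"

// probeConfig mirrors the probe fields discussed in the text.
// FailureThreshold is not set in the manifest; 3 is the Kubernetes default.
type probeConfig struct {
	InitialDelaySeconds int
	PeriodSeconds       int
	FailureThreshold    int
}

// restartTime returns the second at which the container would be restarted
// in this simplified model, or -1 if no restart occurs within horizon seconds.
// healthy(t) says whether a probe run at second t succeeds.
func restartTime(cfg probeConfig, healthy func(t int) bool, horizon int) int {
	failures := 0
	for t := cfg.InitialDelaySeconds; t <= horizon; t += cfg.PeriodSeconds {
		if healthy(t) {
			failures = 0 // a success resets the consecutive-failure count
			continue
		}
		failures++
		if failures >= cfg.FailureThreshold {
			return t
		}
	}
	return -1
}

func main() {
	cfg := probeConfig{InitialDelaySeconds: 5, PeriodSeconds: 5, FailureThreshold: 3}

	// /tmp/healthy exists only during the first 30 seconds of the container's life.
	fileExists := func(t int) bool { return t < 30 }

	fmt.Println("restart at second:", restartTime(cfg, fileExists, 600))
}
```

In this model the probes at seconds 30, 35, and 40 fail, so the restart lands at second 40. A real cluster will differ by a few seconds because of scheduling and probe jitter.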

Create the Pod:

    # kubectl apply -f /root/k8s-example/probe/exec-liveness.yaml

Within 30 seconds, view the Pod events:

    kubectl describe pod liveness-exec

The output shows that no liveness probe has failed yet:

    Events:
      Type    Reason     Age   From                 Message
      Normal  Scheduled        default-scheduler    Successfully assigned default/liveness-exec to k8s-node04
      Normal  Pulled     22s   kubelet, k8s-node04  Container image "k8s.gcr.io/busybox" already present on machine
      Normal  Created    22s   kubelet, k8s-node04  Created container liveness
      Normal  Started    22s   kubelet, k8s-node04  Started container liveness

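After 30 seconds, view the Pod events again:

    kubectl describe pod liveness-exec

At the bottom of the output, messages show that the liveness probe failed and that the container was killed and recreated.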

    Events:
      Type     Reason     Age                From                 Message
      Normal   Scheduled                     default-scheduler    Successfully assigned default/liveness-exec to k8s-node04
      Normal   Pulled     47s                kubelet, k8s-node04  Container image "k8s.gcr.io/busybox" already present on machine
      Normal   Created    47s                kubelet, k8s-node04  Created container liveness
      Normal   Started    47s                kubelet, k8s-node04  Started container liveness
      Warning  Unhealthy  5s (x3 over 15s)   kubelet, k8s-node04  Liveness probe failed: cat: can't open '/tmp/healthy': No such file or directory
      Normal   Killing    5s                 kubelet, k8s-node04  Container liveness failed liveness probe, will be restarted

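Wait another 30 seconds and verify that the container has been restarted:

    kubectl get pod liveness-exec
    NAME            READY   STATUS    RESTARTS   AGE
    liveness-exec   1/1     Running   2          3m10s

View the Pod details once more:

    kubectl describe pod liveness-exec

The output below shows that the container restarted successfully.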

    Events:
      Type     Reason     Age                  From                 Message
      Normal   Scheduled                       default-scheduler    Successfully assigned default/liveness-exec to k8s-node04
      Warning  Unhealthy  35s (x6 over 2m)     kubelet, k8s-node04  Liveness probe failed: cat: can't open '/tmp/healthy': No such file or directory
      Normal   Killing    35s (x2 over 110s)   kubelet, k8s-node04  Container liveness failed liveness probe, will be restarted
      Normal   Pulled     5s (x3 over 2m32s)   kubelet, k8s-node04  Container image "k8s.gcr.io/busybox" already present on machine
      Normal   Created    5s (x3 over 2m32s)   kubelet, k8s-node04  Created container liveness
      Normal   Started    5s (x3 over 2m32s)   kubelet, k8s-node04  Started container liveness

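Defining a liveness HTTP request

Another kind of liveness probe uses an HTTP GET request. Below is the configuration file for a Pod that runs a container based on the k8s.gcr.io/liveness image.

The Pod is defined as follows: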

    apiVersion: v1
    kind: Pod
    metadata:
      labels:
        test: liveness
      name: liveness-http
    spec:
      containers:
      - name: liveness
        image: k8s.gcr.io/liveness
        args:
        - /server
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
            httpHeaders:
            - name: X-Custom-Header
              value: Awesome
          initialDelaySeconds: 3
          periodSeconds: 3

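In the configuration file, the Pod has a single container. The periodSeconds field specifies that the kubelet should perform a probe every 3 seconds. The initialDelaySeconds field tells the kubelet to wait 3 seconds before performing the first probe. To perform a probe, the kubelet sends an HTTP GET request to the server running in the container, which listens on port 8080. If the handler for the server's /healthz path returns a success code, the kubelet considers the container alive and healthy. If the handler returns a failure code, the kubelet kills the container and restarts it.

Any code greater than or equal to 200 and less than 400 indicates success. Any other code indicates failure.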
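The status code rule is simple enough to express directly. The following Go sketch uses a hypothetical function name, probeSucceeded, and only restates the rule above; it is not taken from the kubelet source.

```go
package main

import "fmt"

// probeSucceeded applies the rule stated in the text:
// a status code in the range [200, 400) counts as success.
func probeSucceeded(statusCode int) bool {
	return statusCode >= 200 && statusCode < 400
}

func main() {
	for _, code := range []int{200, 302, 399, 400, 500} {
		fmt.Printf("%d -> success=%v\n", code, probeSucceeded(code))
	}
}
```

Note that redirects (3xx) fall inside the success range under this rule, while any 4xx or 5xx response is a failure.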

For the first 10 seconds that the container is alive, the /healthz handler returns a status of 200. After that, the handler returns a status of 500.

    http.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
        // started holds the time at which the server process began running.
        duration := time.Now().Sub(started)
        if duration.Seconds() > 10 {
            // More than 10 seconds after startup: report failure.
            w.WriteHeader(500)
            w.Write([]byte(fmt.Sprintf("error: %v", duration.Seconds())))
        } else {
            w.WriteHeader(200)
            w.Write([]byte("ok"))
        }
    })

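The kubelet starts performing health checks 3 seconds after the container starts, so the first few checks succeed. After 10 seconds the health checks fail, and the kubelet kills and restarts the container.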
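To try this behavior without a cluster, the handler can be exercised in memory. The sketch below is an assumption-laden illustration: healthzHandler and probeAfterSeconds are hypothetical helpers, and the start time is passed in explicitly so that "11 seconds later" can be simulated without waiting. It uses net/http/httptest rather than a real listener.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// healthzHandler returns a handler that reports 200 during the first
// 10 seconds after started, and 500 afterwards.
func healthzHandler(started time.Time) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		elapsed := time.Since(started)
		if elapsed.Seconds() > 10 {
			w.WriteHeader(http.StatusInternalServerError)
			fmt.Fprintf(w, "error: %v", elapsed.Seconds())
			return
		}
		w.WriteHeader(http.StatusOK)
		w.Write([]byte("ok"))
	}
}

// probeAfterSeconds simulates one GET /healthz against a server that
// started secs seconds ago and returns the HTTP status code.
func probeAfterSeconds(secs int) int {
	started := time.Now().Add(-time.Duration(secs) * time.Second)
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/healthz", nil)
	healthzHandler(started).ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println("status right after start:", probeAfterSeconds(0))
	fmt.Println("status 11s after start:", probeAfterSeconds(11))
}
```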

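    # kubectl apply -f /root/k8s-example/probe/http-liveness.yaml

After 10 seconds, view the Pod events (for example with kubectl describe pod liveness-http) to verify that the liveness probe has failed and the container has been restarted.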

    Events:
      Type     Reason     Age               From                 Message
      Normal   Scheduled                    default-scheduler    Successfully assigned default/liveness-http to k8s-node01
      Normal   Pulled     17s               kubelet, k8s-node01  Container image "k8s.gcr.io/liveness" already present on machine
      Normal   Created    17s               kubelet, k8s-node01  Created container liveness
      Normal   Started    16s               kubelet, k8s-node01  Started container liveness
      Warning  Unhealthy  1s (x2 over 4s)   kubelet, k8s-node01  Liveness probe failed: HTTP probe failed with statuscode: 500

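Defining a TCP liveness probe

A third type of liveness probe uses a TCP socket. With this configuration, the kubelet attempts to open a socket to the container on the specified port. If it can establish a connection, the container is considered healthy; if it cannot, the container is considered failed.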
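Conceptually, a TCP check is just a connection attempt with a timeout. The following Go sketch shows the idea using the standard net package. It is not the kubelet's implementation; tcpCheck and demo are hypothetical names, and the demo uses a throwaway listener on a free loopback port in place of a real container.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// tcpCheck reports whether a TCP connection to addr can be established
// within timeout. The connection is closed immediately on success.
func tcpCheck(addr string, timeout time.Duration) bool {
	conn, err := net.DialTimeout("tcp", addr, timeout)
	if err != nil {
		return false
	}
	conn.Close()
	return true
}

// demo opens a listener on a free loopback port, checks it, closes the
// listener, and checks the same address again.
func demo() (whileListening, afterClose bool, err error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return false, false, err
	}
	addr := ln.Addr().String()

	whileListening = tcpCheck(addr, time.Second)
	ln.Close()
	afterClose = tcpCheck(addr, time.Second)
	return whileListening, afterClose, nil
}

func main() {
	up, down, err := demo()
	if err != nil {
		fmt.Println("could not open listener:", err)
		return
	}
	fmt.Println("check while listening:", up)
	fmt.Println("check after close:", down)
}
```

A check like this only proves that something accepts connections on the port. It says nothing about whether the application behind it is answering requests correctly, which is why HTTP or exec probes are often preferred when available.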

Create a Pod. File name: tcp-liveness-readiness.yaml

    apiVersion: v1
    kind: Pod
    metadata:
      name: goproxy
      labels:
        app: goproxy
    spec:
      containers:
      - name: goproxy
        image: k8s.gcr.io/goproxy:0.1
        ports:
        - containerPort: 8080
        readinessProbe:
          tcpSocket:
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 10
        livenessProbe:
          tcpSocket:
            port: 8080
          initialDelaySeconds: 15
          periodSeconds: 20

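The configuration for a TCP check is very similar to that of an HTTP check. The example above uses both a readiness probe and a liveness probe. The kubelet sends the first readiness probe 5 seconds after the container starts by attempting to connect to port 8080 of the goproxy container. If the probe succeeds, the Pod is marked as ready, and the kubelet continues to run the check every 10 seconds.

In addition to the readiness probe, this configuration includes a liveness probe. The kubelet runs the first liveness probe 15 seconds after the container starts. Like the readiness probe, it attempts to connect to port 8080 of the goproxy container. If the liveness probe fails, the container is restarted.

    # kubectl apply -f /root/k8s-example/probe/tcp-liveness-readiness.yaml

After 15 seconds, view the Pod events to check the liveness probe:

    # kubectl describe pod goproxy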

Using a named port

You can use a named container port for HTTP or TCP liveness checks.

    ports:
    - name: liveness-port
      containerPort: 8080
      hostPort: 8080

    livenessProbe:
      httpGet:
        path: /healthz
        port: liveness-port

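Sometimes you have to deal with existing applications that need extra initialization time at startup. In such cases it can be tricky to set liveness probe parameters without compromising the fast response to deadlocks that motivated the probe in the first place. The trick is to set up a startup probe that uses the same command, HTTP check, or TCP check, with failureThreshold * periodSeconds long enough to cover the worst-case startup time.

So, the previous example becomes: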

    ports:
    - name: liveness-port
      containerPort: 8080
      hostPort: 8080

    livenessProbe:
      httpGet:
        path: /healthz
        port: liveness-port
      failureThreshold: 1
      periodSeconds: 10

    startupProbe:
      httpGet:
        path: /healthz
        port: liveness-port
      failureThreshold: 30
      periodSeconds: 10

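Thanks to the startup probe, the application has up to 5 minutes (30 * 10 = 300s) to finish starting. Once the startup probe has succeeded once, the liveness probe takes over and provides a fast response to container deadlocks. If the startup probe never succeeds, the container is killed after 300 seconds and is then handled according to the Pod's restartPolicy.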
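The startup budget is a single multiplication, which makes it easy to check candidate settings before applying them. A minimal Go sketch follows; startupBudgetSeconds is a hypothetical helper name, and the model ignores initialDelaySeconds and probe timeouts.

```go
package main

import (
	"fmt"
	"time"
)

// startupBudgetSeconds returns the maximum time, in seconds, that a container
// is given to start before a failing startup probe gets it killed:
// failureThreshold * periodSeconds.
func startupBudgetSeconds(failureThreshold, periodSeconds int) int {
	return failureThreshold * periodSeconds
}

func main() {
	secs := startupBudgetSeconds(30, 10)
	fmt.Println("startup budget:", time.Duration(secs)*time.Second) // 5m0s

	// With failureThreshold: 1 and periodSeconds: 10, the liveness probe
	// in the same example reacts to a single failed check.
	fmt.Println("liveness reaction window:", startupBudgetSeconds(1, 10), "seconds")
}
```

If the slowest observed startup of an application is, say, 4 minutes, a budget of 300 seconds leaves about a minute of headroom. Raising failureThreshold rather than periodSeconds extends the budget without making the probe less frequent.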

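Defining a readiness probe

Sometimes an application is temporarily unable to serve traffic. For example, it might need to load large data or configuration files during startup, or depend on external services after startup. In such cases, you neither want to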


Internet Technology / Internet News · January 29, 2024
