Demonstration of self-healing feature in Kubernetes

Kamlesh Prajapati
4 min readJan 6, 2022

--

Before we actually jump onto the self healing feature in Kubernetes lets understand little bit about Deployment object in Kubernetes.

  • Overview of Kubernetes deployment
  • Self-heal from a Pod failure

Overview of Kubernetes deployment

There are multiple Kubernetes objects Service object is one of them to provide network connectivity to apps running in Pods. It has another dedicated object called a Deployment to provide self-healing. In fact, Deployments also enable scaling and rolling updates.

As with Pods and Service objects, Deployments are defined in YAML manifest files.

Figure below shows a Deployment manifest. It’s marked up to show how a container is nested in a Pod, and a Pod is nested in a Deployment.

Figure : 1.0

This nesting, or wrapping, is important in understanding how everything works.

• The container provides the OS and other app dependencies.

• The Pod provides metadata and other constructs for the container to run on

Kubernetes.

  • The Deployment provides cloud-native features, including self-healing.

How Deployments work

There are two important elements to the working of a Deployment.

1. The Deployment object

2. The Deployment controller

The Deployment object is the YAML configuration that defines an application. It states things like, which container to run, what network port to listen on, and how many instances (Pods) to deploy.

The Deployment controller is a control plane process that is constantly monitoring the cluster making sure all Deployment objects are running as they are supposed to.

Consider a very quick example.

You define an application in a Kubernetes Deployment manifest. It defines 5 instances of a Pod called cloud-one. You use kubectl to send this to Kubernetes and Kubernetes schedules the 5 Pods on the cluster.

At this point, observed state matches desired state. That’s jargon for saying the cluster is running what you asked it to. But let’s say a Node fails and the number of cloud-one Pods drops to 4. Observed state no longer matches desired state and you have a situation.

But don’t stress. The Deployment controller is watching the cluster and will see the change. It knows you desire 5 Pods, but it can only observe 4. So it’ll start a 5th Pod to bring observed state back in line with desired state. This process is called reconciliation.

Self-heal from a Pod failure

In this section, we will deploy 5 replicas of a Pod via a Kubernetes Deployment object. After that, lets manually delete a Pod and see Kubernetes self-heal. We will create the deploy.yml manifest and store it somewhere in the local system or in repository like Github. As seen in the below snippet, it defines 5 Pod replicas running the app you are containerizing .

Figure: 1.1

Check for any Pods and Deployments already running on your cluster.

$ kubectl get pods

No resources found in default namespace.

$ kubectl get deployments

No resources found in default namespace.

Now use kubectl to deploy the Deployment to your cluster. The command must run from the folder containing the deploy.yml file.

$ kubectl apply -f deploy.yml

deployment.apps/qsk-deploy created

Check for status of the deployment and Pods it is managing

You can see above that 5 out of 5 replicas are running and in ready state. The Deployment controller is also running on the control plane watching the state of things.

Pod failure

It’s possible for Pods and the apps they are running to crash or fail. Kubernetes can attempt to self-heal a situation like this by starting a new Pod to replace the failed one.

Use kubectl delete pod to manually delete one of the Pods (see your previous kubectl get pods output for a list of Pod names).

$ kubectl delete pod qsk-deploy-69996c4549-r59nl

pod “qsk-deploy-69996c4549-r59nl” deleted

As soon as the Pod is deleted, the number of Pods on the cluster will drop to 4 and no longer match the desired state of 5. The Deployment controller will notice this and automatically start a new Pod to take the observed number of Pods back to 5.

List the pods again to verify if a new Pod has been started

$ kubectl get pods

NAME READY STATUS RESTARTS AGE

qsk-deploy-69996c4549-mwl7f 1/1 Running 0 24m

qsk-deploy-69996c4549–9xwv8 1/1 Running 0 24m

qsk-deploy-69996c4549-ksg8t 1/1 Running 0 24m

qsk-deploy-69996c4549-qmxp7 1/1 Running 0 24m

qsk-deploy-69996c4549-hd5pn 1/1 Running 0 10s

Congratulations. There are 5 Pods running and Kubernetes performed the self healing without needing help from you.

Also, notice how the last Pod in the list has only been running for 10seconds. This is the replacement Pod Kubernetes started to reconcile desired state.

Moral of the story here is that deployment controller job is to makesure that the desire number of pods will always up and running.

Thanks for reading….

--

--

Kamlesh Prajapati
Kamlesh Prajapati

Written by Kamlesh Prajapati

DevSecOps Practitioner (CKA certified , RHOCP Certified, Azure Certified on az-104,az-400,az-303.), AIOps , Machine Learning and Deep learning

No responses yet