Kamlesh Prajapati
Dec 27, 2023

Before we talk about HPA and VPA, it is important to understand what autoscaling in Kubernetes is.

Autoscaling is one of the key features of a Kubernetes (k8s) cluster. It lets the cluster add capacity, in the form of more pods, more resources per pod, or more nodes, as demand for a service increases, and release that capacity as demand goes down.

Two of the main autoscalers that Kubernetes offers for workloads are the Horizontal Pod Autoscaler (HPA) and the Vertical Pod Autoscaler (VPA). Both help ensure that applications have the resources they need to meet demand while avoiding over-provisioning and reducing costs.

There are three types of autoscaling available in Kubernetes:

  1. Cluster autoscaler
  2. Horizontal pod autoscaler
  3. Vertical pod autoscaler

Now let's talk about each autoscaler one by one.

Cluster autoscaler: Adjusts the number of nodes in a cluster. It automatically adds a node when pods cannot be scheduled because the existing nodes have insufficient resources, and removes a node when it remains underutilized and its pods can be moved to other nodes.

Horizontal pod autoscaler: HPA is a Kubernetes component that automatically updates workload resources such as Deployments and StatefulSets, scaling them to match demand for applications in the cluster. Horizontal scaling means deploying more pods in response to increased load.


When load decreases and the number of pods is above the configured minimum, HPA instructs the workload resource (for example, the Deployment object) to scale back down.

  • As an example, consider an HPA that targets the my-app Deployment object and scales it between 2 and 10 replicas, depending on load.
  • Load is measured by CPU utilization. HPA adds or removes pods so that the average CPU utilization across the Deployment's pods stays around 75% of the CPU each pod requests (not 75% of the node's CPU). If the average utilization is higher than 75%, it adds pods, and if it is lower, it removes pods.
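The example HPA described above could be expressed as a manifest along the following lines. This is only a sketch: it builds an `autoscaling/v2` HorizontalPodAutoscaler as a Python dict and prints it as JSON. The helper `build_hpa_manifest` and the object name `my-app-hpa` are hypothetical names chosen for illustration; only the target `my-app`, the 2 to 10 replica range, and the 75% CPU target come from the example.

```python
import json


def build_hpa_manifest(target_name="my-app", min_replicas=2,
                       max_replicas=10, cpu_utilization=75):
    """Return an autoscaling/v2 HorizontalPodAutoscaler manifest as a dict."""
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        # The HPA object's own name is an assumption made for this sketch.
        "metadata": {"name": f"{target_name}-hpa"},
        "spec": {
            # The workload that the HPA scales.
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": target_name,
            },
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [
                {
                    "type": "Resource",
                    "resource": {
                        "name": "cpu",
                        # Utilization is a percentage of the pods' CPU requests.
                        "target": {
                            "type": "Utilization",
                            "averageUtilization": cpu_utilization,
                        },
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_hpa_manifest(), indent=2))
```

Since kubectl accepts JSON as well as YAML, the printed output could be saved to a file and applied with `kubectl apply -f`. Utilization targets only work if the pods declare CPU requests.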

Note: You can also use other metrics, such as memory utilization, instead of CPU. You can also define several metrics to represent application load. In that case HPA computes a desired replica count for each metric and uses the largest one, so the most demanding metric is satisfied.
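To make the scaling decision concrete, here is a minimal sketch of the calculation described in the Kubernetes HPA documentation: desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), with a default tolerance of 10% around the target. The function names are hypothetical, and the sketch leaves out things the real controller handles, such as missing metrics, pods that are not ready, and stabilization windows.

```python
import math


def desired_replicas(current_replicas, current_value, target_value,
                     tolerance=0.1):
    """Desired replica count for one metric, per the documented HPA formula."""
    ratio = current_value / target_value
    # Inside the tolerance band the HPA leaves the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


def hpa_decision(current_replicas, metrics, min_replicas, max_replicas):
    """Combine several (current_value, target_value) pairs into one decision.

    The largest proposal wins, then it is clamped to [min_replicas, max_replicas].
    """
    proposals = [
        desired_replicas(current_replicas, current, target)
        for current, target in metrics
    ]
    return max(min_replicas, min(max_replicas, max(proposals)))


if __name__ == "__main__":
    # 4 replicas, CPU at 90% against a 75% target, memory at 50% against 80%.
    print(hpa_decision(4, [(90, 75), (50, 80)], min_replicas=2, max_replicas=10))
```

In this example CPU alone would ask for 5 replicas and memory alone for 3, so the combined decision is 5.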

HPA is a very powerful feature of Kubernetes, but it is not suitable for every use case and does not solve every cluster resource problem. A few of its limitations are listed below:

  • HPA should not be used together with VPA on the same metrics (CPU or memory), because the two would work against each other. However, you can combine them by using custom metrics for HPA.
  • HPA is only suitable for stateless applications that support parallel execution, or StatefulSets that provide persistence for stateful applications.
  • HPA works at the pod level and cannot detect under-utilization at the container level.
    Example: If containers running within a pod have unused requested resources, HPA cannot detect this, and you will need third-party tooling to identify this type of wasted resource.
  • The algorithm used in HPA does not take network bandwidth and storage into account when it comes to scaling.

Note: Horizontal pod autoscaling does not apply to objects that can't be scaled (for example, a DaemonSet).

Vertical Pod Autoscaler (VPA): The Kubernetes Vertical Pod Autoscaler (VPA) is a component that you install in your cluster. It increases and decreases the CPU and memory resource configuration of containers so that the resources allotted to them match their actual usage.

With VPA, there are two different types of resource configurations that we can manage on each container of a pod:

  1. Requests
  2. Limits

Requests: Requests define the minimum amount of resources that a container needs. For example, an application can use more than 256MB of memory, but Kubernetes will guarantee a minimum of 256MB to the container if its request is 256MB of memory.

Limits: Limits define the maximum amount of resources that a given container can consume. Your application might require at least 256MB of memory, but you might want to ensure that it doesn't consume more than 512MB, i.e., to limit its memory consumption to 512MB.

What are the main components of VPA?

VPA deployment has three main components:

  1. VPA Recommender
  2. VPA Updater
  3. VPA Admission Controller

VPA Recommender:
Monitors resource utilization and computes target values.
Looks at the metric history, OOM events, and the VPA deployment spec and suggests suitable requests. Limits are then raised or lowered so that the limit-to-request proportion defined for the container is preserved.
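The proportional handling of limits can be illustrated with a little arithmetic. The sketch below assumes that the limit is scaled by the same factor as the request, so that the original limit-to-request ratio is kept. The function name `scale_limit` is hypothetical, and the numbers reuse the 256MB and 512MB values from the requests and limits example, with a made-up recommendation of 384MB.

```python
def scale_limit(original_request, original_limit, recommended_request):
    """Return a new limit that keeps the original limit-to-request ratio.

    All values must use the same unit (for example MB of memory
    or millicores of CPU).
    """
    if original_request <= 0:
        raise ValueError("original_request must be positive")
    return round(recommended_request * original_limit / original_request)


if __name__ == "__main__":
    # Request 256MB with limit 512MB is a 1:2 ratio.
    # If the recommended request becomes 384MB, the limit becomes 768MB.
    print(scale_limit(256, 512, 384))
```

Real VPA deployments may also cap the result through the resource policy or LimitRange objects, which this sketch ignores.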

VPA Updater:

Evicts the pods that need the new resource values.

Implements whatever the Recommender recommends if “updateMode: Auto” is defined.

The VPA Admission Controller:

Changes the CPU and memory settings (using a webhook) before a new pod starts whenever the VPA Updater evicts and restarts a pod.

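Note that the eviction itself is done by the VPA Updater, not by the Admission Controller: when the Vertical Pod Autoscaler is set with an updateMode of “Auto”, the Updater evicts a pod whose resource requests need to change. Because Kubernetes has traditionally not allowed the resource requests of a running pod to be modified in place, the pod has to be recreated, and the Admission Controller applies the new values when the replacement pod is created.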

VPA Diagram

There are four modes in which VPA operates:

  • “Auto”: VPA assigns resource requests on pod creation as well as updates them on existing pods using the preferred update mechanism.
  • “Recreate”: VPA assigns resource requests on pod creation as well as updates them on existing pods by evicting them when the requested resources differ significantly from the new recommendation (respecting the Pod Disruption Budget, if defined). This mode should be used rarely, only if you need to ensure that the pods are restarted whenever the resource request changes. Otherwise, prefer the “Auto” mode which may take advantage of restart-free updates once they are available.
  • “Initial”: VPA only assigns resource requests on pod creation and never changes them later.
  • “Off”: VPA does not automatically change the resource requirements of the pods. The recommendations are calculated and can be inspected in the VPA object.

Note: The “Off” mode lets us use VPA in recommendation mode, where the VPA Recommender updates the status field of the workload's Vertical Pod Autoscaler resource with its suggested values but does not modify the pods' resource requests.
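As an illustration of where the update mode is set, the sketch below builds a VerticalPodAutoscaler manifest as a Python dict and prints it as JSON. It assumes the `autoscaling.k8s.io/v1` API that the VPA project installs through its CRDs. The helper `build_vpa_manifest`, the object name `my-app-vpa`, and the choice of a Deployment called `my-app` as the target are hypothetical.

```python
import json

# The four update modes described above.
VALID_UPDATE_MODES = ("Auto", "Recreate", "Initial", "Off")


def build_vpa_manifest(target_name, update_mode="Off"):
    """Return a VerticalPodAutoscaler manifest as a dict.

    Defaults to "Off", so VPA only publishes recommendations.
    """
    if update_mode not in VALID_UPDATE_MODES:
        raise ValueError(f"unknown updateMode: {update_mode!r}")
    return {
        "apiVersion": "autoscaling.k8s.io/v1",
        "kind": "VerticalPodAutoscaler",
        # The VPA object's own name is an assumption made for this sketch.
        "metadata": {"name": f"{target_name}-vpa"},
        "spec": {
            # The workload whose pods VPA manages.
            "targetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": target_name,
            },
            "updatePolicy": {"updateMode": update_mode},
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_vpa_manifest("my-app"), indent=2))
```

Starting with "Off" is a cautious choice: the recommendations can be inspected in the VPA object's status before switching to a mode that changes pods.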

Conclusion: Kubernetes's autoscaling mechanisms allow our clusters to respond automatically to changes in resource demand. HPA scales a workload horizontally by adjusting the number of pod replicas based on configured thresholds. In contrast, VPA scales pods vertically by adding or removing memory or CPU capacity.
HPA and VPA are essential techniques for efficiently managing resources in a Kubernetes cluster. HPA and VPA can help you achieve more efficient resource usage, reduce costs and improve the performance and scalability of your applications.

Reference: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/


Happy learning….



Kamlesh Prajapati

DevOps Practitioner (CKA certified, RHOCP certified, Azure certified: AZ-104, AZ-400, AZ-303)