
Kubernetes – Autoscaling

Last Updated : 17 Mar, 2023

Pre-requisite: Kubernetes 

Before Kubernetes, we deployed applications either directly onto physical servers in a data center, managing all the resources those servers needed to run our applications smoothly, or onto virtual machines (VMs). VMs come with their own problems: the hardware and software components they require are costly, and they carry security risks. This is where Kubernetes comes in. It is an open-source platform that lets users deploy, manage, and maintain groups of containers, acting as a tool that manages multiple Docker environments together. Kubernetes (K8s) overcomes the problems we faced with VMs.

Kubernetes Autoscaling

A central promise of the cloud and Kubernetes is elasticity: we should be able to add new nodes when the existing ones fill up, and delete those nodes again when demand drops. Kubernetes solves this with autoscalers, components that scale resources up and down according to actual usage; this is called Kubernetes autoscaling. There are three different methods of Kubernetes autoscaling:

  • Horizontal Pod Autoscaler (HPA)
  • Vertical Pod Autoscaler (VPA)
  • Cluster Autoscaler (CA)
     

Kubernetes Horizontal Pod Autoscaler

The Horizontal Pod Autoscaler (HPA) is a controller that scales most pod-based resources up and down based on your application's workload. It does this by changing the number of replicas of your pods once preconfigured thresholds are crossed; for many of the applications we deploy, scaling depends on a single metric, typically CPU usage. To use HPA, we define the minimum and maximum number of pods we want for a particular application, along with a target utilization percentage for CPU (and optionally memory). Once HPA is enabled for an application, Kubernetes automatically monitors it and scales the pods up and down within the minimum and maximum limits we have defined.
 

For example, consider an application like Airbnb running in Kubernetes: it experiences high user traffic whenever there is an offer on hotel and flight bookings, and if the application is not sized for that traffic, users may see slow response times or even downtime. With HPA, you specify a target CPU utilization percentage, a minimum and a maximum number of running pods, and other parameters, and Kubernetes automatically increases the number of pods to handle the increased traffic when CPU utilization crosses the specified level.

YAML code for HPA:

apiVersion: autoscaling/v2
# this specifies the Kubernetes API version
kind: HorizontalPodAutoscaler
# this specifies the kind of Kubernetes object (HPA, VPA, etc.)
metadata:
  name: example-app    # Kubernetes names must be DNS-compatible (hyphens, not underscores)
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-app
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 40
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 40

 

The averageUtilization fields specify the target utilization that the HPA aims for when scaling the deployment. Here both the CPU and memory targets are set to 40%, meaning the HPA will attempt to keep the deployment's average CPU and memory utilization at or below 40% of what its pods request. This YAML automatically scales the specified deployment between a minimum of 1 and a maximum of 10 replicas: if the average utilization of the containers exceeds 40%, the HPA scales the deployment up to maintain performance.
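Note that the HPA computes Utilization relative to the resource requests declared on the target pods, so the Deployment being scaled must set them; without requests, the HPA has no baseline to compute a percentage against. A minimal sketch of such a Deployment (the name example-app and the nginx image are illustrative):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
      - name: example-app
        image: nginx:1.25        # illustrative image
        resources:
          requests:
            cpu: 100m            # HPA utilization is computed against these requests
            memory: 128Mi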

Horizontal Pod Autoscaling

 

Kubernetes Vertical Pod Autoscaler

The Vertical Pod Autoscaler (VPA) for Kubernetes is a tool that automatically adjusts CPU and memory requests and limits based on past resource utilization metrics. Used appropriately, it helps you allocate resources within a Kubernetes cluster effectively and automatically, down to the level of individual containers. Besides enhancing a pod's performance and efficiency by managing its resource requests and limits, VPA can lower the cost of running the application by reducing wasted resources. Unlike HPA, VPA is not part of core Kubernetes; it is installed separately from the kubernetes/autoscaler project.

The VPA deployment has three components namely:

  • VPA Admission Controller
  • VPA Recommender
  • VPA Updater

VPA Admission Controller

This component makes sure that any new or updated Pod spec complies with the VPA criteria before it is created or changed in the cluster. The VPA Admission Controller intercepts all pod creation and update requests and applies a set of rules to the pod specifications; these rules are set up in accordance with the active VPA policy. It also checks that the Kubernetes resources being created or altered conform to that policy.

VPA Recommender

This component suggests resource requests and limits for the individual containers in a pod, based on those containers' resource utilization over time. The VPA Recommender gets its consumption data from the Kubernetes Metrics Server, which provides real-time resource usage figures for all containers running in the cluster. From this data it generates recommended requests and limits for each container in a pod, taking into account factors like past usage, existing limits, and the pod's requirements.
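The Recommender publishes its suggestions in the status section of the VerticalPodAutoscaler object itself. An illustrative excerpt of that status (the container name and all values are made up):

status:
  recommendation:
    containerRecommendations:
    - containerName: example-container
      lowerBound:          # below this, the container is likely under-provisioned
        cpu: 150m
        memory: 200Mi
      target:              # the value the Updater aims to apply
        cpu: 200m
        memory: 300Mi
      upperBound:          # above this, resources are likely being wasted
        cpu: 400m
        memory: 500Mi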

VPA Updater

This component applies the changes proposed by the VPA Recommender, modifying the resource requests and limits of each container in a pod. The VPA Updater continuously monitors the Recommender's suggestions and updates the Pod spec with the recommended requests and limits, applying the changes through the Kubernetes API server; in practice, because pod resources have historically been immutable, this means evicting pods so that they are recreated with the new values. The Updater also makes sure the updated requests and limits conform to the active VPA policy: if the new values do not satisfy the policy's requirements, it rejects the update and the pod is not changed.

YAML file for VPA:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: example-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: example-deployment
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: example-container
      minAllowed:
        cpu: 110m
        memory: 150Mi
      maxAllowed:
        cpu: 500m
        memory: 1Gi
      mode: "Auto"

Here resourcePolicy specifies the per-container policies the VPA should use. In this case there is a single container policy, for the container named "example-container". The minAllowed and maxAllowed fields specify the minimum and maximum allowed resource requests and limits, respectively, and mode is set to "Auto", which means the VPA automatically adjusts the container's resource requests and limits within that range.
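Besides "Auto", the updatePolicy.updateMode field accepts "Off" (compute recommendations but never change pods), "Initial" (apply recommendations only when pods are first created), and "Recreate" (evict and recreate pods whenever the recommendation changes). A recommendation-only variant of the policy above:

 updatePolicy:
   updateMode: "Off"    # the Recommender still publishes targets; the Updater never evicts pods

Running in "Off" mode first is a common way to review the recommendations before letting the VPA act on them.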
 

Vertical Pod Autoscaling

 

Kubernetes Cluster Autoscaler

The Cluster Autoscaler (CA) is a tool that dynamically changes the number of nodes in a node pool according to the requirements of your workloads, and scales back down to a minimum size that you choose when demand is low. This increases the availability of your workloads when you need it. We don't have to add or remove nodes manually; we simply set a minimum and maximum size for the node pool, and the cluster autoscaler takes care of the rest.

For example, if your workload comprises a controller with a single replica, that replica's Pod may be rescheduled onto a new node when its current node is removed. Before activating the cluster autoscaler, design your workloads to endure such unexpected interruptions, or make sure that crucial Pods are protected from disruption (see the PodDisruptionBudget sketch below). Note that CA does not make scaling choices based on actual CPU or memory use; it only looks at a pod's requested and allotted amounts of CPU and memory. Due to this limitation, CA cannot identify unused computing resources that users have requested, which can leave a cluster inefficient and wasteful. CA removes nodes, down to the node pool's minimal size, when nodes are underutilized and all Pods can still be scheduled even with fewer nodes in the pool; however, it will not scale down a node that hosts Pods that cannot relocate to other nodes in the cluster. CA also cannot address resource shortages caused by pods that have requested insufficient amounts of resources (or have left insufficient defaults in place). By explicitly requesting resources for every workload, you can ensure that the cluster autoscaler operates as correctly as possible.
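A standard way to protect crucial Pods during scale-down is a PodDisruptionBudget, which limits how many replicas may be evicted at once; the cluster autoscaler respects these budgets when deciding whether a node can be drained. A minimal sketch (the names are illustrative):

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: example-pdb
spec:
  minAvailable: 1          # keep at least one replica up during voluntary disruptions
  selector:
    matchLabels:
      app: example-app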

Unlike HPA and VPA, the cluster autoscaler is not configured through a dedicated Kubernetes API object; it typically runs as an ordinary Deployment (usually in the kube-system namespace) whose command-line flags control its behavior. A sketch of such a Deployment, assuming the AWS cloud provider and a node group named my-node-group (the image tag, names, and flag values are illustrative):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      serviceAccountName: cluster-autoscaler
      containers:
      - name: cluster-autoscaler
        image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.26.2
        command:
        - ./cluster-autoscaler
        - --cloud-provider=aws
        # scale the node group between 1 and 8 nodes
        - --nodes=1:8:my-node-group
        # discover node groups by cloud tags instead of listing them explicitly
        - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-kubernetes-cluster
        # try to keep similar node groups at similar sizes
        - --balance-similar-node-groups=true

The --nodes flag sets the minimum and maximum size of a node group (here, 1 to 8 nodes). The --node-group-auto-discovery flag tells the cluster autoscaler to manage any node group whose cloud tags include k8s.io/cluster-autoscaler/enabled along with the tag for the cluster itself (here my-kubernetes-cluster), rather than naming each group explicitly. Finally, --balance-similar-node-groups makes the autoscaler attempt to keep similar node groups balanced when scaling the cluster.


