TRAFEX Consultancy
The key components of Kubernetes autoscaling

September 5, 2021

Autoscaling is an important feature of Kubernetes. With this feature you always have enough resources for the workload, and when a node becomes unhealthy it gets replaced without affecting the workload. But you won’t get it automatically just by deploying your Pods on Kubernetes.

You need to provide the scheduler with information about your Pods, so it can make the right decisions when scheduling them.

A scheduler watches for newly created Pods that have no Node assigned. For every Pod that the scheduler discovers, the scheduler becomes responsible for finding the best Node for that Pod to run on.

The following components are needed to truly benefit from the autoscaling feature of Kubernetes:

  1. Resource Request
  2. Pod Disruption Budget
  3. Horizontal Pod Autoscaler
  4. Cluster Autoscaler

They work together as shown in the following diagram. Each component is explained in the sections below.

[Diagram: Kubernetes Autoscaling]

Resource Request

When you configure a Pod, you specify how much of each resource it needs. The most common resources to specify are CPU and memory, but there are others.

For each container in a Pod you can specify:

  1. The amount of CPU and memory you expect the container to need: the request
  2. The amount of CPU and memory you allow the container to use: the limit

The scheduler takes the resource request into account when determining which node has the resources available to run this Pod. When no node has enough free capacity to fit the Pod’s resource request, the Pod stays in the Pending state.

The Cluster Autoscaler notices that a Pod is pending because of a lack of resources and acts upon it by adding a new node.

Configuring the resource request

The resource request is configured per container in the Pod spec, like this:

resources:
  requests:
    cpu: "200m"
    memory: "128Mi"
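
To show where the fragment above sits, here is a minimal sketch of a complete Pod manifest. The name `myapp`, the `nginx:1.25` image and the limit values are assumptions made for illustration; only the request values come from the example above.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: myapp              # hypothetical name
  labels:
    app: myapp
spec:
  containers:
    - name: myapp
      image: nginx:1.25    # placeholder image, not from the original example
      resources:
        requests:          # what the scheduler reserves on a node for this container
          cpu: "200m"
          memory: "128Mi"
        limits:            # hypothetical ceiling for this container
          cpu: "500m"
          memory: "256Mi"
```

Only the requests influence scheduling. The limits are enforced at runtime: CPU usage above the limit is throttled, and a container that exceeds its memory limit can be killed.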

To come up with sane values for CPU & memory you can take the following into account;

Pod Disruption Budget

Pod disruption budgets let you configure the number of Pods that can be down simultaneously due to voluntary disruptions. Voluntary disruptions are mostly triggered by the application owner or cluster administrator, for example when a deployment is changed or a node is drained. When Pods are evicted, for instance during a node drain, Kubernetes keeps enough Pods from the same deployment, statefulset or other controller running so that the Pod disruption budget is not exceeded.

The Cluster Autoscaler performs cluster administrator actions, such as draining a node to scale the cluster down. That’s why it’s important to configure Pod disruption budgets correctly when you want the cluster to autoscale and autoheal.

apiVersion: policy/v1  # policy/v1beta1 is no longer served as of Kubernetes 1.25
kind: PodDisruptionBudget
metadata:
  name: myapp
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: myapp

Example of a Pod Disruption Budget that allows at most 1 Pod to be unavailable at any time.
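
A budget can also be expressed as a minimum number of Pods that must stay available. The sketch below reuses the `myapp` name and label from the example above; the value `2` is an assumption chosen for illustration.

```yaml
# Sketch: the same budget expressed as a minimum instead of a maximum.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: myapp
spec:
  minAvailable: 2          # an eviction is refused if it would leave fewer than 2 healthy Pods
  selector:
    matchLabels:
      app: myapp
```

`minAvailable` and `maxUnavailable` cannot be combined in one budget. Be careful with `minAvailable` when it equals the replica count: no Pod can then be evicted, so a node running one of these Pods can't be drained.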

Horizontal Pod Autoscaler

With a Horizontal Pod Autoscaler you specify which metrics decide whether the number of replicas should scale up or down. You can use per-Pod resource metrics like CPU and memory, or custom metrics like the number of requests per second the Pod is receiving.

Resource metrics can be defined as a utilization value, e.g.:

metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 90
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 90
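
The metrics list above is only one part of a HorizontalPodAutoscaler object. A minimal sketch of the surrounding manifest could look like the following. The target Deployment `myapp` and the replica bounds are assumptions, and the `autoscaling/v2` API version assumes a reasonably recent cluster (older clusters serve `autoscaling/v2beta2` instead).

```yaml
apiVersion: autoscaling/v2     # assumption: the cluster serves autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp                  # hypothetical name
spec:
  scaleTargetRef:              # the workload whose replica count is adjusted
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2               # hypothetical lower bound
  maxReplicas: 10              # hypothetical upper bound
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 90
```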

When you define the metric as a utilization value, it is calculated as a percentage of the configured resource request.

Let’s say you have a Pod with 1 CPU and 1 GiB memory configured as resource requests. With the Horizontal Pod Autoscaler configured as in the example, it will scale up when the average usage across the Pods rises above 900m CPU or roughly 922 MiB memory (90% of 1 GiB).
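
How many replicas the autoscaler aims for follows the formula given in the Kubernetes documentation. The worked numbers below are hypothetical, and the calculation ignores the small tolerance band the autoscaler applies before acting.

```latex
\[
\text{desiredReplicas} =
  \left\lceil \text{currentReplicas} \times
  \frac{\text{currentMetricValue}}{\text{desiredMetricValue}} \right\rceil
\]
% Hypothetical example: 3 replicas averaging 120% of their CPU request, target 90%
\[
\left\lceil 3 \times \frac{120}{90} \right\rceil = \lceil 4 \rceil = 4
\]
```

So in this example one extra replica is added, which brings the expected average utilization back to the 90% target.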

Cluster Autoscaler

The Cluster Autoscaler is the component that adjusts the size of the node pool so that all Pods have a place to run and there are no unneeded nodes.

On most public cloud providers it’s part of the control plane, which is managed by the provider. On AWS that’s not the case; you need to deploy it yourself.
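
When you deploy it yourself on AWS, the Cluster Autoscaler runs as a regular Deployment in the cluster. The fragment below is a sketch of the container section only. The image tag and the cluster name are placeholders you need to fill in, and the chosen flags are just one possible setup based on Auto Scaling group tag discovery.

```yaml
# Fragment of the Pod template of a Cluster Autoscaler Deployment (sketch, not a complete manifest)
containers:
  - name: cluster-autoscaler
    # Placeholder tag: pick the release that matches your Kubernetes minor version
    image: registry.k8s.io/autoscaling/cluster-autoscaler:<version>
    command:
      - ./cluster-autoscaler
      - --cloud-provider=aws
      # Discover node groups through Auto Scaling group tags; <cluster-name> is a placeholder
      - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/<cluster-name>
```

The Pod also needs AWS permissions to inspect and resize the Auto Scaling groups, and a service account with the required RBAC rules. Both are left out here.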

Adding a node

The Cluster Autoscaler will monitor the Pods and decide to add a node when a Pod needs to be scheduled and there aren’t sufficient resources for the resource request of that Pod.

This works as follows:

  1. A new Pod is created
  2. The scheduler reads the resource request of the Pod and decides if there are enough resources on one of the nodes.
  3. If there are, the Pod is assigned to the node.
  4. If there aren’t, the Pod stays in the Pending state and can’t start.
  5. The Cluster Autoscaler will detect that a Pod cannot be scheduled due to a lack of resources.
  6. The Cluster Autoscaler will determine whether the Pod could be scheduled if a new node were added (because of (anti-)affinity rules, the Pod might still not fit on a newly created node).
  7. If so, the Cluster Autoscaler will add a new node to the cluster.
  8. The scheduler will detect the new node and schedule the Pod on the new node.
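
To check whether a Pod is stuck at step 4, you can look at its status, for example with `kubectl get pod <name> -o yaml`. The excerpt below is an abbreviated, illustrative sketch of the relevant fields. A real status contains more fields, including a message that describes why no node was suitable.

```yaml
# Abbreviated excerpt of the status of a Pod that could not be scheduled
status:
  phase: Pending
  conditions:
    - type: PodScheduled
      status: "False"
      reason: Unschedulable    # the scheduler found no node that fits the Pod
```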

It’s important to know that the scheduler is not capable of moving Pods to different nodes to make room for the new Pod. This can sometimes lead to inefficient use of resources.

Removing a node

The Cluster Autoscaler will decide to remove a node when it has low utilization and all of its important Pods can be moved to other nodes. To move a Pod, it needs to be evicted and a new one needs to be started on a different node. There are a few reasons that prevent a Pod from being moved to a different node.

Reasons why a Pod can’t be moved;

The logs of the Cluster Autoscaler can tell you the actual reason, but when the Cluster Autoscaler is managed by the cloud provider you don’t always have access to that log.

If you think a node could be removed and the Cluster Autoscaler is not acting on it, you could try to drain the node and see what output that gives. In some cases this will show the reason why the Cluster Autoscaler can’t remove it, for example when the Pod Disruption Budget doesn’t allow it.
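
One setting worth checking in that situation is the `safe-to-evict` annotation, which the Cluster Autoscaler reads from the Pod. The fragment below is a sketch of how it appears in a Pod's metadata. Whether any of your Pods carry it is something to verify in your own cluster.

```yaml
# Pod metadata fragment (sketch): tells the Cluster Autoscaler not to evict this Pod,
# which keeps the node it runs on from being removed
metadata:
  annotations:
    cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
```

Setting the value to `"true"` does the opposite: it marks the Pod as safe to evict even when the Cluster Autoscaler would otherwise leave it alone.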
