Auto-Scale Pods Based on Resource Utilization

In this lesson, we will see auto-scaling of Pods based on resource utilization in action.

We'll cover the following

Auto-scale based on resource usage
- Create HPA
  - Resource utilization not getting shown
- Create HPA with new definition
  - Resource utilization getting shown
- Actual memory usage above the target value
  - HPA continue to scale up the Deployment
Auto descale based on resource usage

Auto-scale based on resource usage #

So far, the HPA has not performed any auto-scaling based on resource usage. Let’s do that now. First, we’ll try to create another HorizontalPodAutoscaler but, this time, we’ll target the StatefulSet that runs our MongoDB. So, let’s take a look at yet another YAML definition.

Create `HPA` #

cat scaling/go-demo-5-db-hpa.yml

The output is as follows.

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: db
  namespace: go-demo-5
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: db
  minReplicas: 3
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      targetAverageUtilization: 80
  - type: Resource
    resource:
      name: memory
      targetAverageUtilization: 80

That definition is almost the same as the one we used before. The only differences are that, this time, we’re targeting the StatefulSet called db and that the minimum number of replicas is 3.

Let’s apply it.

kubectl apply \
    -f scaling/go-demo-5-db-hpa.yml \
    --record
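A side note for newer clusters: the autoscaling/v2beta1 API used in the definition above was removed in Kubernetes 1.25, so the apply command may be rejected there. The sketch below writes what should be the equivalent definition for the autoscaling/v2 API, where each targetAverageUtilization entry becomes a target block. The file name go-demo-5-db-hpa-v2.yml is hypothetical and is not part of the course repository.

```shell
# Hypothetical file name; writes an autoscaling/v2 variant of the db HPA
# into the current directory.
cat > go-demo-5-db-hpa-v2.yml <<'EOF'
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: db
  namespace: go-demo-5
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: db
  minReplicas: 3
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
EOF
```

If your cluster needs it, the generated file can be applied with kubectl apply -f go-demo-5-db-hpa-v2.yml instead of the original one. The rest of the lesson behaves the same either way.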

Let’s take another look at the HorizontalPodAutoscaler resources.

kubectl -n go-demo-5 get hpa

The output is as follows.

NAME REFERENCE      TARGETS                      MINPODS MAXPODS REPLICAS AGE
api  Deployment/api 41%/80%, 0%/80%              2       5       2        5m
db   StatefulSet/db <unknown>/80%, <unknown>/80% 3       5       0        20s

We can see that the second HPA was created and that the current utilization is unknown. That must be a situation similar to the one we saw before. Should we give it some time for data to start flowing in? Wait for a few moments and retrieve the HPAs again. Are the targets still unknown?

Resource utilization not getting shown #

There might be something wrong since the resource utilization is still unknown. Let’s describe the newly created HPA and see whether we can find the cause of the issue.

kubectl -n go-demo-5 describe hpa db

The output, limited to the event messages, is as follows.

...
Events:
... Message
... -------
... New size: 3; reason: Current number of replicas below Spec.MinReplicas
... missing request for memory on container db-sidecar in pod go-demo-5/db-0
... failed to get memory utilization: missing request for memory on container db-sidecar in pod go-demo-5/db-0

Please note that your output could have only one event or even none of those. If that’s the case, please wait for a few minutes and repeat the previous command.

If we focus on the first message, we can see that it started well. HPA detected that the current number of replicas was below the minimum (spec.minReplicas) and increased it to three. That is the expected behavior, so let’s move to the other two messages.

HPA could not calculate the memory utilization because we did not specify how much memory the db-sidecar container requests. Utilization is expressed as a percentage of the requested amount, so without requests there is nothing to compare the actual usage against. In other words, we forgot to specify resources for the db-sidecar container, and HPA could not do its work. We’ll fix that by applying go-demo-5-no-hpa.yml.
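To make the failure more concrete, here is a minimal sketch of the idea behind the calculation. It is a simplification written for illustration, not the HPA's actual code. The function name pod_utilization_pct and the usage numbers are hypothetical. It adds up usage and requests across the containers of a Pod and gives up as soon as one container has no request, which is the situation the events above describe for db-sidecar.

```shell
# pod_utilization_pct takes one "usage:request" pair per container,
# both in the same unit (for example Mi), and prints the usage as a
# percentage of the total request, rounded down.
# It fails if any container has an empty request.
pod_utilization_pct() {
  local total_usage=0 total_request=0 pair usage request
  for pair in "$@"; do
    usage="${pair%%:*}"     # text before the colon
    request="${pair#*:}"    # text after the colon
    if [ -z "$request" ]; then
      echo "missing request" >&2
      return 1
    fi
    total_usage=$(( total_usage + usage ))
    total_request=$(( total_request + request ))
  done
  if [ "$total_request" -le 0 ]; then
    echo "no requests to compare against" >&2
    return 1
  fi
  echo $(( total_usage * 100 / total_request ))
}

# Both containers declare a request (hypothetical numbers).
pod_utilization_pct 120:200 30:50   # prints 60

# The second container has no request, so no percentage can be produced.
pod_utilization_pct 120:200 30: || echo "utilization unknown"
```

The second call is the equivalent of the unknown targets we saw in the HPA output: a single container without a request is enough to block the calculation for the whole Pod.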

Create `HPA` with new definition #

Let’s take a quick look at the new definition.

cat scaling/go-demo-5-no-hpa.yml

The output, limited to the relevant parts, is as follows.

...
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db
  namespace: go-demo-5
spec:
  ...
  template:
    ...
    spec:
      ...
      - name: db-sidecar
        ...
        resources:
          limits:
            memory: "100Mi"
            cpu: 0.2
          requests:
            memory: "50Mi"
            cpu: 0.1
...

The only noticeable difference, when compared with the initial definition, is that this time we defined the resources for the db-sidecar container. Let’s apply it.

kubectl apply \
    -f scaling/go-demo-5-no-hpa.yml \
    --record

Next, we’ll wait for a few moments for the changes to take effect, before we retrieve the HPAs again.

kubectl -n go-demo-5 get hpa

This time, the output is more promising.

NAME REFERENCE      TARGETS          MINPODS MAXPODS REPLICAS AGE
api  Deployment/api 66%/80%, 10%/80% 2       5       2        16m
db   StatefulSet/db 60%/80%, 4%/80%  3       5       3        10m

Resource utilization getting shown #

Both HPAs are showing the current and target resource usage. Neither reached the target values, so HPA is maintaining the minimum number of replicas. We can confirm that by listing all the Pods in the go-demo-5 Namespace.

kubectl -n go-demo-5 get pods

The output is as follows.

NAME    READY STATUS  RESTARTS AGE
api-... 1/1   Running 0        42m
api-... 1/1   Running 0        46m
db-0    2/2   Running 0        33m
db-1    2/2   Running 0        33m
db-2    2/2   Running 0        33m

We can see that there are two Pods for the api Deployment and three replicas of the db StatefulSet. Those numbers match the spec.minReplicas entries in the HPA definitions.
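The replica counts can be checked against the scaling rule described in the Kubernetes documentation: the desired number of replicas is the current number multiplied by the ratio of current to target utilization, rounded up, and then kept within the minReplicas and maxReplicas bounds. The sketch below is a simplified illustration under those assumptions. The function name desired_replicas is hypothetical, and the sketch ignores the HPA's tolerance setting and its rule of picking the largest result when several metrics are defined.

```shell
# desired_replicas CURRENT UTILIZATION TARGET MIN MAX
# Prints ceil(CURRENT * UTILIZATION / TARGET), clamped to [MIN, MAX].
desired_replicas() {
  local current="$1" util="$2" target="$3" min="$4" max="$5"
  # Integer ceiling division.
  local desired=$(( (current * util + target - 1) / target ))
  if [ "$desired" -lt "$min" ]; then
    desired="$min"
  fi
  if [ "$desired" -gt "$max" ]; then
    desired="$max"
  fi
  echo "$desired"
}

# db: 3 replicas at 60% of an 80% target, bounds 3..5
desired_replicas 3 60 80 3 5    # prints 3

# api: 2 replicas at 66% of an 80% target, bounds 2..5
desired_replicas 2 66 80 2 5    # prints 2

# Hypothetical case: 2 replicas at 100% of an 80% target
desired_replicas 2 100 80 2 5   # prints 3
```

With the utilization values from the output above, both calculations land on the minimum, which is why neither the Deployment nor the StatefulSet was scaled.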
