The default Horizontal Pod Autoscaler has several shortcomings:
- Scaling is unrestricted: there is no limit on how many pods can be started or stopped at once.
- Scaling decisions are based only on current usage, not on historical data.
- The same threshold is used for both scaling up and scaling down, as the sketch below illustrates.
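For comparison, a standard `autoscaling/v1` HorizontalPodAutoscaler exposes only a single `targetCPUUtilizationPercentage`, which serves as both the scale-up and the scale-down threshold. A minimal sketch (the target Deployment name is illustrative):

```yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 1
  maxReplicas: 10
  # A single threshold drives both directions: the controller scales
  # up and down toward this one target utilization.
  targetCPUUtilizationPercentage: 50
```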
Create a `Scaler` object in the following format:

```yaml
apiVersion: arjunnaik.in/v1alpha1
kind: Scaler
metadata:
  name: example-scaler
  namespace: default
spec:
  evaluations: 2     # Number of evaluations before scaling happens
  minReplicas: 1     # Minimum number of replicas
  maxReplicas: 10    # Maximum number of replicas
  scaleUp: 50        # Scale-up threshold, in utilization percentage
  scaleDown: 20      # Scale-down threshold, in utilization percentage
  scaleUpSize: 2     # Number of pods to add on scale up
  scaleDownSize: 1   # Number of pods to remove on scale down
  target:
    kind: Deployment
    name: nginx
    apiVersion: apps/v1
```
In the example above, the `target` field specifies the scaling target, in this case a Deployment named `nginx`. `evaluations` is the number of consecutive evaluation cycles (minutes) that must pass before a scaling action is taken: here, if the CPU utilization of a pod stays above 50% for more than 2 minutes, the deployment is scaled up. `scaleUpSize` and `scaleDownSize` specify how many pods are added or removed after a successful scale-up or scale-down evaluation.
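For completeness, a minimal sketch of a Deployment the Scaler above could target. The container image and CPU request are illustrative assumptions; utilization percentages are commonly computed against a pod's requested CPU, so setting a request is a reasonable precaution:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.25   # illustrative image tag
          resources:
            requests:
              cpu: 100m       # utilization % is typically measured against this request
```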
This setup expects Prometheus to be running in the cluster and configured to scrape pod resource metrics. The Prometheus address can be passed via the `-prometheus-url` flag.
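As an illustration of wiring this up, the flag could be set in the controller's own Deployment. The image name and the in-cluster Prometheus address below are assumptions; only the `-prometheus-url` flag itself comes from this document:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: scaler-controller
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: scaler-controller
  template:
    metadata:
      labels:
        app: scaler-controller
    spec:
      containers:
        - name: controller
          image: example/scaler-controller:latest  # hypothetical image
          args:
            # Point the controller at the in-cluster Prometheus service
            # (address is an assumption for a typical monitoring setup).
            - -prometheus-url=http://prometheus.monitoring.svc.cluster.local:9090
```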