Amazon EKS supports multiple autoscaling mechanisms that adjust pod count, pod resources, and cluster node capacity based on real-time metrics. The four main tools – Horizontal Pod Autoscaler (HPA), Vertical Pod Autoscaler (VPA), Cluster Autoscaler, and Karpenter – work at different layers to keep workloads right-sized and infrastructure costs under control.
This guide covers installing and configuring each autoscaling component on an EKS cluster. We walk through Metrics Server setup, HPA and VPA configuration, Cluster Autoscaler deployment, Karpenter as a modern node provisioner, and custom metrics with Prometheus Adapter. Every step includes working kubectl commands and YAML manifests tested against EKS.
Prerequisites
Before starting, confirm you have the following in place:
- A running Amazon EKS cluster (version 1.28 or later recommended)
- kubectl configured and authenticated against your EKS cluster
- AWS CLI v2 installed and configured with IAM permissions for EKS, EC2 Auto Scaling, and IAM role management
- Helm 3 installed for chart-based deployments
- An EKS node group with at least 2 nodes (t3.medium or larger) for testing autoscaling behavior
- IAM OIDC provider enabled on the cluster (required for IRSA-based service accounts)
Enable the OIDC provider if not already configured (this and several later commands use eksctl, so install it as well):
eksctl utils associate-iam-oidc-provider --cluster my-cluster --approve
Step 1: Install Metrics Server
The Metrics Server collects CPU and memory usage data from kubelets and exposes it through the Kubernetes metrics API. Both HPA and VPA depend on this data to make scaling decisions. Without Metrics Server, autoscaling based on resource utilization will not function.
Deploy Metrics Server using the official manifest:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
Wait for the deployment to become ready:
kubectl wait --for=condition=available deployment/metrics-server -n kube-system --timeout=120s
The command returns when the Metrics Server pod is running and ready to serve requests:
deployment.apps/metrics-server condition met
Verify that node metrics are being collected:
kubectl top nodes
You should see CPU and memory usage for each node in the cluster:
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
ip-10-0-1-45.eu-west-1.compute.internal 128m 6% 1024Mi 28%
ip-10-0-2-67.eu-west-1.compute.internal 96m 4% 890Mi 24%
If kubectl top nodes returns an error about metrics not being available, wait 60 seconds and try again – the Metrics Server needs time to collect the first round of data from kubelets.
Step 2: Configure Horizontal Pod Autoscaler (HPA)
HPA automatically adjusts the number of pod replicas in a Deployment, ReplicaSet, or StatefulSet based on observed CPU utilization, memory usage, or custom metrics. It checks metrics every 15 seconds by default and scales up or down to maintain target utilization.
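The scaling decision described above comes down to a ratio between the observed metric and its target. The following is a minimal Python sketch of that calculation as outlined in the Kubernetes documentation. The function name, the 10% tolerance default, and the sample numbers are illustrative assumptions, and the real controller additionally accounts for pod readiness and missing metrics.

```python
import math

def desired_replicas(current_replicas, current_value, target_value,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Simplified HPA rule: ceil(currentReplicas * currentMetric / targetMetric)."""
    ratio = current_value / target_value
    # Inside the tolerance band the replica count is left unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    wanted = math.ceil(current_replicas * ratio)
    # The proposal is clamped to the HPA's min/max bounds.
    return max(min_replicas, min(max_replicas, wanted))

# One replica at 148% CPU against a 50% target.
print(desired_replicas(1, 148, 50))
# Four replicas at 52% against 50%: within tolerance, so no change.
print(desired_replicas(4, 52, 50))
```

Because the result is rounded up, even a small overshoot beyond the tolerance band adds at least one replica.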
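First, create a sample deployment to test autoscaling. kubectl create deployment has no flags for resource requests or limits, so set them in a second command with kubectl set resources:
kubectl create deployment php-apache \
--image=registry.k8s.io/hpa-example \
--port=80
kubectl set resources deployment php-apache \
--requests=cpu=200m \
--limits=cpu=500m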
Expose the deployment as a service:
kubectl expose deployment php-apache --port=80 --type=ClusterIP
Create an HPA that targets 50% CPU utilization and scales between 1 and 10 replicas:
kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10
Check the HPA status to confirm it is active and reading metrics:
kubectl get hpa php-apache
The output shows the current CPU utilization, target, and replica count:
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
php-apache Deployment/php-apache 0%/50% 1 10 1 30s
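For more control, define the HPA with a YAML manifest that includes both CPU and memory targets. Utilization targets are computed against the pods' resource requests, so the memory target only works once the deployment also declares a memory request. Create the file:
vi hpa-advanced.yaml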
Add the following HPA specification with scaling behavior policies:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache-advanced
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 75
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 30
      policies:
      - type: Percent
        value: 100
        periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 10
        periodSeconds: 60
The behavior section controls scaling speed. Scale-up allows doubling pods every 60 seconds for fast response to traffic spikes. Scale-down is conservative – only 10% reduction per minute with a 5-minute stabilization window to prevent flapping.
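To get a feel for how asymmetric these two policies are, the sketch below counts how many 60-second policy periods each direction needs. It is a simplified model, not the controller's implementation. The helper names are hypothetical, the rounding (up when growing, down when shrinking) is an assumption, and the stabilization windows are ignored.

```python
def scale_up_limit(replicas, percent):
    """Most replicas allowed after one period under a Percent policy (rounded up)."""
    return -(-replicas * (100 + percent) // 100)

def scale_down_limit(replicas, percent):
    """Fewest replicas allowed after one period under a Percent policy (rounded down)."""
    return replicas * (100 - percent) // 100

def periods_to_reach(start, goal, percent, scaling_up):
    """Count policy periods needed to move from start to goal replicas."""
    if start < 1:
        raise ValueError("start must be at least 1 replica")
    replicas, periods = start, 0
    while replicas != goal:
        if scaling_up:
            replicas = min(goal, scale_up_limit(replicas, percent))
        else:
            replicas = max(goal, scale_down_limit(replicas, percent))
        periods += 1
    return periods

# 100% per period upward, 10% per period downward, as in the manifest above.
print(periods_to_reach(2, 20, 100, scaling_up=True))
print(periods_to_reach(20, 2, 10, scaling_up=False))
```

Under these assumptions, growing from 2 to 20 replicas takes 4 periods, while shrinking back to 2 takes 13.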
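Apply the manifest. Avoid leaving two HPAs pointed at the same Deployment, since they issue competing scaling decisions. If you keep this one, remove the simpler HPA with kubectl delete hpa php-apache and substitute php-apache-advanced in the commands that follow: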
kubectl apply -f hpa-advanced.yaml
Step 3: Test HPA with Load Generation
Generate artificial CPU load against the php-apache service to trigger HPA scaling. Run a load generator pod in a separate terminal:
kubectl run load-generator --image=busybox:1.36 --restart=Never -- \
/bin/sh -c "while true; do wget -q -O- http://php-apache; done"
Watch the HPA respond to the increased CPU utilization:
kubectl get hpa php-apache --watch
Within 1-2 minutes, the observed CPU utilization rises above the 50% target and HPA begins adding replicas:
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
php-apache Deployment/php-apache 0%/50% 1 10 1 2m
php-apache Deployment/php-apache 148%/50% 1 10 1 2m30s
php-apache Deployment/php-apache 148%/50% 1 10 4 3m
php-apache Deployment/php-apache 62%/50% 1 10 7 4m
Stop the load generator when done testing:
kubectl delete pod load-generator
After load stops, HPA scales pods back down to the minimum over the stabilization window period (default 5 minutes for scale-down).
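The stabilization window works by remembering recent recommendations and acting on the highest one still inside the window, which is why replicas linger after load disappears. A minimal sketch, assuming a hypothetical history of (timestamp, recommended replicas) pairs:

```python
def stabilized_recommendation(history, now, window_seconds=300):
    """Return the highest recommendation recorded inside the scale-down window.

    history is a list of (timestamp_seconds, recommended_replicas) tuples and
    is assumed to contain at least one entry within the window.
    """
    recent = [replicas for ts, replicas in history if now - ts <= window_seconds]
    return max(recent)

# Hypothetical recommendations recorded after the load generator stops.
history = [(0, 7), (120, 4), (240, 2), (360, 1)]
print(stabilized_recommendation(history[:3], now=240))  # the 7 is still in the window
print(stabilized_recommendation(history, now=360))      # the 7 has aged out
```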
Step 4: Configure Vertical Pod Autoscaler (VPA)
VPA adjusts CPU and memory requests for individual pods based on historical usage patterns. Instead of adding more replicas (HPA), VPA right-sizes each pod. This is useful for workloads with variable resource needs – databases, batch jobs, and applications that are hard to scale horizontally.
Install VPA using the official repository:
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh
Verify that all three VPA components are running:
kubectl get pods -n kube-system | grep vpa
You should see the admission controller, recommender, and updater pods all in Running state:
vpa-admission-controller-6b9d45c5f7-x2k4m 1/1 Running 0 45s
vpa-recommender-7c8b5d6f99-n8j2l 1/1 Running 0 45s
vpa-updater-5d4c8b7f68-p3m9r 1/1 Running 0 45s
Create a VPA policy for a deployment. Open the file:
vi vpa-policy.yaml
Add the following VPA configuration that sets resource boundaries and update mode:
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: php-apache-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: hpa-example
      minAllowed:
        cpu: 50m
        memory: 64Mi
      maxAllowed:
        cpu: 2000m
        memory: 2Gi
      controlledResources: ["cpu", "memory"]
The updateMode field accepts several values. The three most relevant here are:
- Off – VPA only provides recommendations, does not apply changes (good for initial observation)
- Auto – VPA evicts and recreates pods with updated resource requests
- Initial – VPA sets resources only at pod creation, never updates running pods
Apply the VPA policy:
kubectl apply -f vpa-policy.yaml
Check VPA recommendations after a few minutes of pod activity:
kubectl describe vpa php-apache-vpa
The recommendation section shows suggested CPU and memory values based on observed usage:
Recommendation:
  Container Recommendations:
    Container Name: hpa-example
    Lower Bound:
      Cpu: 25m
      Memory: 52428800
    Target:
      Cpu: 100m
      Memory: 104857600
    Upper Bound:
      Cpu: 400m
      Memory: 209715200
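The memory figures in this output are raw byte counts, which are hard to compare with the Mi values used in manifests. A small helper like the following (a sketch, with hypothetical names) converts them:

```python
MIB = 1024 * 1024  # bytes in one mebibyte

def bytes_to_mi(value):
    """Format a raw byte count as a Kubernetes-style Mi quantity (whole MiB)."""
    return f"{value // MIB}Mi"

# Values copied from the sample recommendation above.
recommendation = {"lower": 52428800, "target": 104857600, "upper": 209715200}
for bound, raw in recommendation.items():
    print(bound, bytes_to_mi(raw))
```

For the sample values this gives 50Mi, 100Mi, and 200Mi for the lower bound, target, and upper bound.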
Do not run HPA and VPA on the same metric (CPU or memory) for the same deployment. They will conflict. Use HPA for scaling replica count based on CPU, and VPA for right-sizing memory requests – or use VPA in “Off” mode alongside HPA to get recommendations without automated changes.
Step 5: Install Cluster Autoscaler
Cluster Autoscaler adds or removes EC2 nodes from your EKS node group when pods cannot be scheduled due to insufficient resources, or when nodes are underutilized. It works with EC2 Auto Scaling Groups that back your managed or self-managed node groups.
Create an IAM policy that grants Cluster Autoscaler the permissions it needs. Save the policy document:
vi cluster-autoscaler-policy.json
Add the following IAM policy that allows describing and modifying Auto Scaling Groups and EC2 instances:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"autoscaling:DescribeAutoScalingGroups",
"autoscaling:DescribeAutoScalingInstances",
"autoscaling:DescribeLaunchConfigurations",
"autoscaling:DescribeScalingActivities",
"autoscaling:DescribeTags",
"autoscaling:SetDesiredCapacity",
"autoscaling:TerminateInstanceInAutoScalingGroup",
"ec2:DescribeLaunchTemplateVersions",
"ec2:DescribeInstanceTypes",
"ec2:DescribeImages",
"ec2:GetInstanceTypesFromInstanceRequirements",
"eks:DescribeNodegroup"
],
"Resource": "*"
}
]
}
Create the IAM policy and note the ARN returned:
aws iam create-policy \
--policy-name AmazonEKSClusterAutoscalerPolicy \
--policy-document file://cluster-autoscaler-policy.json
Create a Kubernetes service account with the IAM role attached using IRSA (IAM Roles for Service Accounts). Replace the account ID and cluster name with your values:
eksctl create iamserviceaccount \
--cluster=my-cluster \
--namespace=kube-system \
--name=cluster-autoscaler \
--attach-policy-arn=arn:aws:iam::111122223333:policy/AmazonEKSClusterAutoscalerPolicy \
--override-existing-serviceaccounts \
--approve
Step 6: Configure and Deploy Cluster Autoscaler
Deploy Cluster Autoscaler using the official Helm chart. Add the autoscaler Helm repository:
helm repo add autoscaler https://kubernetes.github.io/autoscaler
helm repo update
Install Cluster Autoscaler with EKS-specific configuration. Replace my-cluster and eu-west-1 with your cluster name and region:
helm install cluster-autoscaler autoscaler/cluster-autoscaler \
--namespace kube-system \
--set autoDiscovery.clusterName=my-cluster \
--set awsRegion=eu-west-1 \
--set rbac.serviceAccount.create=false \
--set rbac.serviceAccount.name=cluster-autoscaler \
--set extraArgs.balance-similar-node-groups=true \
--set extraArgs.skip-nodes-with-system-pods=false \
--set extraArgs.expander=least-waste \
--set extraArgs.scale-down-delay-after-add=5m \
--set extraArgs.scale-down-unneeded-time=5m
Key configuration flags explained:
- balance-similar-node-groups – distributes nodes evenly across node groups with similar instance types for better AZ balance
- expander=least-waste – selects the node group that will have the least idle resources after scale-up
- scale-down-delay-after-add – waits 5 minutes after a scale-up before considering scale-down, preventing rapid oscillation
- scale-down-unneeded-time – a node must be underutilized for 5 continuous minutes before removal
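The idea behind the least-waste expander can be illustrated with a small scoring function. This is a simplified sketch rather than Cluster Autoscaler's actual code. The node group names and sizes are hypothetical, and the score simply adds the idle CPU and memory fractions of one new node.

```python
def wasted_fraction(group, pending_cpu_m, pending_mem_mi):
    """Sum of the CPU and memory shares left idle on a new node after placing the pods."""
    cpu_waste = (group["cpu_m"] - pending_cpu_m) / group["cpu_m"]
    mem_waste = (group["mem_mi"] - pending_mem_mi) / group["mem_mi"]
    return cpu_waste + mem_waste

def pick_least_waste(groups, pending_cpu_m, pending_mem_mi):
    """Choose the node group whose new node fits the pods with the least idle capacity."""
    fitting = [g for g in groups
               if g["cpu_m"] >= pending_cpu_m and g["mem_mi"] >= pending_mem_mi]
    best = min(fitting, key=lambda g: wasted_fraction(g, pending_cpu_m, pending_mem_mi))
    return best["name"]

# Hypothetical node groups: CPU in millicores, memory in MiB.
groups = [
    {"name": "m5-large", "cpu_m": 2000, "mem_mi": 8192},
    {"name": "m5-2xlarge", "cpu_m": 8000, "mem_mi": 32768},
]
print(pick_least_waste(groups, pending_cpu_m=1500, pending_mem_mi=4096))
print(pick_least_waste(groups, pending_cpu_m=3000, pending_mem_mi=4096))
```

The smaller group wins when the pending pods fit on it. The larger group is chosen only when nothing smaller can hold them.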
Verify the Cluster Autoscaler pod is running:
kubectl get pods -n kube-system -l app.kubernetes.io/name=aws-cluster-autoscaler
Check the autoscaler logs to confirm it discovered your node groups:
kubectl logs -n kube-system -l app.kubernetes.io/name=aws-cluster-autoscaler --tail=20
The logs should show the autoscaler discovering your Auto Scaling Groups and starting to monitor them for pending pods.
Tag your EKS node group Auto Scaling Groups so Cluster Autoscaler can discover them automatically. These tags are required:
aws autoscaling create-or-update-tags --tags \
ResourceId=eks-nodegroup-xxxxxxxx,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/my-cluster,Value=owned,PropagateAtLaunch=true \
ResourceId=eks-nodegroup-xxxxxxxx,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/enabled,Value=true,PropagateAtLaunch=true
If you created the node group with eksctl, these tags are already applied. You can verify with:
aws autoscaling describe-auto-scaling-groups \
--query "AutoScalingGroups[?Tags[?Key=='k8s.io/cluster-autoscaler/enabled']].AutoScalingGroupName" \
--output text
Step 7: Install Karpenter (Modern Alternative)
Karpenter is a newer node provisioning tool built by AWS that replaces Cluster Autoscaler with a faster, more flexible approach. Instead of relying on pre-defined Auto Scaling Groups, Karpenter directly provisions EC2 instances with the right instance type and size for pending pods. It responds to unschedulable pods in seconds rather than minutes.
Set up environment variables for the Karpenter installation. Replace these with your actual values:
export KARPENTER_NAMESPACE="kube-system"
export KARPENTER_VERSION="1.1.1"
export CLUSTER_NAME="my-cluster"
export AWS_DEFAULT_REGION="eu-west-1"
export AWS_ACCOUNT_ID="$(aws sts get-caller-identity --query Account --output text)"
export TEMPOUT="$(mktemp)"
Create the Karpenter IAM roles and instance profile using the CloudFormation template provided by the project:
curl -fsSL "https://raw.githubusercontent.com/aws/karpenter-provider-aws/v${KARPENTER_VERSION}/website/content/en/preview/getting-started/getting-started-with-karpenter/cloudformation.yaml" > "${TEMPOUT}" \
&& aws cloudformation deploy \
--stack-name "Karpenter-${CLUSTER_NAME}" \
--template-file "${TEMPOUT}" \
--capabilities CAPABILITY_NAMED_IAM \
--parameter-overrides "ClusterName=${CLUSTER_NAME}"
Create the IRSA service account for Karpenter:
eksctl create iamserviceaccount \
--cluster="${CLUSTER_NAME}" \
--name=karpenter \
--namespace="${KARPENTER_NAMESPACE}" \
--role-name="${CLUSTER_NAME}-karpenter" \
--attach-policy-arn="arn:aws:iam::${AWS_ACCOUNT_ID}:policy/KarpenterControllerPolicy-${CLUSTER_NAME}" \
--override-existing-serviceaccounts \
--approve
Install Karpenter via Helm. Because eksctl already created the karpenter service account in the previous command, tell the chart to reuse it instead of creating its own:
helm install karpenter oci://public.ecr.aws/karpenter/karpenter \
--version "${KARPENTER_VERSION}" \
--namespace "${KARPENTER_NAMESPACE}" \
--set serviceAccount.create=false \
--set serviceAccount.name=karpenter \
--set "settings.clusterName=${CLUSTER_NAME}" \
--set "settings.interruptionQueue=${CLUSTER_NAME}" \
--set controller.resources.requests.cpu=1 \
--set controller.resources.requests.memory=1Gi \
--set controller.resources.limits.cpu=1 \
--set controller.resources.limits.memory=1Gi \
--wait
Verify Karpenter is running:
kubectl get pods -n kube-system -l app.kubernetes.io/name=karpenter
Each Karpenter controller pod should show Running status:
NAME READY STATUS RESTARTS AGE
karpenter-6f4b8d9c7f-x8k2n 1/1 Running 0 60s
Create a NodePool that defines what instance types Karpenter can provision. Open the file:
vi karpenter-nodepool.yaml
Add the following NodePool configuration:
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
      - key: kubernetes.io/arch
        operator: In
        values: ["amd64"]
      - key: karpenter.sh/capacity-type
        operator: In
        values: ["on-demand", "spot"]
      - key: karpenter.k8s.aws/instance-category
        operator: In
        values: ["c", "m", "r"]
      - key: karpenter.k8s.aws/instance-generation
        operator: Gt
        values: ["5"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  limits:
    cpu: "100"
    memory: 400Gi
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m
---
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiSelectorTerms:
  - alias: al2023@latest
  subnetSelectorTerms:
  - tags:
      karpenter.sh/discovery: my-cluster
  securityGroupSelectorTerms:
  - tags:
      karpenter.sh/discovery: my-cluster
  role: "KarpenterNodeRole-my-cluster"
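This configuration allows Karpenter to select from compute-optimized (c), general-purpose (m), and memory-optimized (r) instance families, generation 6 and above (the Gt operator with value 5 means strictly greater than 5). It supports both On-Demand and Spot instances. The limits apply to this NodePool and cap the capacity it provisions at 100 vCPUs and 400 GiB of memory. The consolidation policy removes empty nodes and replaces underutilized ones with cheaper capacity once they have qualified for 1 minute. The EC2NodeClass finds subnets and security groups by the karpenter.sh/discovery tag, so those resources must carry that tag with your cluster name as its value.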
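The requirements in a NodePool act as a filter over instance-type labels. The sketch below mimics the In and Gt operators used in this manifest to show which instance types would pass. The label sets for the three sample instance types are hypothetical illustrations, and Karpenter's real matching supports more operators than shown here.

```python
def matches(requirement, labels):
    """Evaluate one NodePool-style requirement against an instance type's labels."""
    value = labels.get(requirement["key"])
    if value is None:
        return False
    if requirement["operator"] == "In":
        return value in requirement["values"]
    if requirement["operator"] == "Gt":
        # Gt compares numerically and is strictly greater-than.
        return int(value) > int(requirement["values"][0])
    raise ValueError(f"unsupported operator: {requirement['operator']}")

REQUIREMENTS = [
    {"key": "kubernetes.io/arch", "operator": "In", "values": ["amd64"]},
    {"key": "karpenter.k8s.aws/instance-category", "operator": "In", "values": ["c", "m", "r"]},
    {"key": "karpenter.k8s.aws/instance-generation", "operator": "Gt", "values": ["5"]},
]

def eligible(labels):
    """True when every requirement accepts the instance type."""
    return all(matches(req, labels) for req in REQUIREMENTS)

def labels_for(arch, category, generation):
    """Build a hypothetical label set for one instance type."""
    return {
        "kubernetes.io/arch": arch,
        "karpenter.k8s.aws/instance-category": category,
        "karpenter.k8s.aws/instance-generation": generation,
    }

print(eligible(labels_for("amd64", "m", "6")))  # generation 6, category m
print(eligible(labels_for("amd64", "m", "5")))  # generation 5 fails Gt 5
print(eligible(labels_for("amd64", "t", "3")))  # category t is not listed
```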
Apply the NodePool:
kubectl apply -f karpenter-nodepool.yaml
Step 8: Custom Metrics with Prometheus Adapter
HPA supports custom metrics beyond CPU and memory. Using Prometheus with the Prometheus Adapter, you can autoscale based on application-specific metrics like HTTP requests per second, queue depth, or active connections.
This step assumes you already have Prometheus running in your cluster. If not, deploy it first using the kube-prometheus-stack Helm chart.
Add the Helm repository that hosts the Prometheus Adapter chart:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
Create a values file for the adapter. Open it:
vi prometheus-adapter-values.yaml
Add the following configuration that maps Prometheus metrics to Kubernetes custom metrics API:
prometheus:
  url: http://kube-prometheus-stack-prometheus.monitoring.svc
  port: 9090
rules:
  default: false
  custom:
  - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
    resources:
      overrides:
        namespace: {resource: "namespace"}
        pod: {resource: "pod"}
    name:
      matches: "^(.*)_total$"
      as: "${1}_per_second"
    metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
  - seriesQuery: 'nginx_connections_active{namespace!="",pod!=""}'
    resources:
      overrides:
        namespace: {resource: "namespace"}
        pod: {resource: "pod"}
    name:
      as: "nginx_active_connections"
    metricsQuery: 'avg(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)'
The first rule converts the http_requests_total counter into a http_requests_per_second rate metric. The second rule exposes active Nginx connections as a custom metric. Both become available through the Kubernetes custom metrics API.
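The two transformations in the first rule, renaming the series and turning a counter into a rate, can be illustrated in a few lines. This is a sketch only. The sample counter values are made up, and Prometheus's rate() additionally handles counter resets and extrapolation, which this simplified version does not.

```python
import re

def rename(series_name, matches=r"^(.*)_total$", as_template=r"\1_per_second"):
    # Python writes the first capture group as \1 where the adapter config uses ${1}.
    return re.sub(matches, as_template, series_name)

def per_second_rate(samples):
    """Average per-second increase between the first and last (timestamp, value) samples."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)

print(rename("http_requests_total"))
# Hypothetical counter samples over a 2-minute window: (seconds, total requests).
print(per_second_rate([(0, 1000), (60, 4000), (120, 13000)]))
```

A series name that does not end in _total is left unchanged by the rename, because the pattern does not match it.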
Install the adapter with the custom values:
helm install prometheus-adapter prometheus-community/prometheus-adapter \
--namespace monitoring \
--values prometheus-adapter-values.yaml
Verify the custom metrics are registered:
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | python3 -m json.tool | head -30
Create an HPA that uses the custom requests-per-second metric. Open the file:
vi hpa-custom-metrics.yaml
Add the following HPA specification targeting 100 requests per second per pod:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: webapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: webapp
  minReplicas: 2
  maxReplicas: 30
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "100"
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
This HPA scales on both custom metrics (HTTP request rate) and built-in CPU metrics. The highest scaling recommendation from either metric wins – if request rate demands 8 replicas but CPU only needs 4, HPA scales to 8.
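The "highest recommendation wins" rule can be sketched as follows. The numbers are hypothetical and chosen to reproduce the 8-versus-4 case described above: four pods each serving 200 requests per second against a target of 100, with CPU sitting exactly at its 70% target.

```python
import math

def replicas_for(current_replicas, observed, target):
    """Per-metric proposal: ceil(currentReplicas * observed / target)."""
    return math.ceil(current_replicas * observed / target)

def combined(current_replicas, metrics):
    """metrics maps a metric name to (observed, target); the largest proposal wins."""
    proposals = {name: replicas_for(current_replicas, observed, target)
                 for name, (observed, target) in metrics.items()}
    return max(proposals.values()), proposals

decision, proposals = combined(4, {
    "http_requests_per_second": (200, 100),
    "cpu": (70, 70),
})
print(proposals)
print(decision)
```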
Apply the custom metrics HPA:
kubectl apply -f hpa-custom-metrics.yaml
Step 9: EKS Autoscaling Best Practices
After setting up the autoscaling components, follow these operational practices to keep scaling reliable and cost-effective.
Set resource requests on every pod
HPA, VPA, and node autoscalers all depend on resource requests to make decisions. Pods without CPU and memory requests are invisible to the scheduler’s resource accounting. Always define both requests and limits:
resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 512Mi
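Utilization targets are measured against the request, not the limit, which is why the request value matters so much. A minimal sketch of that relationship, with hypothetical helper names and handling only the plain and millicore CPU notations:

```python
def parse_cpu(quantity):
    """Convert a Kubernetes CPU quantity such as '100m' or '2' to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)

def cpu_utilization_percent(usage, request):
    """Usage as a whole-number percentage of the request."""
    return 100 * parse_cpu(usage) // parse_cpu(request)

# A container requesting 100m but using 150m reports 150% utilization.
print(cpu_utilization_percent("150m", "100m"))
# Running at the 500m limit from the snippet above would report 500%.
print(cpu_utilization_percent("500m", "100m"))
```

A request set far below real usage therefore makes HPA scale out aggressively, while an oversized request hides load from it.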
Use Pod Disruption Budgets
When Cluster Autoscaler or Karpenter removes nodes, Pod Disruption Budgets (PDBs) prevent too many pods from going down simultaneously. Create a PDB for every production deployment:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: webapp-pdb
spec:
  minAvailable: "50%"
  selector:
    matchLabels:
      app: webapp
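A percentage-based minAvailable is rounded up to a whole pod, which has consequences for small deployments. The sketch below works through the arithmetic. The function name is hypothetical, and it ignores details such as pods that are still starting.

```python
def allowed_disruptions(desired_pods, healthy_pods, min_available_percent):
    """Pods that may be evicted voluntarily while minAvailable stays satisfied."""
    # Percentages are rounded up to a whole pod (7 pods at 50% means 4 must stay).
    required = -(-desired_pods * min_available_percent // 100)
    return max(0, healthy_pods - required)

print(allowed_disruptions(7, 7, 50))
print(allowed_disruptions(4, 4, 50))
print(allowed_disruptions(1, 1, 50))
```

The last case returns 0: a single-replica deployment with a 50% budget cannot be evicted at all, which blocks node scale-down until the pod is removed some other way.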
Separate Karpenter and Cluster Autoscaler
Do not run Karpenter and Cluster Autoscaler on the same node groups. If you are migrating from Cluster Autoscaler to Karpenter, keep a small managed node group for system components (CoreDNS, kube-proxy, Karpenter itself) managed by Cluster Autoscaler, and let Karpenter handle application workload nodes.
Monitor autoscaling events
Set up CloudWatch logging for your EKS cluster and monitor autoscaling events. Watch for common issues like:
- FailedScaleUp – Cluster Autoscaler could not add nodes, usually due to EC2 capacity limits or IAM permissions
- ScaleDownDisabled – annotations or PDBs are blocking node removal
- FailedGetResourceMetric – HPA cannot read metrics from Metrics Server, check the metrics-server pods
Use this command to list recent Deployment replica changes across the cluster, which include those triggered by HPA:
kubectl get events --field-selector reason=ScalingReplicaSet --sort-by='.lastTimestamp' -A
Use Spot instances for non-critical workloads
Both Karpenter and Cluster Autoscaler support Spot instances. Use Spot for batch processing, development environments, and stateless web frontends. Keep databases, stateful workloads, and system components on On-Demand instances. With Karpenter, the NodePool configuration above already includes Spot in the capacity-type values.
Right-size before autoscaling
Run VPA in “Off” mode for a week on existing workloads before enabling HPA or auto-mode VPA. Review the recommendations to set accurate baseline resource requests. Autoscaling on top of poorly sized pods wastes resources and money.
Conclusion
We covered the main autoscaling components in EKS – Metrics Server for resource data collection, HPA for horizontal pod scaling, VPA for vertical pod right-sizing, Cluster Autoscaler for EC2 node group management, and Karpenter as a faster node provisioner. Custom metrics with Prometheus Adapter extend HPA beyond basic CPU and memory to application-specific metrics like request rates.
For production clusters, combine HPA with either Cluster Autoscaler or Karpenter (not both on the same node groups), set resource requests on every pod, and use Pod Disruption Budgets to maintain availability during scale-down events. Monitor autoscaling behavior through CloudWatch and Kubernetes events to catch misconfigurations early.