Amazon Elastic Kubernetes Service (EKS) is AWS’s managed Kubernetes platform that handles the control plane, etcd storage, and API server availability for you. Terraform (and its open-source fork OpenTofu) makes it possible to define your entire EKS infrastructure as code – VPC, subnets, IAM roles, node groups, and add-ons – in declarative configuration files that can be version-controlled and repeated across environments.

This guide walks through deploying a production-ready Amazon EKS cluster using Terraform on AWS. We will set up a custom VPC with public and private subnets, deploy EKS with managed node groups, configure IAM roles and OIDC, install essential add-ons (CoreDNS, kube-proxy, VPC CNI), set up the Cluster Autoscaler, deploy a test application, and finally tear everything down cleanly.

Prerequisites

Before starting, ensure you have the following in place:

  • An AWS account with billing enabled
  • An IAM user or role with permissions for EKS, EC2, VPC, IAM, CloudWatch Logs, and KMS (AdministratorAccess works for testing, but scope down for production)
  • AWS CLI v2 installed and configured with credentials
  • Terraform v1.6+ or OpenTofu v1.6+ installed on your workstation
  • kubectl v1.28+ installed
  • git for version control of your Terraform files
  • A Linux or macOS workstation (Windows with WSL2 works too)

Verify your AWS CLI is configured correctly by running:

$ aws sts get-caller-identity

You should see output showing your AWS account ID, user ARN, and user ID. If you get an error, run aws configure and provide your Access Key ID, Secret Access Key, and preferred region.
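With placeholder values, successful output looks like this (your account ID and ARN will differ):

{
    "UserId": "AIDAIOSFODNN7EXAMPLE",
    "Account": "123456789012",
    "Arn": "arn:aws:iam::123456789012:user/admin"
}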

Confirm Terraform is installed:

$ terraform version
Terraform v1.9.x
on linux_amd64

If you prefer OpenTofu, replace terraform with tofu in all commands throughout this guide. The configuration syntax is identical.

Step 1: Create the Project Structure

Start by creating a directory for your EKS Terraform project. We will organize the configuration into separate files for readability.

$ mkdir eks-terraform && cd eks-terraform
$ touch main.tf variables.tf outputs.tf providers.tf terraform.tfvars

This gives us a clean structure where providers.tf handles provider configuration, variables.tf defines input variables, main.tf holds the core resources, outputs.tf defines what gets printed after apply, and terraform.tfvars sets variable values.

Step 2: Configure the Terraform AWS Provider

Open the providers file and add the required provider configuration.

$ vim providers.tf

Add the following content:

terraform {
  required_version = ">= 1.6.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.80"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~> 2.35"
    }
    tls = {
      source  = "hashicorp/tls"
      version = "~> 4.0"
    }
  }
}

provider "aws" {
  region = var.aws_region

  default_tags {
    tags = {
      Environment = var.environment
      ManagedBy   = "terraform"
      Project     = "eks-cluster"
    }
  }
}

provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)

  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "aws"
    args        = ["eks", "get-token", "--cluster-name", module.eks.cluster_name]
  }
}

The ~> 5.80 constraint pins the AWS provider to the 5.x line, allowing any minor or patch release from 5.80 up to (but not including) 6.0. The Kubernetes provider authenticates through the AWS CLI via an exec plugin, which is the recommended approach for EKS because it fetches a fresh short-lived token on every run.
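If your credentials come from a named AWS CLI profile rather than the default, you can pass it through the exec plugin as well. The profile name below is a placeholder:

  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "aws"
    args        = ["eks", "get-token", "--cluster-name", module.eks.cluster_name, "--profile", "my-profile"]
  }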

Step 3: Define Input Variables

Open the variables file and define all configurable parameters for the deployment.

$ vim variables.tf

Add these variable definitions:

variable "aws_region" {
  description = "AWS region for the EKS cluster"
  type        = string
  default     = "us-east-1"
}

variable "environment" {
  description = "Environment name (dev, staging, production)"
  type        = string
  default     = "dev"
}

variable "cluster_name" {
  description = "Name of the EKS cluster"
  type        = string
  default     = "my-eks-cluster"
}

variable "cluster_version" {
  description = "Kubernetes version for the EKS cluster"
  type        = string
  default     = "1.31"
}

variable "vpc_cidr" {
  description = "CIDR block for the VPC"
  type        = string
  default     = "10.0.0.0/16"
}

variable "node_instance_types" {
  description = "EC2 instance types for the managed node group"
  type        = list(string)
  default     = ["t3.medium"]
}

variable "node_desired_size" {
  description = "Desired number of worker nodes"
  type        = number
  default     = 2
}

variable "node_min_size" {
  description = "Minimum number of worker nodes"
  type        = number
  default     = 1
}

variable "node_max_size" {
  description = "Maximum number of worker nodes"
  type        = number
  default     = 5
}

variable "node_disk_size" {
  description = "Disk size in GB for worker nodes"
  type        = number
  default     = 50
}
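To catch typos early, variables also support validation blocks. For example, the environment variable could be restricted to the three expected values:

variable "environment" {
  description = "Environment name (dev, staging, production)"
  type        = string
  default     = "dev"

  validation {
    condition     = contains(["dev", "staging", "production"], var.environment)
    error_message = "environment must be one of: dev, staging, production."
  }
}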

Now set the actual values in the tfvars file:

$ vim terraform.tfvars

Add your preferred settings:

aws_region          = "us-east-1"
environment         = "dev"
cluster_name        = "my-eks-cluster"
cluster_version     = "1.31"
vpc_cidr            = "10.0.0.0/16"
node_instance_types = ["t3.medium"]
node_desired_size   = 2
node_min_size       = 1
node_max_size       = 5
node_disk_size      = 50

Adjust the region, instance type, and node count to match your workload requirements. For production, consider using m5.large or larger instances with at least 3 nodes across multiple availability zones.
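As a rough illustration, a production-leaning terraform.tfvars might look like the following. The sizes are examples, not recommendations for every workload:

aws_region          = "us-east-1"
environment         = "production"
cluster_name        = "prod-eks-cluster"
cluster_version     = "1.31"
vpc_cidr            = "10.0.0.0/16"
node_instance_types = ["m5.large"]
node_desired_size   = 3
node_min_size       = 3
node_max_size       = 10
node_disk_size      = 100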

Step 4: Deploy the VPC with the Terraform AWS VPC Module

EKS requires a VPC with specific subnet tagging for load balancer integration and pod networking. The official terraform-aws-modules/vpc module handles this cleanly. Open the main configuration file.

$ vim main.tf

Start with the data source to get available AZs, then add the VPC module:

# Fetch availability zones in the selected region
data "aws_availability_zones" "available" {
  filter {
    name   = "opt-in-status"
    values = ["opt-in-not-required"]
  }
}

locals {
  azs = slice(data.aws_availability_zones.available.names, 0, 3)
}

# ------------------------------------------------------------------
# VPC - Public and Private subnets across 3 AZs
# ------------------------------------------------------------------
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.16"

  name = "${var.cluster_name}-vpc"
  cidr = var.vpc_cidr

  azs             = local.azs
  private_subnets = [for k, v in local.azs : cidrsubnet(var.vpc_cidr, 4, k)]
  public_subnets  = [for k, v in local.azs : cidrsubnet(var.vpc_cidr, 8, k + 48)]
  intra_subnets   = [for k, v in local.azs : cidrsubnet(var.vpc_cidr, 8, k + 52)]

  enable_nat_gateway   = true
  single_nat_gateway   = true  # Set to false for HA in production
  enable_dns_hostnames = true
  enable_dns_support   = true

  # Tags required for EKS subnet auto-discovery
  public_subnet_tags = {
    "kubernetes.io/role/elb"                    = 1
    "kubernetes.io/cluster/${var.cluster_name}" = "owned"
  }

  private_subnet_tags = {
    "kubernetes.io/role/internal-elb"           = 1
    "kubernetes.io/cluster/${var.cluster_name}" = "owned"
  }

  tags = {
    Environment = var.environment
  }
}

This creates a VPC with three private subnets (for worker nodes), three public subnets (for load balancers and NAT gateway), and three intra subnets (for the EKS control plane ENIs). The subnet tags are required so that the AWS Load Balancer Controller can automatically discover which subnets to place load balancers in. The single_nat_gateway = true setting saves cost in development – flip it to false for production to get one NAT gateway per AZ.
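You can sanity-check what the cidrsubnet expressions produce before applying by evaluating them in terraform console:

$ terraform console
> cidrsubnet("10.0.0.0/16", 4, 0)
"10.0.0.0/20"
> cidrsubnet("10.0.0.0/16", 8, 48)
"10.0.48.0/24"
> cidrsubnet("10.0.0.0/16", 8, 52)
"10.0.52.0/24"

So the private subnets are /20 blocks starting at 10.0.0.0 (ending at 10.0.47.255 for three AZs), while the public and intra subnets are /24 blocks offset high enough to avoid any overlap.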

Step 5: Deploy the EKS Cluster with Terraform EKS Module

Add the EKS module configuration below the VPC block in main.tf. This module handles the cluster creation, IAM roles, OIDC provider, managed node groups, and add-ons in one block.

# ------------------------------------------------------------------
# EKS Cluster
# ------------------------------------------------------------------
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.31"

  cluster_name    = var.cluster_name
  cluster_version = var.cluster_version

  # Networking
  vpc_id                         = module.vpc.vpc_id
  subnet_ids                     = module.vpc.private_subnets
  control_plane_subnet_ids       = module.vpc.intra_subnets
  cluster_endpoint_public_access = true

  # Cluster access configuration
  enable_cluster_creator_admin_permissions = true

  # EKS Add-ons
  cluster_addons = {
    coredns = {
      most_recent = true
      configuration_values = jsonencode({
        computeType = "ec2"
      })
    }
    kube-proxy = {
      most_recent = true
    }
    vpc-cni = {
      most_recent    = true
      before_compute = true
      configuration_values = jsonencode({
        env = {
          ENABLE_PREFIX_DELEGATION = "true"
          WARM_PREFIX_TARGET       = "1"
        }
      })
    }
    eks-pod-identity-agent = {
      most_recent = true
    }
  }

  # Managed Node Group
  eks_managed_node_groups = {
    default = {
      name            = "${var.cluster_name}-ng"
      instance_types  = var.node_instance_types
      capacity_type   = "ON_DEMAND"

      min_size     = var.node_min_size
      max_size     = var.node_max_size
      desired_size = var.node_desired_size

      # The module builds a custom launch template by default; disk_size
      # is only honored when that template is disabled
      disk_size                  = var.node_disk_size
      use_custom_launch_template = false

      # Use Amazon Linux 2023 AMI
      ami_type = "AL2023_x86_64_STANDARD"

      labels = {
        Environment = var.environment
        NodeGroup   = "default"
      }

      tags = {
        "k8s.io/cluster-autoscaler/enabled"             = "true"
        "k8s.io/cluster-autoscaler/${var.cluster_name}" = "owned"
      }
    }
  }

  tags = {
    Environment = var.environment
  }
}

Key decisions in this configuration:

  • EKS Add-ons managed by Terraform – CoreDNS, kube-proxy, and VPC CNI are installed as EKS managed add-ons rather than self-managed. This means AWS handles version compatibility and updates.
  • VPC CNI prefix delegation – The ENABLE_PREFIX_DELEGATION setting assigns /28 prefixes instead of individual IPs to ENIs, significantly increasing the number of pods each node can run.
  • EKS Pod Identity Agent – A newer alternative to IRSA (IAM Roles for Service Accounts) that simplifies how pods get AWS IAM permissions.
  • Amazon Linux 2023 – The AL2023_x86_64_STANDARD AMI type uses the latest Amazon Linux 2023 optimized for EKS.
  • Cluster Autoscaler tags – The node group is tagged so the Cluster Autoscaler can discover and manage it.
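Note that enable_cluster_creator_admin_permissions only grants admin access to the identity that ran terraform apply. To grant other principals access declaratively, the module accepts an access_entries map. A sketch with a placeholder role ARN:

  access_entries = {
    platform_admins = {
      principal_arn = "arn:aws:iam::123456789012:role/platform-admins"  # placeholder ARN

      policy_associations = {
        admin = {
          policy_arn = "arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy"
          access_scope = {
            type = "cluster"
          }
        }
      }
    }
  }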

Step 6: Define Outputs

Add outputs so you can easily retrieve cluster connection details after deployment.

$ vim outputs.tf

Add the following output definitions:

output "cluster_name" {
  description = "EKS cluster name"
  value       = module.eks.cluster_name
}

output "cluster_endpoint" {
  description = "EKS cluster API endpoint"
  value       = module.eks.cluster_endpoint
}

output "cluster_version" {
  description = "EKS cluster Kubernetes version"
  value       = module.eks.cluster_version
}

output "cluster_arn" {
  description = "EKS cluster ARN"
  value       = module.eks.cluster_arn
}

output "cluster_certificate_authority_data" {
  description = "Base64 encoded certificate data for the cluster"
  value       = module.eks.cluster_certificate_authority_data
  sensitive   = true
}

output "oidc_provider_arn" {
  description = "ARN of the OIDC provider for IRSA"
  value       = module.eks.oidc_provider_arn
}

output "node_security_group_id" {
  description = "Security group ID attached to the EKS nodes"
  value       = module.eks.node_security_group_id
}

output "configure_kubectl" {
  description = "Command to configure kubectl"
  value       = "aws eks update-kubeconfig --region ${var.aws_region} --name ${module.eks.cluster_name}"
}

Step 7: Initialize and Deploy the EKS Cluster

With all configuration files in place, initialize Terraform to download the required providers and modules.

$ terraform init

You should see output confirming the providers and modules were downloaded:

Initializing the backend...
Initializing modules...
Downloading registry.terraform.io/terraform-aws-modules/eks/aws 20.31.6 for eks...
- eks in .terraform/modules/eks
Downloading registry.terraform.io/terraform-aws-modules/vpc/aws 5.16.0 for vpc...
- vpc in .terraform/modules/vpc

Initializing provider plugins...
- Installing hashicorp/aws v5.80.0...
- Installing hashicorp/kubernetes v2.35.1...
- Installing hashicorp/tls v4.0.6...

Terraform has been successfully initialized!

Run a plan to review what Terraform will create:

$ terraform plan -out=eks.tfplan

The plan will show around 50-70 resources to be created, including the VPC, subnets, route tables, NAT gateway, EKS cluster, node group, IAM roles, and security groups. Review the plan output carefully, paying attention to the instance types, node counts, and subnet CIDRs.

Apply the plan to create all resources:

$ terraform apply eks.tfplan

The deployment takes 12-20 minutes. The EKS cluster control plane creation alone takes about 10 minutes. Once complete, you will see the outputs showing your cluster endpoint, name, and the kubectl configuration command.

Apply complete! Resources: 62 added, 0 changed, 0 destroyed.

Outputs:

cluster_arn = "arn:aws:eks:us-east-1:123456789012:cluster/my-eks-cluster"
cluster_endpoint = "https://ABCDEF1234567890.gr7.us-east-1.eks.amazonaws.com"
cluster_name = "my-eks-cluster"
cluster_version = "1.31"
configure_kubectl = "aws eks update-kubeconfig --region us-east-1 --name my-eks-cluster"
node_security_group_id = "sg-0abc123def456789"
oidc_provider_arn = "arn:aws:iam::123456789012:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/ABCDEF1234567890"

Step 8: Configure kubectl and Verify the Cluster

Update your local kubeconfig to connect to the new EKS cluster. Use the command from the Terraform output:

$ aws eks update-kubeconfig --region us-east-1 --name my-eks-cluster

Expected output:

Added new context arn:aws:eks:us-east-1:123456789012:cluster/my-eks-cluster to /home/user/.kube/config

Verify you can reach the cluster API and that nodes are ready:

$ kubectl get nodes

You should see your worker nodes in Ready state:

NAME                                        STATUS   ROLES    AGE   VERSION
ip-10-0-1-45.ec2.internal                  Ready    <none>   3m    v1.31.2-eks-7f9249a
ip-10-0-2-78.ec2.internal                  Ready    <none>   3m    v1.31.2-eks-7f9249a

Check that all EKS add-ons are running in the kube-system namespace:

$ kubectl get pods -n kube-system

You should see pods for CoreDNS, kube-proxy, VPC CNI (aws-node), and EKS Pod Identity Agent all in Running state:

NAME                       READY   STATUS    RESTARTS   AGE
aws-node-abcde             2/2     Running   0          5m
aws-node-fghij             2/2     Running   0          5m
coredns-5678abcde-k1l2m    1/1     Running   0          8m
coredns-5678abcde-n3o4p    1/1     Running   0          8m
eks-pod-identity-agent-x1  1/1     Running   0          5m
eks-pod-identity-agent-y2  1/1     Running   0          5m
kube-proxy-q5r6s           1/1     Running   0          5m
kube-proxy-t7u8v           1/1     Running   0          5m

Verify the cluster info:

$ kubectl cluster-info

This confirms that the Kubernetes control plane and CoreDNS are running and reachable.
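For an extra sanity check of in-cluster DNS, you can run a throwaway pod and resolve the built-in kubernetes service:

$ kubectl run dns-test --rm -it --restart=Never --image=busybox:1.36 -- nslookup kubernetes.default

If CoreDNS is healthy, this resolves to the cluster's internal API service IP (typically 172.20.0.1 on EKS) and the pod is removed automatically afterwards.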

Step 9: Set Up the Cluster Autoscaler

The Cluster Autoscaler automatically adjusts the number of nodes in your cluster based on pod scheduling demand. It scales up when pods are pending due to insufficient resources and scales down when nodes are underused. We already tagged the node group with the required autoscaler tags in Step 5.

First, create an IAM policy for the Cluster Autoscaler. Add this to your main.tf file:

# ------------------------------------------------------------------
# Cluster Autoscaler IAM
# ------------------------------------------------------------------
module "cluster_autoscaler_irsa" {
  source  = "terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks"
  version = "~> 5.48"

  role_name                        = "${var.cluster_name}-cluster-autoscaler"
  attach_cluster_autoscaler_policy = true
  cluster_autoscaler_cluster_names = [module.eks.cluster_name]

  oidc_providers = {
    main = {
      provider_arn               = module.eks.oidc_provider_arn
      namespace_service_accounts = ["kube-system:cluster-autoscaler"]
    }
  }

  tags = {
    Environment = var.environment
  }
}

Add an output for the autoscaler role ARN in outputs.tf:

output "cluster_autoscaler_role_arn" {
  description = "IAM role ARN for the Cluster Autoscaler"
  value       = module.cluster_autoscaler_irsa.iam_role_arn
}

Apply the changes to create the IAM role:

$ terraform apply -auto-approve

Now deploy the Cluster Autoscaler using kubectl. Create a manifest file:

$ vim cluster-autoscaler.yaml

Add the following Kubernetes manifest (replace AUTOSCALER_ROLE_ARN with the ARN from Terraform output and MY_CLUSTER_NAME with your cluster name):

apiVersion: v1
kind: ServiceAccount
metadata:
  name: cluster-autoscaler
  namespace: kube-system
  annotations:
    eks.amazonaws.com/role-arn: AUTOSCALER_ROLE_ARN
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cluster-autoscaler
rules:
  - apiGroups: [""]
    resources: ["events", "endpoints"]
    verbs: ["create", "patch"]
  - apiGroups: [""]
    resources: ["pods/eviction"]
    verbs: ["create"]
  - apiGroups: [""]
    resources: ["pods/status"]
    verbs: ["update"]
  - apiGroups: [""]
    resources: ["endpoints"]
    resourceNames: ["cluster-autoscaler"]
    verbs: ["get", "update"]
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["watch", "list", "get", "update"]
  - apiGroups: [""]
    resources: ["namespaces", "pods", "services", "replicationcontrollers", "persistentvolumeclaims", "persistentvolumes"]
    verbs: ["watch", "list", "get"]
  - apiGroups: ["extensions"]
    resources: ["replicasets", "daemonsets"]
    verbs: ["watch", "list", "get"]
  - apiGroups: ["policy"]
    resources: ["poddisruptionbudgets"]
    verbs: ["watch", "list"]
  - apiGroups: ["apps"]
    resources: ["statefulsets", "replicasets", "daemonsets"]
    verbs: ["watch", "list", "get"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses", "csinodes", "csidrivers", "csistoragecapacities"]
    verbs: ["watch", "list", "get"]
  - apiGroups: ["batch", "extensions"]
    resources: ["jobs"]
    verbs: ["get", "list", "watch", "patch"]
  - apiGroups: ["coordination.k8s.io"]
    resources: ["leases"]
    verbs: ["create"]
  - apiGroups: ["coordination.k8s.io"]
    resourceNames: ["cluster-autoscaler"]
    resources: ["leases"]
    verbs: ["get", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cluster-autoscaler
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-autoscaler
subjects:
  - kind: ServiceAccount
    name: cluster-autoscaler
    namespace: kube-system
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
  labels:
    app: cluster-autoscaler
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      serviceAccountName: cluster-autoscaler
      priorityClassName: system-cluster-critical
      securityContext:
        runAsNonRoot: true
        runAsUser: 65534
        fsGroup: 65534
        seccompProfile:
          type: RuntimeDefault
      containers:
        - name: cluster-autoscaler
          image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.31.0
          command:
            - ./cluster-autoscaler
            - --v=4
            - --stderrthreshold=info
            - --cloud-provider=aws
            - --skip-nodes-with-local-storage=false
            - --expander=least-waste
            - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/MY_CLUSTER_NAME
            - --balance-similar-node-groups
            - --skip-nodes-with-system-pods=false
          resources:
            limits:
              cpu: 100m
              memory: 600Mi
            requests:
              cpu: 100m
              memory: 600Mi

Apply the autoscaler manifest:

$ kubectl apply -f cluster-autoscaler.yaml

Verify the autoscaler pod is running:

$ kubectl get pods -n kube-system -l app=cluster-autoscaler
NAME                                  READY   STATUS    RESTARTS   AGE
cluster-autoscaler-6b4f5c8d9f-xk2mn  1/1     Running   0          30s

Check the logs to confirm it discovered your Auto Scaling Group:

$ kubectl logs -n kube-system -l app=cluster-autoscaler --tail=20

You should see log lines showing the autoscaler found your node group ASG and is monitoring it for scaling events.
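To watch the autoscaler in action, you can temporarily over-schedule the cluster. The deployment name, CPU request, and replica count here are arbitrary test values:

$ kubectl create deployment scale-test --image=nginx:stable-alpine
$ kubectl set resources deployment scale-test --requests=cpu=500m
$ kubectl scale deployment scale-test --replicas=15
$ kubectl get nodes -w

Once pending pods exceed current capacity, the autoscaler raises the ASG's desired count (up to node_max_size) and new nodes appear within a few minutes. Delete the deployment afterwards with kubectl delete deployment scale-test; the autoscaler scales back down after its default 10-minute cooldown.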

Step 10: Deploy a Test Application

Deploy a sample nginx application to verify the cluster is fully operational with networking, DNS resolution, and load balancing.

$ vim test-app.yaml

Add the following deployment and service definition:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-test
  namespace: default
  labels:
    app: nginx-test
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx-test
  template:
    metadata:
      labels:
        app: nginx-test
    spec:
      containers:
        - name: nginx
          image: nginx:stable-alpine
          ports:
            - containerPort: 80
          resources:
            requests:
              cpu: 50m
              memory: 64Mi
            limits:
              cpu: 100m
              memory: 128Mi
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-test
  namespace: default
spec:
  type: LoadBalancer
  selector:
    app: nginx-test
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80

Apply the test application:

$ kubectl apply -f test-app.yaml

Wait for the pods to start and the load balancer to provision:

$ kubectl get pods -l app=nginx-test
NAME                          READY   STATUS    RESTARTS   AGE
nginx-test-7d5b8f6c9f-abc12  1/1     Running   0          45s
nginx-test-7d5b8f6c9f-def34  1/1     Running   0          45s
nginx-test-7d5b8f6c9f-ghi56  1/1     Running   0          45s

Check the service to get the load balancer URL:

$ kubectl get svc nginx-test
NAME         TYPE           CLUSTER-IP      EXTERNAL-IP                                                             PORT(S)        AGE
nginx-test   LoadBalancer   172.20.45.123   a1b2c3d4e5f6-1234567890.us-east-1.elb.amazonaws.com                    80:31234/TCP   2m

The EXTERNAL-IP column shows the AWS Classic Load Balancer DNS name. It can take 2-3 minutes for the load balancer to become active. Test it with curl:

$ curl -s http://a1b2c3d4e5f6-1234567890.us-east-1.elb.amazonaws.com | head -5

You should see the default nginx welcome page HTML, confirming that networking, DNS, and load balancing are all working end to end.

Clean up the test application once verified:

$ kubectl delete -f test-app.yaml

Understanding the IAM Roles Created by the EKS Module

The EKS Terraform module creates several IAM roles automatically. It helps to understand what each one does:

  • Cluster IAM Role – Allows the EKS service to manage AWS resources on your behalf. It has the AmazonEKSClusterPolicy attached.
  • Node Group IAM Role – Assigned to EC2 instances in the managed node group. It has AmazonEKSWorkerNodePolicy, AmazonEKS_CNI_Policy, and AmazonEC2ContainerRegistryReadOnly attached.
  • OIDC Provider – An OpenID Connect identity provider that enables Kubernetes service accounts to assume IAM roles (IRSA). This is how the Cluster Autoscaler gets its AWS permissions without using access keys.

You can view the created roles in the AWS IAM console or CLI:

$ aws iam list-roles --query "Roles[?contains(RoleName, 'my-eks-cluster')].[RoleName,Arn]" --output table

Working with EKS Add-ons

The three core EKS add-ons we configured in Step 5 are essential for cluster operation:

CoreDNS provides DNS resolution inside the cluster. Every pod uses CoreDNS to resolve service names like my-service.default.svc.cluster.local. The EKS managed add-on keeps CoreDNS updated and compatible with your cluster version.

kube-proxy maintains network rules on each node that enable Service-based networking. It handles routing traffic from a Service’s ClusterIP to the backing pods.

VPC CNI (aws-node) is the networking plugin that assigns real VPC IP addresses to pods. With prefix delegation enabled, each node can support many more pods because it assigns /28 CIDR blocks instead of individual IPs.

Check the installed add-on versions at any time:

$ aws eks describe-addon --cluster-name my-eks-cluster --addon-name vpc-cni --query "addon.addonVersion" --output text
$ aws eks describe-addon --cluster-name my-eks-cluster --addon-name coredns --query "addon.addonVersion" --output text
$ aws eks describe-addon --cluster-name my-eks-cluster --addon-name kube-proxy --query "addon.addonVersion" --output text

To update add-ons, simply change the Terraform configuration (or keep most_recent = true) and run terraform apply.
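If you prefer controlled upgrades over most_recent = true, you can pin an explicit version per add-on instead. The version string below is only an example; list valid versions with aws eks describe-addon-versions:

    coredns = {
      addon_version = "v1.11.3-eksbuild.1"
    }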

Adding a Spot Instance Node Group (Optional)

For cost savings on non-critical workloads, you can add a Spot instance node group alongside the On-Demand group. Add this block inside the eks_managed_node_groups map in main.tf:

    spot = {
      name            = "${var.cluster_name}-spot-ng"
      instance_types  = ["t3.medium", "t3.large", "t3a.medium", "t3a.large"]
      capacity_type   = "SPOT"

      min_size     = 0
      max_size     = 10
      desired_size = 2
      disk_size    = 50

      ami_type = "AL2023_x86_64_STANDARD"

      labels = {
        Environment  = var.environment
        NodeGroup    = "spot"
        CapacityType = "spot"
      }

      taints = [{
        key    = "spot"
        value  = "true"
        effect = "NO_SCHEDULE"
      }]

      tags = {
        "k8s.io/cluster-autoscaler/enabled"             = "true"
        "k8s.io/cluster-autoscaler/${var.cluster_name}" = "owned"
      }
    }

The taint on the Spot node group prevents regular pods from being scheduled there unless they have a matching toleration. This way, only workloads that explicitly opt in to Spot instances will run on cheaper, interruptible nodes.
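A workload opts in to the Spot nodes by tolerating the taint and, optionally, selecting the node label defined above. For example, in a pod template spec:

    spec:
      nodeSelector:
        CapacityType: spot
      tolerations:
        - key: "spot"
          operator: "Equal"
          value: "true"
          effect: "NoSchedule"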

Enabling Cluster Logging

EKS can send control plane logs to CloudWatch Logs. This is useful for debugging authentication issues, API audit trails, and scheduler decisions. Add this parameter to the module "eks" block:

  cluster_enabled_log_types = ["api", "audit", "authenticator", "controllerManager", "scheduler"]

After applying, logs will appear in CloudWatch under the log group /aws/eks/my-eks-cluster/cluster. Be aware that control plane logging adds cost, so enable only the log types you need in production.

Using a Remote Backend for Terraform State

For team environments, store Terraform state in a remote backend instead of the local filesystem. S3 with DynamoDB locking is the most common approach for AWS. Add this inside the terraform {} block in providers.tf:

  backend "s3" {
    bucket         = "my-terraform-state-bucket"
    key            = "eks/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }

Create the S3 bucket and DynamoDB table before initializing Terraform with this backend. The DynamoDB table prevents concurrent state modifications that could corrupt your infrastructure state.
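A one-time bootstrap for the backend might look like this. The bucket name must be globally unique, so adjust it; the table's hash key must be named LockID, which is what Terraform expects:

$ aws s3api create-bucket --bucket my-terraform-state-bucket --region us-east-1
$ aws s3api put-bucket-versioning --bucket my-terraform-state-bucket \
    --versioning-configuration Status=Enabled
$ aws dynamodb create-table --table-name terraform-locks \
    --attribute-definitions AttributeName=LockID,AttributeType=S \
    --key-schema AttributeName=LockID,KeyType=HASH \
    --billing-mode PAY_PER_REQUEST

After creating these, run terraform init again; Terraform will offer to migrate your existing local state to the new backend.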

Cleanup – Destroy the EKS Cluster

When you no longer need the cluster, destroy all resources to stop incurring AWS charges. First, make sure you have deleted any Kubernetes resources that created AWS infrastructure (like LoadBalancer services or EBS-backed PersistentVolumes), as Terraform does not know about those.

$ kubectl get svc --all-namespaces | grep LoadBalancer
$ kubectl delete svc <service-name> -n <namespace>   # repeat for each LoadBalancer service
$ kubectl delete pvc --all --all-namespaces

Wait a minute for AWS to delete the associated load balancers and EBS volumes, then run:

$ terraform destroy

Terraform will show the list of resources to be destroyed. Type yes to confirm. The destroy process takes about 10-15 minutes. After completion, verify in the AWS console that the VPC, EKS cluster, and all associated resources have been removed.

$ aws eks list-clusters --region us-east-1
{
    "clusters": []
}

If the destroy gets stuck (usually on ENI or security group deletion), wait a few minutes and retry. EKS sometimes takes time to fully release network interfaces.

Troubleshooting Common Issues

Here are the most frequent issues when deploying EKS with Terraform and how to resolve them:

Nodes not joining the cluster – Check that the node group IAM role has the required policies attached. Run aws eks describe-nodegroup to check the node group status and any health issues.

kubectl connection refused – Make sure your kubeconfig is updated with aws eks update-kubeconfig and that the IAM user running kubectl is the same one that created the cluster (or has been granted access).

Pods stuck in Pending – Check if there are enough nodes and resources. Run kubectl describe pod <pod-name> to see scheduling events. This is where the Cluster Autoscaler helps by adding nodes automatically.

Terraform destroy fails on VPC – This usually means there are still ENIs or load balancers attached to the VPC subnets. Delete any remaining Kubernetes services of type LoadBalancer, wait for the ELBs to be removed, then retry the destroy.

CoreDNS pods in CrashLoopBackOff – This often happens when the VPC CNI is not ready yet. The before_compute = true setting on the VPC CNI add-on ensures it is installed before nodes join, which prevents this race condition.

Conclusion

You now have a fully functional Amazon EKS cluster deployed and managed through Terraform. The setup includes a multi-AZ VPC with proper subnet tagging, managed node groups running Amazon Linux 2023, EKS-managed add-ons for networking and DNS, IAM roles with least-privilege access through OIDC/IRSA, and the Cluster Autoscaler for dynamic node scaling.

For production deployments, consider enabling cluster logging, switching to multiple NAT gateways for high availability, adding node group encryption with KMS, implementing network policies with Calico, and setting up monitoring with Prometheus and Grafana. Store your Terraform state in S3 with DynamoDB locking, and keep all configuration in version control.
