How To Install CloudWatch Container Insights on Amazon EKS

CloudWatch Container Insights collects container-level metrics and logs from your Amazon EKS clusters. It gives you visibility into CPU usage, memory consumption, network traffic, and pod health across every node and workload – all within the CloudWatch console. For production EKS clusters, this is essential for troubleshooting performance issues and setting up automated alerting.

This guide walks through the full setup of CloudWatch Container Insights on Amazon EKS. We cover IAM permissions, the CloudWatch Observability add-on deployment, Fluent Bit log collection, dashboard creation, alarms, cost optimization, and Fargate support.

Prerequisites

Before starting, confirm you have the following in place:

  • A running Amazon EKS cluster (version 1.27 or later recommended)
  • AWS CLI v2 installed and configured with credentials that have IAM and EKS admin permissions
  • kubectl configured to communicate with your EKS cluster
  • eksctl installed (used for OIDC provider and IAM service account setup)
  • Sufficient IAM permissions to create policies, roles, and EKS add-ons

Verify that kubectl can reach your cluster before proceeding.

kubectl get nodes

You should see your worker nodes listed with a Ready status:

NAME                                           STATUS   ROLES    AGE   VERSION
ip-10-0-1-45.eu-west-1.compute.internal        Ready    <none>   5d    v1.30.2-eks-1552ad0
ip-10-0-2-112.eu-west-1.compute.internal       Ready    <none>   5d    v1.30.2-eks-1552ad0
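
It can also help to confirm every required CLI is actually installed before starting. The helper below is a small sketch; the `check_tools` function name is our own, not part of any AWS tooling:

```shell
# check_tools: verify each required CLI is on PATH; prints what is
# missing and returns nonzero if anything is absent
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return $missing
}

# for this guide you would run: check_tools aws kubectl eksctl
check_tools sh && echo "sh found"
```

Run `check_tools aws kubectl eksctl` and resolve anything it reports missing before Step 1.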

Step 1: Create IAM Policy for CloudWatch Agent

The CloudWatch agent running inside your EKS cluster needs permission to push metrics and logs to CloudWatch. AWS provides a managed policy called CloudWatchAgentServerPolicy that covers exactly this. There are two approaches – attaching the policy directly to the worker node IAM role, or using IAM Roles for Service Accounts (IRSA) for tighter scoping.

For the IRSA approach (recommended), first ensure your cluster has an OpenID Connect (OIDC) provider associated with it.

eksctl utils associate-iam-oidc-provider --cluster your-cluster-name --approve

The command outputs the OIDC provider ARN if it was created or confirms it already exists:

2026-03-22 10:15:32 [ℹ]  will create IAM Open ID Connect provider for cluster "your-cluster-name" in "eu-west-1"
2026-03-22 10:15:33 [✔]  created IAM Open ID Connect provider for cluster "your-cluster-name" in "eu-west-1"

Next, create the IAM service account that the CloudWatch agent will use. This binds the CloudWatchAgentServerPolicy managed policy to a Kubernetes service account in the amazon-cloudwatch namespace.

eksctl create iamserviceaccount \
  --name cloudwatch-agent \
  --namespace amazon-cloudwatch \
  --cluster your-cluster-name \
  --attach-policy-arn arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy \
  --approve \
  --override-existing-serviceaccounts

Verify the service account was created and has the correct IAM role annotation:

kubectl get serviceaccount cloudwatch-agent -n amazon-cloudwatch -o yaml

The output should show an eks.amazonaws.com/role-arn annotation pointing to the IAM role that was automatically created:

apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/eksctl-your-cluster-addon-iamserviceaccount-Role1-XXXX
  name: cloudwatch-agent
  namespace: amazon-cloudwatch

Step 2: Attach IAM Role to Node Group (Alternative Method)

If you prefer the simpler node-level approach instead of IRSA, attach the CloudWatchAgentServerPolicy directly to the IAM role used by your EKS managed node group. This gives all pods on the node CloudWatch permissions, so IRSA is preferred for production environments.

First, find the IAM role ARN used by your node group.

aws eks describe-nodegroup \
  --cluster-name your-cluster-name \
  --nodegroup-name your-nodegroup-name \
  --query "nodegroup.nodeRole" \
  --output text

This returns the full ARN of the node group IAM role:

arn:aws:iam::123456789012:role/eksctl-your-cluster-nodegroup-NodeInstanceRole-XXXXX

Attach the CloudWatch managed policy to that role.

aws iam attach-role-policy \
  --role-name eksctl-your-cluster-nodegroup-NodeInstanceRole-XXXXX \
  --policy-arn arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy

Confirm the policy is attached by listing the role’s policies:

aws iam list-attached-role-policies \
  --role-name eksctl-your-cluster-nodegroup-NodeInstanceRole-XXXXX

You should see CloudWatchAgentServerPolicy in the attached policies list:

{
    "AttachedPolicies": [
        {
            "PolicyName": "AmazonEKSWorkerNodePolicy",
            "PolicyArn": "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"
        },
        {
            "PolicyName": "CloudWatchAgentServerPolicy",
            "PolicyArn": "arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy"
        }
    ]
}
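
To script this verification, you can scan the JSON output for the policy name. The helper below is a sketch; `has_policy` is our own name, and the canned sample stands in for the live CLI output:

```shell
# has_policy: read `aws iam list-attached-role-policies` output on
# stdin and succeed only if the given policy name appears
has_policy() {
  grep -q "\"PolicyName\":[[:space:]]*\"$1\""
}

# canned sample standing in for the live CLI output
sample='{"AttachedPolicies": [{"PolicyName": "CloudWatchAgentServerPolicy"}]}'
echo "$sample" | has_policy CloudWatchAgentServerPolicy && echo "attached"
```

In practice, pipe the real command into it: `aws iam list-attached-role-policies --role-name <role> | has_policy CloudWatchAgentServerPolicy`.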

Step 3: Deploy CloudWatch Agent Using the Observability Add-on

AWS now recommends the Amazon CloudWatch Observability EKS add-on as the primary method to deploy Container Insights. This add-on installs both the CloudWatch agent (for metrics) and Fluent Bit (for logs) as a single managed package.

Install the add-on using the AWS CLI.

aws eks create-addon \
  --addon-name amazon-cloudwatch-observability \
  --cluster-name your-cluster-name

The add-on creation takes a minute or two. Check its status with:

aws eks describe-addon \
  --addon-name amazon-cloudwatch-observability \
  --cluster-name your-cluster-name \
  --query "addon.status" \
  --output text

Once ready, the status shows ACTIVE:

ACTIVE
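
If you are scripting the install, a small retry loop saves you from polling describe-addon by hand. This is a sketch; `wait_until` is our own helper, and the `echo ACTIVE` stub stands in for the real describe-addon call shown above:

```shell
# wait_until: re-run a command every <interval> seconds until its
# output equals <expected>, giving up after <attempts> tries
wait_until() {
  expected=$1; interval=$2; attempts=$3; shift 3
  i=0
  while [ "$i" -lt "$attempts" ]; do
    [ "$("$@")" = "$expected" ] && return 0
    i=$((i + 1))
    sleep "$interval"
  done
  return 1
}

# stub demo; in practice replace `echo ACTIVE` with:
#   aws eks describe-addon --addon-name amazon-cloudwatch-observability \
#     --cluster-name your-cluster-name --query addon.status --output text
wait_until ACTIVE 5 24 echo ACTIVE && echo "add-on is ready"
```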

Verify that the CloudWatch agent pods are running on each node as a DaemonSet.

kubectl get pods -n amazon-cloudwatch

You should see one cloudwatch-agent pod per worker node, all in Running state:

NAME                                  READY   STATUS    RESTARTS   AGE
amazon-cloudwatch-agent-hk7tm         1/1     Running   0          3m
amazon-cloudwatch-agent-rn4jx         1/1     Running   0          3m
fluent-bit-9vqpf                      1/1     Running   0          3m
fluent-bit-ld2wk                      1/1     Running   0          3m

If you used IRSA in Step 1, pass the service account role ARN when creating the add-on instead (if the add-on already exists, use aws eks update-addon with the same flag):

aws eks create-addon \
  --addon-name amazon-cloudwatch-observability \
  --cluster-name your-cluster-name \
  --service-account-role-arn arn:aws:iam::123456789012:role/eksctl-your-cluster-addon-iamserviceaccount-Role1-XXXX

Step 4: Deploy Fluent Bit for Log Collection

The CloudWatch Observability add-on from Step 3 automatically deploys Fluent Bit alongside the CloudWatch agent. Fluent Bit is a lightweight log processor that ships container logs, application logs, and host logs to CloudWatch Logs.

If you installed via the add-on, Fluent Bit is already running. Verify it:

kubectl get daemonset -n amazon-cloudwatch

Both DaemonSets should show the desired count matching your node count:

NAME                        DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
amazon-cloudwatch-agent     2         2         2       2            2           <none>          5m
fluent-bit                  2         2         2       2            2           <none>          5m

Fluent Bit creates three log groups in CloudWatch Logs for your cluster:

  • /aws/containerinsights/your-cluster-name/application – logs from all application containers
  • /aws/containerinsights/your-cluster-name/host – logs from the node’s system services
  • /aws/containerinsights/your-cluster-name/dataplane – logs from EKS data plane components (kube-proxy, aws-node)

Confirm the log groups were created:

aws logs describe-log-groups \
  --log-group-name-prefix "/aws/containerinsights/your-cluster-name" \
  --query "logGroups[].logGroupName" \
  --output table

The output lists all Container Insights log groups for your cluster:

-------------------------------------------------------------------
|                        DescribeLogGroups                        |
+-----------------------------------------------------------------+
|  /aws/containerinsights/your-cluster-name/application           |
|  /aws/containerinsights/your-cluster-name/dataplane             |
|  /aws/containerinsights/your-cluster-name/host                  |
|  /aws/containerinsights/your-cluster-name/performance           |
+-----------------------------------------------------------------+

The performance log group contains the structured metric data that powers the Container Insights dashboards.
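
The log group names follow a fixed pattern, so you can generate the expected set for any cluster and compare it against what describe-log-groups returns. A quick sketch (`expected_log_groups` is our own helper):

```shell
# expected_log_groups: print the four Container Insights log groups
# that the agent and Fluent Bit create for a given cluster name
expected_log_groups() {
  cluster=$1
  for suffix in application host dataplane performance; do
    echo "/aws/containerinsights/$cluster/$suffix"
  done
}

expected_log_groups your-cluster-name
```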

Step 5: Verify Metrics in CloudWatch Console

After the agents have been running for a few minutes, metrics start appearing in the CloudWatch console. Open the CloudWatch console and navigate to Container Insights under the Insights section in the left sidebar.

You can also query the performance log group using CloudWatch Logs Insights to confirm data is flowing. Run this query to check recent pod CPU metrics:

aws logs start-query \
  --log-group-name "/aws/containerinsights/your-cluster-name/performance" \
  --start-time $(date -d '15 minutes ago' +%s) \
  --end-time $(date +%s) \
  --query-string 'fields @timestamp, Type, PodName, CpuUtilized | filter Type = "Pod" | sort @timestamp desc | limit 10'

This returns a query ID. Retrieve the results with:

aws logs get-query-results --query-id YOUR_QUERY_ID

The results show pod-level CPU utilization data confirming Container Insights is actively collecting metrics from your cluster.
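
One portability note: `date -d '15 minutes ago'` is GNU-specific and fails on macOS/BSD date. Computing the window with plain shell arithmetic works everywhere:

```shell
# compute the query window (last 15 minutes) in epoch seconds,
# without relying on GNU date's -d flag
end=$(date +%s)
start=$((end - 15 * 60))
echo "querying from $start to $end"
```

Then pass `--start-time "$start" --end-time "$end"` to `aws logs start-query`.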

Key metrics available through Container Insights include:

  • pod_cpu_utilization – pod CPU usage as a percentage of node CPU capacity
  • pod_memory_utilization – pod memory usage as a percentage of node memory capacity
  • node_cpu_utilization – node-level CPU usage percentage
  • node_memory_utilization – node-level memory usage percentage
  • pod_network_rx_bytes and pod_network_tx_bytes – network traffic per pod

Step 6: Container Insights Dashboards

Container Insights provides automatic dashboards in the CloudWatch console that require no manual setup. These dashboards give you a hierarchical view of your cluster – from the cluster level down to individual containers.

To access the dashboards, go to CloudWatch in the AWS Console, then select Container Insights from the left navigation under Insights. The dashboard selector at the top lets you switch between different views:

  • EKS Clusters – cluster-wide CPU, memory, and network overview
  • EKS Nodes – per-node resource usage, number of pods running, filesystem utilization
  • EKS Namespaces – resource consumption grouped by namespace
  • EKS Services – metrics aggregated by Kubernetes service
  • EKS Pods – individual pod CPU, memory, network, and restart counts

Each dashboard includes time range selectors and auto-refresh options, and you can click on any resource to drill down into its specific metrics. If you also run Prometheus, the two complement each other well: Container Insights for AWS-native dashboards and alarms, Prometheus for custom application metrics.

For custom dashboards, create a new CloudWatch dashboard and add widgets that pull from the ContainerInsights metric namespace, choosing the dimensions you want to track (ClusterName, Namespace, PodName, NodeName).
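
This can also be done from the CLI. As a sketch (the dashboard name eks-container-insights and the widget layout are just examples), the dashboard body is a JSON document referencing the ContainerInsights namespace:

```shell
# write a minimal dashboard body with one node CPU widget; each
# metrics entry is [namespace, metric, dimension name, dimension value]
cat > /tmp/eks-dashboard.json << 'EOF'
{
  "widgets": [
    {
      "type": "metric",
      "x": 0, "y": 0, "width": 12, "height": 6,
      "properties": {
        "metrics": [
          ["ContainerInsights", "node_cpu_utilization", "ClusterName", "your-cluster-name"]
        ],
        "period": 300,
        "stat": "Average",
        "region": "eu-west-1",
        "title": "Node CPU utilization"
      }
    }
  ]
}
EOF
```

Publish it with `aws cloudwatch put-dashboard --dashboard-name eks-container-insights --dashboard-body file:///tmp/eks-dashboard.json`.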

Step 7: Set Up CloudWatch Alarms for EKS

With metrics flowing into CloudWatch, set up alarms to get notified when your EKS workloads hit resource thresholds. The most useful alarms for EKS production clusters are CPU utilization, memory utilization, and pod restart counts.

Create an SNS topic for alarm notifications first.

aws sns create-topic --name eks-container-insights-alarms

Subscribe your email to receive notifications:

aws sns subscribe \
  --topic-arn arn:aws:sns:eu-west-1:123456789012:eks-container-insights-alarms \
  --protocol email \
  --notification-endpoint [email protected]

Create an alarm that triggers when node CPU utilization exceeds 80% for 5 consecutive minutes.

aws cloudwatch put-metric-alarm \
  --alarm-name "EKS-High-Node-CPU" \
  --metric-name node_cpu_utilization \
  --namespace ContainerInsights \
  --statistic Average \
  --period 300 \
  --threshold 80 \
  --comparison-operator GreaterThanThreshold \
  --evaluation-periods 1 \
  --dimensions Name=ClusterName,Value=your-cluster-name \
  --alarm-actions arn:aws:sns:eu-west-1:123456789012:eks-container-insights-alarms \
  --treat-missing-data notBreaching

Set up a memory utilization alarm with the same pattern:

aws cloudwatch put-metric-alarm \
  --alarm-name "EKS-High-Node-Memory" \
  --metric-name node_memory_utilization \
  --namespace ContainerInsights \
  --statistic Average \
  --period 300 \
  --threshold 85 \
  --comparison-operator GreaterThanThreshold \
  --evaluation-periods 1 \
  --dimensions Name=ClusterName,Value=your-cluster-name \
  --alarm-actions arn:aws:sns:eu-west-1:123456789012:eks-container-insights-alarms \
  --treat-missing-data notBreaching
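
The two alarms above differ only in name, metric, and threshold, so a small template function avoids copy-paste drift. This is a sketch (`build_alarm_cmd` is our own helper); it echoes the command for review rather than running it:

```shell
# build_alarm_cmd: assemble (but do not run) a put-metric-alarm call
# for a node-level ContainerInsights metric
build_alarm_cmd() {
  name=$1; metric=$2; threshold=$3
  echo "aws cloudwatch put-metric-alarm" \
    "--alarm-name $name --metric-name $metric" \
    "--namespace ContainerInsights --statistic Average" \
    "--period 300 --threshold $threshold" \
    "--comparison-operator GreaterThanThreshold --evaluation-periods 1" \
    "--dimensions Name=ClusterName,Value=your-cluster-name" \
    "--alarm-actions arn:aws:sns:eu-west-1:123456789012:eks-container-insights-alarms" \
    "--treat-missing-data notBreaching"
}

build_alarm_cmd EKS-High-Node-CPU node_cpu_utilization 80
build_alarm_cmd EKS-High-Node-Memory node_memory_utilization 85
```

Review the printed commands, then pipe the output to `sh` to execute them.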

For pod restart monitoring, use a CloudWatch Logs metric filter, since pod restarts are tracked in the performance log group. Create a metric filter on that log group:

aws logs put-metric-filter \
  --log-group-name "/aws/containerinsights/your-cluster-name/performance" \
  --filter-name "PodRestartCount" \
  --filter-pattern '{ $.Type = "Pod" && $.PodRestartCount > 0 }' \
  --metric-transformations \
    metricName=PodRestartCount,metricNamespace=ContainerInsightsCustom,metricValue='$.PodRestartCount',defaultValue=0

Then create an alarm on the custom metric:

aws cloudwatch put-metric-alarm \
  --alarm-name "EKS-Pod-Restarts" \
  --metric-name PodRestartCount \
  --namespace ContainerInsightsCustom \
  --statistic Sum \
  --period 300 \
  --threshold 5 \
  --comparison-operator GreaterThanThreshold \
  --evaluation-periods 1 \
  --alarm-actions arn:aws:sns:eu-west-1:123456789012:eks-container-insights-alarms \
  --treat-missing-data notBreaching

Verify your alarms exist and check their current state:

aws cloudwatch describe-alarms \
  --alarm-name-prefix "EKS-" \
  --query "MetricAlarms[].{Name:AlarmName,State:StateValue}" \
  --output table

All alarms should show an OK state initially:

-------------------------------------------
|              DescribeAlarms             |
+------------------------+----------------+
|          Name          |     State      |
+------------------------+----------------+
|  EKS-High-Node-CPU     |  OK            |
|  EKS-High-Node-Memory  |  OK            |
|  EKS-Pod-Restarts      |  OK            |
+------------------------+----------------+

Step 8: Cost Optimization for Container Insights

Container Insights charges are based on the volume of logs ingested and metrics collected. For clusters with many pods, costs can grow quickly. Here are practical ways to keep costs under control.

Set log retention periods. By default, CloudWatch Logs retains data indefinitely. Set retention to match your actual needs – 30 days is sufficient for most troubleshooting scenarios.

for log_group in application host dataplane performance; do
  aws logs put-retention-policy \
    --log-group-name "/aws/containerinsights/your-cluster-name/$log_group" \
    --retention-in-days 30
done

Filter out noisy namespaces. Exclude high-volume, low-value logs such as kube-system health checks by customizing the Fluent Bit configuration. Note that the Observability add-on manages the Fluent Bit ConfigMap, so manual edits may be reverted when the add-on updates; for persistent changes, use the add-on's configuration values. For a quick test, edit the ConfigMap directly:

kubectl edit configmap fluent-bit-config -n amazon-cloudwatch

Add an exclude filter in the [FILTER] section to drop logs from namespaces you do not need:

[FILTER]
    Name    grep
    Match   application.*
    Exclude $kubernetes['namespace_name'] ^(kube-system)$

Use enhanced observability selectively. Container Insights with enhanced observability collects more detailed metrics but costs more. If you only need basic cluster health monitoring, the standard Container Insights tier is enough for most workloads.

Monitor your CloudWatch costs. For lightweight in-cluster resource visibility without additional CloudWatch charges, the Kubernetes Metrics Server (which powers kubectl top) is a useful complement. Use AWS Cost Explorer filtered to the CloudWatch service to track Container Insights spending over time.
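
To get a rough sense of ingestion spend before the bill arrives, back-of-the-envelope math is enough. The sketch below assumes the commonly cited $0.50/GB CloudWatch Logs ingestion price; verify current pricing for your region:

```shell
# estimate_log_cost: rough monthly log ingestion cost in USD
# usage: estimate_log_cost <GB per day> [price per GB, default 0.50]
estimate_log_cost() {
  awk -v gb="$1" -v price="${2:-0.50}" 'BEGIN { printf "%.2f\n", gb * 30 * price }'
}

estimate_log_cost 2    # a cluster ingesting ~2 GB/day
```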

Step 9: Container Insights with Fargate

EKS Fargate pods run on serverless infrastructure, so DaemonSets do not work – there are no nodes to schedule them on. Container Insights handles Fargate differently by using a built-in log router based on Fluent Bit that AWS manages for you.

To enable Container Insights on Fargate, create a dedicated Fargate logging ConfigMap in the aws-observability namespace. First, create the namespace:

kubectl create namespace aws-observability

Create the ConfigMap that tells the Fargate log router to send logs to CloudWatch:

cat > /tmp/fargate-logging.yaml << 'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-logging
  namespace: aws-observability
data:
  flb_log_cw: "true"
  output.conf: |
    [OUTPUT]
        Name cloudwatch_logs
        Match *
        region eu-west-1
        log_group_name /aws/eks/your-cluster-name/fargate
        log_stream_prefix fargate-
        auto_create_group true
  parsers.conf: |
    [PARSER]
        Name docker
        Format json
        Time_Key time
        Time_Format %Y-%m-%dT%H:%M:%S.%LZ
EOF
kubectl apply -f /tmp/fargate-logging.yaml

The Fargate pod execution role also needs CloudWatch Logs permissions. Attach the required policy:

aws iam attach-role-policy \
  --role-name your-fargate-pod-execution-role \
  --policy-arn arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy
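
CloudWatchAgentServerPolicy works, but it grants more than Fargate logging strictly needs. A narrower custom policy covering only the log actions the log router uses is a reasonable alternative (a sketch; the file path and policy name below are our own choices):

```shell
# write a minimal policy with only the CloudWatch Logs actions the
# Fargate log router needs
cat > /tmp/fargate-logging-policy.json << 'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogStream",
        "logs:CreateLogGroup",
        "logs:DescribeLogStreams",
        "logs:PutLogEvents"
      ],
      "Resource": "*"
    }
  ]
}
EOF
```

Create it with `aws iam create-policy --policy-name eks-fargate-logging --policy-document file:///tmp/fargate-logging-policy.json` and attach the returned ARN to the pod execution role instead.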

After deploying a pod to a Fargate profile, check that its logs appear in CloudWatch. Note that Fargate scheduling is driven by the profile's namespace and label selectors, so the pod must land in a namespace covered by a profile. Assuming the default namespace is covered, deploy a test pod:

kubectl run fargate-test --image=busybox --restart=Never \
  -- sh -c "echo 'Fargate logging test' && sleep 30"

After a minute, verify the log group was created and contains the test message:

aws logs describe-log-groups \
  --log-group-name-prefix "/aws/eks/your-cluster-name/fargate" \
  --query "logGroups[].logGroupName" \
  --output text

The output confirms the Fargate log group exists:

/aws/eks/your-cluster-name/fargate

Clean up the test pod after verification:

kubectl delete pod fargate-test

For Fargate workloads, Container Insights metrics are more limited compared to EC2-based nodes. You get pod-level CPU and memory metrics but not node-level metrics since Fargate abstracts the underlying infrastructure. If you need full cluster autoscaling visibility, use EC2 node groups with the standard Container Insights DaemonSet deployment.

Conclusion

CloudWatch Container Insights is now collecting metrics and logs from your EKS cluster. The CloudWatch Observability add-on handles the heavy lifting, deploying both the CloudWatch agent and Fluent Bit as managed components. With the dashboards, alarms, and log groups configured above, you have production-grade observability for your Kubernetes workloads.

For production hardening, set up log retention policies to control costs, use IRSA instead of node-level IAM roles, and configure alarms for the metrics that matter most to your workloads. Combine Container Insights with EKS control plane logging for full audit trail coverage of API server, authenticator, and scheduler activity.
