Running workloads on Amazon EKS without centralized logging means flying blind. When a pod crashes at 3 AM or an API request fails intermittently, you need logs aggregated in one place – not scattered across dozens of worker nodes. CloudWatch gives you that single pane of glass for EKS control plane logs, application logs, and node-level system logs.
This guide covers how to enable CloudWatch logging for your EKS cluster end to end. We start with control plane logging (API server, audit, scheduler), then set up Fluent Bit as a DaemonSet to ship pod and node logs to CloudWatch. You will also learn how to query logs with CloudWatch Logs Insights, set up alarms, and keep costs under control with retention policies and log filters.
Prerequisites
Before you begin, make sure the following are in place:
- A running EKS cluster (Kubernetes 1.28+)
- AWS CLI v2 installed and configured with credentials that have EKS and CloudWatch permissions
- kubectl configured to talk to your cluster (aws eks update-kubeconfig)
- eksctl installed (optional but simplifies control plane logging setup)
- IAM permissions: the node instance role (or IRSA service account) needs CloudWatchAgentServerPolicy attached
Step 1: Enable EKS Control Plane Logging
EKS control plane logging sends logs from the Kubernetes API server, audit system, authenticator, controller manager, and scheduler directly to CloudWatch. These logs are critical for debugging authentication failures, tracking API calls, and understanding scheduling decisions. AWS manages the control plane – you just need to flip the switch.
Five control plane log types
| Log Type | What It Captures |
|---|---|
| api | Kubernetes API server requests and responses |
| audit | Who did what – every API call with user identity, timestamp, and resource |
| authenticator | IAM-to-Kubernetes RBAC authentication (unique to EKS) |
| controllerManager | Core control loops – replica scaling, node lifecycle, endpoints |
| scheduler | Pod placement decisions – why a pod landed on a specific node |
Enable with eksctl
The fastest way to enable all five log types is with eksctl. Replace my-cluster and us-east-1 with your cluster name and region:
eksctl utils update-cluster-logging \
--cluster my-cluster \
--region us-east-1 \
--enable-types all \
--approve
If you only want audit and authenticator logs (common for security-focused setups), specify them explicitly:
eksctl utils update-cluster-logging \
--cluster my-cluster \
--region us-east-1 \
--enable-types audit,authenticator \
--approve
Enable with AWS CLI
If you prefer the AWS CLI or need this in a CI/CD pipeline, use aws eks update-cluster-config:
aws eks update-cluster-config \
--region us-east-1 \
--name my-cluster \
--logging '{"clusterLogging":[{"types":["api","audit","authenticator","controllerManager","scheduler"],"enabled":true}]}'
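The inline JSON payload is easy to mistype. As a convenience, you can build it from a comma-separated list of types with a small helper – build_logging_json below is a hypothetical pure-shell function, not part of the AWS CLI:

```shell
# Sketch: construct the --logging payload for `aws eks update-cluster-config`
# from a comma-separated list of log types (hypothetical helper).
build_logging_json() {
  types=$1                                          # e.g. "audit,authenticator"
  quoted=$(printf '"%s"' "$types" | sed 's/,/","/g') # wrap each type in quotes
  printf '{"clusterLogging":[{"types":[%s],"enabled":true}]}' "$quoted"
}

build_logging_json "audit,authenticator"
# {"clusterLogging":[{"types":["audit","authenticator"],"enabled":true}]}
```

You can then pass the result directly: `--logging "$(build_logging_json audit,authenticator)"`.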
The update takes a few minutes. Monitor its progress with the update ID returned by the previous command:
aws eks describe-update \
--region us-east-1 \
--name my-cluster \
--update-id UPDATE_ID
Once the status shows Successful, EKS sends control plane logs to a CloudWatch log group named /aws/eks/my-cluster/cluster. Each log type gets its own log stream prefix inside that group.
Verify control plane logging is active
Confirm the logging configuration with:
aws eks describe-cluster \
--name my-cluster \
--region us-east-1 \
--query "cluster.logging.clusterLogging"
You should see all five types listed with enabled: true:
[
{
"types": [
"api",
"audit",
"authenticator",
"controllerManager",
"scheduler"
],
"enabled": true
}
]
Step 2: Install Fluent Bit DaemonSet for Pod Logs
Control plane logging only covers the managed Kubernetes components. To collect logs from your actual application pods, node system logs, and dataplane services (kubelet, kube-proxy), you need a log forwarder running on every node. AWS recommends Fluent Bit as the default log agent for EKS – it uses significantly less memory and CPU than Fluentd.
Fluent Bit runs as a DaemonSet, meaning one pod per node. It reads container log files from /var/log/containers, parses them, and ships them to CloudWatch log groups organized by type.
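For orientation, the tail input in the AWS-maintained manifest looks roughly like this – a simplified sketch of the shipped config, not a drop-in replacement for it:

```
[INPUT]
    Name              tail
    Tag               application.*
    Path              /var/log/containers/*.log
    Mem_Buf_Limit     50MB
    Skip_Long_Lines   On
```

The Tag value is what the FILTER and OUTPUT sections later match on, which is why the filter examples in this guide use Match application.*.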
Create the amazon-cloudwatch namespace
All CloudWatch logging components run in a dedicated namespace. Create it first:
kubectl apply -f https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/cloudwatch-namespace.yaml
Verify the namespace was created:
kubectl get namespace amazon-cloudwatch
The namespace should show Active status:
NAME STATUS AGE
amazon-cloudwatch Active 5s
Create the Fluent Bit ConfigMap
Fluent Bit needs to know your cluster name and region. Create a ConfigMap with these details – replace my-cluster and us-east-1 with your values:
kubectl create configmap fluent-bit-cluster-info \
--from-literal=cluster.name=my-cluster \
--from-literal=http.server=On \
--from-literal=http.port=2020 \
--from-literal=read.head=Off \
--from-literal=read.tail=On \
--from-literal=logs.region=us-east-1 \
-n amazon-cloudwatch
The read.tail=On setting tells Fluent Bit to collect only new log lines written after it starts. Set read.head=On and read.tail=Off if you need historical logs from before Fluent Bit was deployed.
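If you manage the cluster declaratively (GitOps), the equivalent ConfigMap manifest looks like this – note that On, Off, and 2020 must be quoted so YAML treats them as strings rather than booleans or integers:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-cluster-info
  namespace: amazon-cloudwatch
data:
  cluster.name: my-cluster
  http.server: "On"
  http.port: "2020"
  read.head: "Off"
  read.tail: "On"
  logs.region: us-east-1
```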
Deploy the Fluent Bit DaemonSet
Apply the AWS-maintained Fluent Bit manifest that includes the DaemonSet, ServiceAccount, ClusterRole, and ClusterRoleBinding:
kubectl apply -f https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/fluent-bit/fluent-bit.yaml
Wait about 30 seconds and then verify Fluent Bit pods are running on every node:
kubectl get pods -n amazon-cloudwatch -l k8s-app=fluent-bit
Each worker node should have one Fluent Bit pod in Running state:
NAME READY STATUS RESTARTS AGE
fluent-bit-7k2xq 1/1 Running 0 45s
fluent-bit-bn4rm 1/1 Running 0 45s
fluent-bit-zp9kv 1/1 Running 0 45s
Step 3: Configure Fluent Bit Log Streams
The default Fluent Bit deployment creates three separate log groups in CloudWatch, each capturing a different layer of your cluster. Understanding this structure helps you find the right logs quickly during an incident.
| Log Group | Source | What You Find Here |
|---|---|---|
| /aws/containerinsights/CLUSTER/application | /var/log/containers/*.log | stdout/stderr from all pods – your app logs, crash stack traces, health check failures |
| /aws/containerinsights/CLUSTER/dataplane | /var/log/journal | kubelet logs, kube-proxy logs, container runtime logs |
| /aws/containerinsights/CLUSTER/host | /var/log/dmesg, /var/log/secure, /var/log/messages | Node-level OS logs – kernel messages, SSH access, system events |
If you need to customize which logs Fluent Bit collects (for example, excluding noisy health check logs), edit the ConfigMap directly:
kubectl edit configmap fluent-bit-config -n amazon-cloudwatch
The ConfigMap contains INPUT, FILTER, and OUTPUT sections. A common customization is adding a filter to drop logs from specific namespaces. Add this under the [FILTER] sections to exclude kube-system health check noise:
[FILTER]
Name grep
Match application.*
Exclude log healthcheck
After editing the ConfigMap, restart the DaemonSet to pick up the changes:
kubectl rollout restart daemonset fluent-bit -n amazon-cloudwatch
Step 4: View Logs in CloudWatch Logs Insights
CloudWatch Logs Insights is where you actually search and analyze your EKS logs. It supports a purpose-built query language that handles structured JSON logs well – which is exactly what Kubernetes outputs.
Open the CloudWatch console, go to Logs > Logs Insights, and select your log group. For application pod logs, choose /aws/containerinsights/my-cluster/application.
You can also query from the CLI. This command searches application logs from the last hour:
aws logs start-query \
--log-group-name "/aws/containerinsights/my-cluster/application" \
--start-time $(date -d '1 hour ago' +%s) \
--end-time $(date +%s) \
--query-string 'fields @timestamp, kubernetes.pod_name, log | sort @timestamp desc | limit 50'
This returns a query ID. Retrieve the results with:
aws logs get-query-results --query-id QUERY_ID
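One portability note: date -d '1 hour ago' is GNU date syntax and fails on macOS/BSD. A portable way to build the same query window is plain shell arithmetic:

```shell
# Compute a one-hour query window without GNU date extensions.
end_time=$(date +%s)                # now, as epoch seconds
start_time=$((end_time - 3600))     # one hour earlier

echo "querying from $start_time to $end_time"
```

Substitute $start_time and $end_time for the date expressions in the start-query command above.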
For a quick check that logs are flowing, list the log streams in your application log group:
aws logs describe-log-streams \
--log-group-name "/aws/containerinsights/my-cluster/application" \
--order-by LastEventTime \
--descending \
--limit 5
If you see recent log streams with lastEventTimestamp values from the last few minutes, Fluent Bit is working correctly.
Step 5: Create Log Groups and Set Retention Policies
Fluent Bit auto-creates log groups when it first ships logs. However, auto-created groups default to never expire – which means your CloudWatch costs grow indefinitely. Set retention policies right away.
If you want to pre-create log groups with retention set from the start (before Fluent Bit runs), create them manually:
aws logs create-log-group --log-group-name "/aws/containerinsights/my-cluster/application"
aws logs create-log-group --log-group-name "/aws/containerinsights/my-cluster/dataplane"
aws logs create-log-group --log-group-name "/aws/containerinsights/my-cluster/host"
Set retention to 30 days for application and dataplane logs, and 14 days for host logs:
aws logs put-retention-policy \
--log-group-name "/aws/containerinsights/my-cluster/application" \
--retention-in-days 30
aws logs put-retention-policy \
--log-group-name "/aws/containerinsights/my-cluster/dataplane" \
--retention-in-days 30
aws logs put-retention-policy \
--log-group-name "/aws/containerinsights/my-cluster/host" \
--retention-in-days 14
For the control plane log group, 90 days is a reasonable default since audit logs are often needed for compliance reviews:
aws logs put-retention-policy \
--log-group-name "/aws/eks/my-cluster/cluster" \
--retention-in-days 90
Verify the retention policies are set correctly:
aws logs describe-log-groups \
--log-group-name-prefix "/aws/containerinsights/my-cluster" \
--query "logGroups[*].[logGroupName,retentionInDays]" \
--output table
The output shows each log group with its retention setting:
----------------------------------------------------------------------
| DescribeLogGroups |
+------------------------------------------------------------+------+
| /aws/containerinsights/my-cluster/application | 30 |
| /aws/containerinsights/my-cluster/dataplane | 30 |
| /aws/containerinsights/my-cluster/host | 14 |
+------------------------------------------------------------+------+
Step 6: CloudWatch Logs Insights Query Examples for EKS
Here are practical queries you will use regularly when debugging EKS workloads. Run these in the CloudWatch Logs Insights console or via the CLI.
Find error logs from a specific pod
When a specific pod is misbehaving, filter by pod name and look for errors:
fields @timestamp, log
| filter kubernetes.pod_name like /my-app/
| filter log like /error|Error|ERROR|exception|Exception/
| sort @timestamp desc
| limit 100
Count log events by namespace
Identify which namespaces generate the most log volume – useful for finding noisy services that inflate costs:
stats count(*) as logCount by kubernetes.namespace_name
| sort logCount desc
Track pod restarts and OOMKilled events
Search dataplane logs for kubelet events showing containers being killed or restarted:
fields @timestamp, @message
| filter @message like /OOMKilled|CrashLoopBackOff|BackOff/
| sort @timestamp desc
| limit 50
Audit who deleted a resource
When something disappears from your cluster, check the control plane audit logs. This query searches for delete operations on deployments:
fields @timestamp, user.username, objectRef.name, objectRef.namespace, verb
| filter verb = "delete"
| filter objectRef.resource = "deployments"
| sort @timestamp desc
| limit 20
Find failed API authentication attempts
Search authenticator logs for denied requests – critical for security monitoring. Run this against the control plane log group (/aws/eks/my-cluster/cluster), since authenticator logs live there rather than in the Container Insights groups:
fields @timestamp, @message
| filter @logStream like /authenticator/
| filter @message like /Unauthorized|Forbidden|denied/
| sort @timestamp desc
| limit 50
Step 7: Set Up CloudWatch Alarms from EKS Logs
Alarms turn your logs into actionable alerts. Instead of watching dashboards, you get notified when something goes wrong. The pattern is: create a metric filter that matches a log pattern, then attach an alarm to that metric.
Create a metric filter for application errors
This metric filter counts the number of ERROR-level log entries across all application pods:
aws logs put-metric-filter \
--log-group-name "/aws/containerinsights/my-cluster/application" \
--filter-name "EKSAppErrors" \
--filter-pattern "ERROR" \
--metric-transformations \
metricName=ApplicationErrorCount,metricNamespace=EKS/Logs,metricValue=1,defaultValue=0
Create an alarm on the error metric
This alarm fires when more than 50 errors occur within 5 minutes. Replace the SNS topic ARN with your own notification target:
aws cloudwatch put-metric-alarm \
--alarm-name "EKS-High-Error-Rate" \
--metric-name ApplicationErrorCount \
--namespace EKS/Logs \
--statistic Sum \
--period 300 \
--threshold 50 \
--comparison-operator GreaterThanThreshold \
--evaluation-periods 1 \
--alarm-actions arn:aws:sns:us-east-1:123456789012:eks-alerts \
--treat-missing-data notBreaching
Alert on OOMKilled events
OOMKilled containers often indicate memory limits are too low. Create a metric filter and alarm to catch these early:
aws logs put-metric-filter \
--log-group-name "/aws/containerinsights/my-cluster/dataplane" \
--filter-name "OOMKilledEvents" \
--filter-pattern "OOMKilled" \
--metric-transformations \
metricName=OOMKilledCount,metricNamespace=EKS/Logs,metricValue=1,defaultValue=0
aws cloudwatch put-metric-alarm \
--alarm-name "EKS-OOMKilled-Alert" \
--metric-name OOMKilledCount \
--namespace EKS/Logs \
--statistic Sum \
--period 300 \
--threshold 3 \
--comparison-operator GreaterThanThreshold \
--evaluation-periods 1 \
--alarm-actions arn:aws:sns:us-east-1:123456789012:eks-alerts \
--treat-missing-data notBreaching
To see existing alarms for your EKS logging metrics:
aws cloudwatch describe-alarms \
--alarm-name-prefix "EKS-" \
--query "MetricAlarms[*].[AlarmName,StateValue]" \
--output table
If you have Prometheus deployed on your EKS cluster, you can use CloudWatch alarms alongside Prometheus alerting rules for defense in depth – CloudWatch catches infrastructure-level issues while Prometheus handles application-specific metrics.
Step 8: Fluent Bit vs Fluentd – Which to Use on EKS
AWS officially recommends Fluent Bit over Fluentd for EKS logging. Here is a direct comparison to help you decide, particularly if you are weighing a migration from an existing Fluentd setup.
| Feature | Fluent Bit | Fluentd |
|---|---|---|
| Language | C | Ruby + C |
| Memory usage | ~20-30 MB per pod | ~100-200 MB per pod |
| CPU usage | Lower – efficient for high-throughput | Higher – Ruby GC overhead |
| Plugin ecosystem | Smaller but covers core use cases | Larger – 700+ community plugins |
| AWS support | Default for Container Insights | Legacy – still works but not recommended |
| Startup time | Sub-second | Several seconds |
| Configuration | INI-style (simple) | XML-like tag directives (more complex) |
| Multi-line parsing | Built-in | Plugin required |
Use Fluent Bit for new deployments – it handles everything most EKS clusters need. The only reason to stick with Fluentd is if you rely on specific community plugins that Fluent Bit does not support yet. If you want to explore forwarding logs to Elasticsearch instead, see our guide on forwarding Kubernetes logs to Elasticsearch using Fluent Bit.
If you are currently running Fluentd, uninstall it before deploying Fluent Bit to avoid duplicate log processing:
kubectl delete daemonset fluentd-cloudwatch -n amazon-cloudwatch
Step 9: Cost Optimization for CloudWatch EKS Logging
CloudWatch logging costs come from three dimensions: ingestion ($0.50/GB), storage ($0.03/GB/month), and queries ($0.005/GB scanned). On a busy cluster with dozens of microservices, this adds up fast. Here are concrete ways to reduce costs without losing visibility.
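To make those rates concrete, here is a back-of-envelope estimate for a cluster ingesting 2 GB of logs per day with 30-day retention – illustrative numbers only, plugging in the rates above:

```shell
# Rough monthly cost at $0.50/GB ingestion and $0.03/GB-month storage.
# awk handles the floating-point arithmetic.
awk 'BEGIN {
  gb_per_day  = 2
  ingested    = gb_per_day * 30        # 60 GB ingested per month
  stored      = ingested               # steady state with 30-day retention
  ingest_usd  = ingested * 0.50
  storage_usd = stored * 0.03
  printf "ingest=$%.2f storage=$%.2f total=$%.2f\n",
         ingest_usd, storage_usd, ingest_usd + storage_usd
}'
# ingest=$30.00 storage=$1.80 total=$31.80
```

Even at this modest volume, ingestion dominates the bill – which is why the filtering techniques below target log volume first.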
Set aggressive retention policies
Most teams never look at logs older than 30 days. Set retention accordingly – you already configured this in Step 5. For production clusters, 30 days for application logs and 90 days for audit logs covers most incident investigations and compliance requirements.
Filter out noisy logs at the source
Health check endpoints and readiness probes can generate thousands of log lines per minute. Filter them in the Fluent Bit ConfigMap before they reach CloudWatch:
[FILTER]
Name grep
Match application.*
Exclude log /health
[FILTER]
Name grep
Match application.*
Exclude log /ready
Disable control plane log types you do not need
Not every cluster needs all five control plane log types. A reasonable minimum is audit and authenticator for security. The scheduler and controller manager logs are rarely needed outside troubleshooting sessions. Disable the rest:
eksctl utils update-cluster-logging \
--cluster my-cluster \
--region us-east-1 \
--enable-types audit,authenticator \
--disable-types api,controllerManager,scheduler \
--approve
Use log level filtering in your applications
The cheapest log line is the one that never gets written. Configure your applications to log at WARN or ERROR level in production instead of DEBUG or INFO. This single change often reduces log volume by 80% or more.
Monitor log ingestion costs
Set up a CloudWatch alarm on the IncomingBytes metric to catch unexpected log volume spikes before they become expensive surprises:
aws cloudwatch put-metric-alarm \
--alarm-name "CloudWatch-Log-Ingestion-High" \
--namespace "AWS/Logs" \
--metric-name IncomingBytes \
--dimensions Name=LogGroupName,Value=/aws/containerinsights/my-cluster/application \
--statistic Sum \
--period 3600 \
--threshold 1073741824 \
--comparison-operator GreaterThanThreshold \
--evaluation-periods 1 \
--alarm-actions arn:aws:sns:us-east-1:123456789012:eks-alerts \
--treat-missing-data notBreaching
The threshold of 1073741824 bytes (1 GiB) per hour triggers the alarm; adjust it based on your normal ingestion rate. You can check your current rate with CloudWatch Container Insights dashboards or the Kubernetes Metrics Server for resource-level monitoring.
Conclusion
You now have a complete CloudWatch logging pipeline for your EKS cluster – control plane logs for cluster operations visibility, Fluent Bit shipping application and node logs, Insights queries for fast debugging, and alarms for proactive alerting. The combination covers every layer from the Kubernetes API server down to individual container output.
For production hardening, enable encryption on your CloudWatch log groups with a KMS key, export critical logs to S3 for long-term archival, and integrate CloudWatch alarms with your incident management system (PagerDuty, OpsGenie, or Slack). Review your log retention and filter settings quarterly to keep costs aligned with actual usage.