Enable CloudWatch Logging in EKS Kubernetes Cluster

Running workloads on Amazon EKS without centralized logging is flying blind. When a pod crashes at 3 AM or an API request fails intermittently, you need logs aggregated in one place – not scattered across dozens of worker nodes. CloudWatch gives you that single pane of glass for EKS control plane logs, application logs, and node-level system logs.

This guide covers how to enable CloudWatch logging for your EKS cluster end to end. We start with control plane logging (API server, audit, scheduler), then set up Fluent Bit as a DaemonSet to ship pod and node logs to CloudWatch. You will also learn how to query logs with CloudWatch Insights, set up alarms, and keep costs under control with retention policies and log filters.

Prerequisites

Before you begin, make sure the following are in place:

  • A running EKS cluster (Kubernetes 1.28+)
  • AWS CLI v2 installed and configured with credentials that have EKS and CloudWatch permissions
  • kubectl configured to talk to your cluster (aws eks update-kubeconfig)
  • eksctl installed (optional but simplifies control plane logging setup)
  • IAM permissions: the node instance role (or IRSA service account) needs CloudWatchAgentServerPolicy attached
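
If the policy is not yet attached, one AWS CLI call takes care of it. This is a minimal sketch – the role name my-cluster-node-role is a placeholder; look up the real one from your node group:

```shell
# Placeholder role name – find yours with:
#   aws eks describe-nodegroup --cluster-name my-cluster \
#     --nodegroup-name <nodegroup> --query "nodegroup.nodeRole"
NODE_ROLE="my-cluster-node-role"

# Attach the AWS-managed policy that lets nodes write to CloudWatch Logs
aws iam attach-role-policy \
  --role-name "$NODE_ROLE" \
  --policy-arn arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy
```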

Step 1: Enable EKS Control Plane Logging

EKS control plane logging sends logs from the Kubernetes API server, audit system, authenticator, controller manager, and scheduler directly to CloudWatch. These logs are critical for debugging authentication failures, tracking API calls, and understanding scheduling decisions. AWS manages the control plane – you just need to flip the switch.

Five control plane log types

Log Type             What It Captures
api                  Kubernetes API server requests and responses
audit                Who did what – every API call with user identity, timestamp, and resource
authenticator        IAM-to-Kubernetes RBAC authentication (unique to EKS)
controllerManager    Core control loops – replica scaling, node lifecycle, endpoints
scheduler            Pod placement decisions – why a pod landed on a specific node

Enable with eksctl

The fastest way to enable all five log types is with eksctl. Replace my-cluster and us-east-1 with your cluster name and region:

eksctl utils update-cluster-logging \
  --cluster my-cluster \
  --region us-east-1 \
  --enable-types all \
  --approve

If you only want audit and authenticator logs (common for security-focused setups), specify them explicitly:

eksctl utils update-cluster-logging \
  --cluster my-cluster \
  --region us-east-1 \
  --enable-types audit,authenticator \
  --approve

Enable with AWS CLI

If you prefer the AWS CLI or need this in a CI/CD pipeline, use aws eks update-cluster-config:

aws eks update-cluster-config \
  --region us-east-1 \
  --name my-cluster \
  --logging '{"clusterLogging":[{"types":["api","audit","authenticator","controllerManager","scheduler"],"enabled":true}]}'

The update takes a few minutes. Monitor its progress with the update ID returned by the previous command:

aws eks describe-update \
  --region us-east-1 \
  --name my-cluster \
  --update-id UPDATE_ID

Once the status shows Successful, EKS sends control plane logs to a CloudWatch log group named /aws/eks/my-cluster/cluster. Each log type gets its own log stream prefix inside that group.
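
To see the streams themselves, list them with a prefix filter. A quick sketch – stream names typically start with kube-apiserver-, kube-apiserver-audit-, authenticator-, and so on, with an instance suffix:

```shell
CLUSTER=my-cluster

# List the five most recent audit log streams in the control plane log group
aws logs describe-log-streams \
  --log-group-name "/aws/eks/${CLUSTER}/cluster" \
  --log-stream-name-prefix "kube-apiserver-audit" \
  --region us-east-1 \
  --order-by LastEventTime \
  --descending \
  --limit 5
```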

Verify control plane logging is active

Confirm the logging configuration with:

aws eks describe-cluster \
  --name my-cluster \
  --region us-east-1 \
  --query "cluster.logging.clusterLogging"

You should see all five types listed with enabled: true:

[
    {
        "types": [
            "api",
            "audit",
            "authenticator",
            "controllerManager",
            "scheduler"
        ],
        "enabled": true
    }
]

Step 2: Install Fluent Bit DaemonSet for Pod Logs

Control plane logging only covers the managed Kubernetes components. To collect logs from your actual application pods, node system logs, and dataplane services (kubelet, kube-proxy), you need a log forwarder running on every node. AWS recommends Fluent Bit as the default log agent for EKS – it uses significantly less memory and CPU than Fluentd.

Fluent Bit runs as a DaemonSet, meaning one pod per node. It reads container log files from /var/log/containers, parses them, and ships them to CloudWatch log groups organized by type.

Create the amazon-cloudwatch namespace

All CloudWatch logging components run in a dedicated namespace. Create it first:

kubectl apply -f https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/cloudwatch-namespace.yaml

Verify the namespace was created:

kubectl get namespace amazon-cloudwatch

The namespace should show Active status:

NAME                STATUS   AGE
amazon-cloudwatch   Active   5s

Create the Fluent Bit ConfigMap

Fluent Bit needs to know your cluster name and region. Create a ConfigMap with these details – replace my-cluster and us-east-1 with your values:

kubectl create configmap fluent-bit-cluster-info \
  --from-literal=cluster.name=my-cluster \
  --from-literal=http.server=On \
  --from-literal=http.port=2020 \
  --from-literal=read.head=Off \
  --from-literal=read.tail=On \
  --from-literal=logs.region=us-east-1 \
  -n amazon-cloudwatch

The read.tail=On setting tells Fluent Bit to only collect new logs going forward. Set read.head=On and read.tail=Off if you need historical logs from before Fluent Bit was deployed.

Deploy the Fluent Bit DaemonSet

Apply the AWS-maintained Fluent Bit manifest that includes the DaemonSet, ServiceAccount, ClusterRole, and ClusterRoleBinding:

kubectl apply -f https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/fluent-bit/fluent-bit.yaml

Wait about 30 seconds and then verify Fluent Bit pods are running on every node:

kubectl get pods -n amazon-cloudwatch -l k8s-app=fluent-bit

Each worker node should have one Fluent Bit pod in Running state:

NAME               READY   STATUS    RESTARTS   AGE
fluent-bit-7k2xq   1/1     Running   0          45s
fluent-bit-bn4rm   1/1     Running   0          45s
fluent-bit-zp9kv   1/1     Running   0          45s
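
If a pod is Running but logs are not appearing in CloudWatch, check Fluent Bit's own output for delivery errors (typically IAM permission failures on the CloudWatch output plugin). A quick sketch:

```shell
# Scan recent Fluent Bit output across the DaemonSet for errors;
# print a reassuring message if nothing matches
kubectl logs -n amazon-cloudwatch -l k8s-app=fluent-bit --tail=100 \
  | grep -iE "error|fail" \
  || echo "no errors in recent Fluent Bit output"
```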

Step 3: Configure Fluent Bit Log Streams

The default Fluent Bit deployment creates three separate log groups in CloudWatch, each capturing a different layer of your cluster. Understanding this structure helps you find the right logs quickly during an incident.

/aws/containerinsights/CLUSTER/application
    Source: /var/log/containers/*.log
    What you find here: stdout/stderr from all pods – your app logs, crash stack traces, health check failures

/aws/containerinsights/CLUSTER/dataplane
    Source: /var/log/journal
    What you find here: kubelet logs, kube-proxy logs, container runtime logs

/aws/containerinsights/CLUSTER/host
    Source: /var/log/dmesg, /var/log/secure, /var/log/messages
    What you find here: node-level OS logs – kernel messages, SSH access, system events

If you need to customize which logs Fluent Bit collects (for example, excluding noisy health check logs), edit the ConfigMap directly:

kubectl edit configmap fluent-bit-config -n amazon-cloudwatch

The ConfigMap contains INPUT, FILTER, and OUTPUT sections. A common customization is a grep filter that drops noisy log lines before they reach CloudWatch. This example excludes any application log line containing healthcheck – add it alongside the existing [FILTER] sections:

[FILTER]
    Name    grep
    Match   application.*
    Exclude log healthcheck

After editing the ConfigMap, restart the DaemonSet to pick up the changes:

kubectl rollout restart daemonset fluent-bit -n amazon-cloudwatch

Step 4: View Logs in CloudWatch Log Insights

CloudWatch Log Insights is where you actually search and analyze your EKS logs. It supports a purpose-built query language that handles structured JSON logs well – which is exactly what Kubernetes outputs.

Open the CloudWatch console, go to Logs > Log Insights, and select your log group. For application pod logs, choose /aws/containerinsights/my-cluster/application.

You can also query from the CLI. This command searches application logs from the last hour (the date -d syntax is GNU date; on macOS use date -v-1H instead):

aws logs start-query \
  --log-group-name "/aws/containerinsights/my-cluster/application" \
  --start-time $(date -d '1 hour ago' +%s) \
  --end-time $(date +%s) \
  --query-string 'fields @timestamp, kubernetes.pod_name, log | sort @timestamp desc | limit 50'

This returns a query ID. Retrieve the results with:

aws logs get-query-results --query-id QUERY_ID
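
The query runs asynchronously, so calling get-query-results immediately may return partial results. A small polling sketch that waits for the status field to reach Complete before printing results:

```shell
# Start the query and capture its ID
QUERY_ID="$(aws logs start-query \
  --log-group-name "/aws/containerinsights/my-cluster/application" \
  --start-time $(date -d '1 hour ago' +%s) \
  --end-time $(date +%s) \
  --query-string 'fields @timestamp, log | sort @timestamp desc | limit 20' \
  --query queryId --output text)"

# Poll until the status moves from Scheduled/Running to Complete
while true; do
  STATUS=$(aws logs get-query-results --query-id "$QUERY_ID" \
    --query status --output text)
  [ "$STATUS" = "Complete" ] && break
  sleep 2
done

aws logs get-query-results --query-id "$QUERY_ID"
```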

For a quick check that logs are flowing, list the log streams in your application log group:

aws logs describe-log-streams \
  --log-group-name "/aws/containerinsights/my-cluster/application" \
  --order-by LastEventTime \
  --descending \
  --limit 5

If you see recent log streams with lastEventTimestamp values from the last few minutes, Fluent Bit is working correctly.
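
For live debugging, the AWS CLI v2 tail subcommand streams events as they arrive – handy as a kubectl logs equivalent that works across all nodes at once:

```shell
# Follow application logs live, starting from 10 minutes ago
aws logs tail "/aws/containerinsights/my-cluster/application" \
  --follow \
  --since 10m
```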

Step 5: Create Log Groups and Set Retention Policies

Fluent Bit auto-creates log groups when it first ships logs. However, auto-created groups default to never expire – which means your CloudWatch costs grow indefinitely. Set retention policies right away.

If you want to pre-create log groups with retention set from the start (before Fluent Bit runs), create them manually:

aws logs create-log-group --log-group-name "/aws/containerinsights/my-cluster/application"
aws logs create-log-group --log-group-name "/aws/containerinsights/my-cluster/dataplane"
aws logs create-log-group --log-group-name "/aws/containerinsights/my-cluster/host"

Set retention to 30 days for application and dataplane logs, and 14 days for host logs:

aws logs put-retention-policy \
  --log-group-name "/aws/containerinsights/my-cluster/application" \
  --retention-in-days 30

aws logs put-retention-policy \
  --log-group-name "/aws/containerinsights/my-cluster/dataplane" \
  --retention-in-days 30

aws logs put-retention-policy \
  --log-group-name "/aws/containerinsights/my-cluster/host" \
  --retention-in-days 14

For the control plane log group, 90 days is a reasonable default since audit logs are often needed for compliance reviews:

aws logs put-retention-policy \
  --log-group-name "/aws/eks/my-cluster/cluster" \
  --retention-in-days 90

Verify the retention policies are set correctly:

aws logs describe-log-groups \
  --log-group-name-prefix "/aws/containerinsights/my-cluster" \
  --query "logGroups[*].[logGroupName,retentionInDays]" \
  --output table

The output shows each log group with its retention setting:

----------------------------------------------------------------------
|                        DescribeLogGroups                           |
+------------------------------------------------------------+------+
|  /aws/containerinsights/my-cluster/application             |  30  |
|  /aws/containerinsights/my-cluster/dataplane               |  30  |
|  /aws/containerinsights/my-cluster/host                    |  14  |
+------------------------------------------------------------+------+
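
The three put-retention-policy calls above can be collapsed into a loop, which is easier to keep in a bootstrap script. A bash sketch using an associative array for the per-group retention values:

```shell
CLUSTER=my-cluster

# Retention in days per Container Insights log group
declare -A RETENTION=( [application]=30 [dataplane]=30 [host]=14 )

for grp in "${!RETENTION[@]}"; do
  aws logs put-retention-policy \
    --log-group-name "/aws/containerinsights/${CLUSTER}/${grp}" \
    --retention-in-days "${RETENTION[$grp]}"
done
```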

Step 6: CloudWatch Insights Query Examples for EKS

Here are practical queries you will use regularly when debugging EKS workloads. Run these in the CloudWatch Log Insights console or via the CLI.

Find error logs from a specific pod

When a specific pod is misbehaving, filter by pod name and look for errors:

fields @timestamp, log
| filter kubernetes.pod_name like /my-app/
| filter log like /error|Error|ERROR|exception|Exception/
| sort @timestamp desc
| limit 100

Count log events by namespace

Identify which namespaces generate the most log volume – useful for finding noisy services that inflate costs:

stats count(*) as logCount by kubernetes.namespace_name
| sort logCount desc

Track pod restarts and OOMKilled events

Search dataplane logs for kubelet events showing containers being killed or restarted:

fields @timestamp, @message
| filter @message like /OOMKilled|CrashLoopBackOff|BackOff/
| sort @timestamp desc
| limit 50

Audit who deleted a resource

When something disappears from your cluster, check the control plane audit logs. This query searches for delete operations on deployments:

fields @timestamp, user.username, objectRef.name, objectRef.namespace, verb
| filter verb = "delete"
| filter objectRef.resource = "deployments"
| sort @timestamp desc
| limit 20

Find failed API authentication attempts

Search authenticator logs for denied requests – critical for security monitoring:

fields @timestamp, @message
| filter @logStream like /authenticator/
| filter @message like /Unauthorized|Forbidden|denied/
| sort @timestamp desc
| limit 50

Step 7: Set Up CloudWatch Alarms from EKS Logs

Alarms turn your logs into actionable alerts. Instead of watching dashboards, you get notified when something goes wrong. The pattern is: create a metric filter that matches a log pattern, then attach an alarm to that metric.

Create a metric filter for application errors

This metric filter counts the number of ERROR-level log entries across all application pods:

aws logs put-metric-filter \
  --log-group-name "/aws/containerinsights/my-cluster/application" \
  --filter-name "EKSAppErrors" \
  --filter-pattern "ERROR" \
  --metric-transformations \
    metricName=ApplicationErrorCount,metricNamespace=EKS/Logs,metricValue=1,defaultValue=0

Create an alarm on the error metric

This alarm fires when more than 50 errors occur within 5 minutes. Replace the SNS topic ARN with your own notification target:

aws cloudwatch put-metric-alarm \
  --alarm-name "EKS-High-Error-Rate" \
  --metric-name ApplicationErrorCount \
  --namespace EKS/Logs \
  --statistic Sum \
  --period 300 \
  --threshold 50 \
  --comparison-operator GreaterThanThreshold \
  --evaluation-periods 1 \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:eks-alerts \
  --treat-missing-data notBreaching
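
Before trusting the alarm in production, verify the notification path end to end. You can force the alarm into the ALARM state manually – it reverts on the next evaluation, but the SNS message goes out:

```shell
# Force a state transition to test SNS delivery;
# CloudWatch resets the state at the next evaluation period
aws cloudwatch set-alarm-state \
  --alarm-name "EKS-High-Error-Rate" \
  --state-value ALARM \
  --state-reason "Manual test of the SNS notification path"
```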

Alert on OOMKilled events

OOMKilled containers often indicate memory limits are too low. Create a metric filter and alarm to catch these early:

aws logs put-metric-filter \
  --log-group-name "/aws/containerinsights/my-cluster/dataplane" \
  --filter-name "OOMKilledEvents" \
  --filter-pattern "OOMKilled" \
  --metric-transformations \
    metricName=OOMKilledCount,metricNamespace=EKS/Logs,metricValue=1,defaultValue=0

aws cloudwatch put-metric-alarm \
  --alarm-name "EKS-OOMKilled-Alert" \
  --metric-name OOMKilledCount \
  --namespace EKS/Logs \
  --statistic Sum \
  --period 300 \
  --threshold 3 \
  --comparison-operator GreaterThanThreshold \
  --evaluation-periods 1 \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:eks-alerts \
  --treat-missing-data notBreaching

To see existing alarms for your EKS logging metrics:

aws cloudwatch describe-alarms \
  --alarm-name-prefix "EKS-" \
  --query "MetricAlarms[*].[AlarmName,StateValue]" \
  --output table

If you have Prometheus deployed on your EKS cluster, you can use CloudWatch alarms alongside Prometheus alerting rules for defense in depth – CloudWatch catches infrastructure-level issues while Prometheus handles application-specific metrics.

Step 8: Fluent Bit vs Fluentd – Which to Use on EKS

AWS officially recommends Fluent Bit over Fluentd for EKS logging. Here is a direct comparison to help you decide if you are migrating from an existing Fluentd setup.

Feature              Fluent Bit                               Fluentd
Language             C                                        Ruby + C
Memory usage         ~20-30 MB per pod                        ~100-200 MB per pod
CPU usage            Lower – efficient for high throughput    Higher – Ruby GC overhead
Plugin ecosystem     Smaller but covers core use cases        Larger – 700+ community plugins
AWS support          Default for Container Insights           Legacy – still works but not recommended
Startup time         Sub-second                               Several seconds
Configuration        INI-style (simple)                       XML-like tags (more complex)
Multi-line parsing   Built-in                                 Plugin required

Use Fluent Bit for new deployments – it handles everything most EKS clusters need. The only reason to stick with Fluentd is if you rely on specific community plugins that Fluent Bit does not support yet. If you want to explore forwarding logs to Elasticsearch instead, see our guide on forwarding Kubernetes logs to Elasticsearch using Fluent Bit.

If you are currently running Fluentd, uninstall it before deploying Fluent Bit to avoid duplicate log processing:

kubectl delete daemonset fluentd-cloudwatch -n amazon-cloudwatch
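
The DaemonSet name varies between installs, so it is worth confirming exactly which log forwarders are running before and after the switch. A quick check across all namespaces:

```shell
# List any fluentd/fluent-bit DaemonSets still on the cluster;
# after migration only fluent-bit should remain
kubectl get daemonset --all-namespaces \
  | grep -i "fluent" \
  || echo "no fluent forwarders found"
```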

Step 9: Cost Optimization for CloudWatch EKS Logging

CloudWatch logging costs come from three dimensions: ingestion ($0.50/GB), storage ($0.03/GB/month), and queries ($0.005/GB scanned). On a busy cluster with dozens of microservices, this adds up fast. Here are concrete ways to reduce costs without losing visibility.

Set aggressive retention policies

Most teams never look at logs older than 30 days. Set retention accordingly – you already configured this in Step 5. For production clusters, 30 days for application logs and 90 days for audit logs covers most incident investigations and compliance requirements.

Filter out noisy logs at the source

Health check endpoints and readiness probes can generate thousands of log lines per minute. Filter them in the Fluent Bit ConfigMap before they reach CloudWatch:

[FILTER]
    Name    grep
    Match   application.*
    Exclude log /health

[FILTER]
    Name    grep
    Match   application.*
    Exclude log /ready

Disable control plane log types you do not need

Not every cluster needs all five control plane log types. A reasonable minimum is audit and authenticator for security. The scheduler and controller manager logs are rarely needed outside troubleshooting sessions. Disable the rest:

eksctl utils update-cluster-logging \
  --cluster my-cluster \
  --region us-east-1 \
  --enable-types audit,authenticator \
  --disable-types api,controllerManager,scheduler \
  --approve

Use log level filtering in your applications

The cheapest log line is the one that never gets written. Configure your applications to log at WARN or ERROR level in production instead of DEBUG or INFO. This single change often reduces log volume by 80% or more.

Monitor log ingestion costs

Set up a CloudWatch alarm on the IncomingBytes metric to catch unexpected log volume spikes before they become expensive surprises:

aws cloudwatch put-metric-alarm \
  --alarm-name "CloudWatch-Log-Ingestion-High" \
  --namespace "AWS/Logs" \
  --metric-name IncomingBytes \
  --dimensions Name=LogGroupName,Value=/aws/containerinsights/my-cluster/application \
  --statistic Sum \
  --period 3600 \
  --threshold 1073741824 \
  --comparison-operator GreaterThanThreshold \
  --evaluation-periods 1 \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:eks-alerts \
  --treat-missing-data notBreaching

That threshold of 1073741824 bytes (1 GB) per hour triggers the alarm. Adjust based on your normal ingestion rate. You can check your current rate with CloudWatch Container Insights dashboards or the Kubernetes Metrics Server for resource-level monitoring.
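
You can pull that baseline directly from the CLI rather than the console. This sketch sums IncomingBytes per hour over the last 24 hours (again using GNU date syntax):

```shell
# Hourly ingestion volume for the application log group over the last day
aws cloudwatch get-metric-statistics \
  --namespace AWS/Logs \
  --metric-name IncomingBytes \
  --dimensions Name=LogGroupName,Value=/aws/containerinsights/my-cluster/application \
  --start-time "$(date -u -d '24 hours ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --period 3600 \
  --statistics Sum
```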

Conclusion

You now have a complete CloudWatch logging pipeline for your EKS cluster – control plane logs for cluster operations visibility, Fluent Bit shipping application and node logs, Insights queries for fast debugging, and alarms for proactive alerting. The combination covers every layer from the Kubernetes API server down to individual container output.

For production hardening, enable encryption on your CloudWatch log groups with a KMS key, export critical logs to S3 for long-term archival, and integrate CloudWatch alarms with your incident management system (PagerDuty, OpsGenie, or Slack). Review your log retention and filter settings quarterly to keep costs aligned with actual usage.
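
Two of those hardening steps have direct CLI equivalents. A sketch – the KMS key ARN and S3 bucket name are placeholders, and the key policy must grant the CloudWatch Logs service principal permission to use the key before association succeeds:

```shell
# Encrypt the control plane log group with a customer-managed KMS key
# (placeholder ARN – the key policy must allow logs.us-east-1.amazonaws.com)
aws logs associate-kms-key \
  --log-group-name "/aws/eks/my-cluster/cluster" \
  --kms-key-id arn:aws:kms:us-east-1:123456789012:key/your-key-id

# One-off export of the last 30 days of control plane logs to S3;
# create-export-task takes timestamps in milliseconds
NOW=$(date +%s)
aws logs create-export-task \
  --log-group-name "/aws/eks/my-cluster/cluster" \
  --from $(( (NOW - 86400*30) * 1000 )) \
  --to $(( NOW * 1000 )) \
  --destination my-log-archive-bucket \
  --destination-prefix eks-audit
```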
