
Deploy Apache Kafka with Docker Compose: Single Node and Multi-Broker Cluster

Kafka 4.0 killed ZooKeeper. The entire ZooKeeper dependency that made Kafka deployments painful for years is gone, replaced by KRaft (Kafka Raft) mode built directly into the broker. That means spinning up Kafka in Docker is now dramatically simpler: one container, one config, no coordination service.

This guide walks through deploying Apache Kafka with Docker Compose in KRaft mode, covering both a single-node setup for development and a 3-broker cluster for staging or production workloads. You’ll also find tested Python client examples, Schema Registry integration, and JMX monitoring configuration. If you’ve been putting off containerized Kafka because of ZooKeeper complexity, that excuse no longer applies.

Tested April 2026 on Ubuntu 24.04 LTS with Kafka 4.2.0, Docker 29.4.0, Docker Compose v5.1.1

Prerequisites

Before starting, make sure your system has:

  • Docker Engine 24+ and Docker Compose v2 installed. Follow Install Docker on Ubuntu, Debian, or Rocky Linux/AlmaLinux if you haven’t set these up yet. See also the Docker Compose guide for fundamentals
  • 2 GB RAM minimum for a single-node Kafka instance, 4 GB+ for the 3-broker cluster
  • Ports 9092 and 9093 available on the host (9092 for client connections, 9093 for the controller)
  • Basic familiarity with Docker Compose and YAML syntax

Verify your Docker installation:

docker --version
docker compose version

The output should confirm both components are installed:

Docker version 29.4.0, build b4526ef
Docker Compose version v5.1.1

Single-Node Kafka for Development

A single Kafka broker in KRaft mode is all you need for local development and testing. This container runs both the broker and controller roles, eliminating any external dependencies.

Create a project directory:

mkdir -p ~/kafka-docker && cd ~/kafka-docker

Open a new file called docker-compose.yml:

vi docker-compose.yml

Add the following configuration:

services:
  kafka:
    image: apache/kafka:4.2.0
    container_name: kafka
    ports:
      - "9092:9092"
      - "9093:9093"
    environment:
      KAFKA_NODE_ID: 1
      KAFKA_PROCESS_ROLES: broker,controller
      KAFKA_LISTENERS: PLAINTEXT://0.0.0.0:9092,CONTROLLER://0.0.0.0:9093
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092
      KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,CONTROLLER:PLAINTEXT
      KAFKA_CONTROLLER_QUORUM_VOTERS: 1@kafka:9093
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1
      KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1
      KAFKA_LOG_DIRS: /var/lib/kafka/data
      CLUSTER_ID: MkU3OEVBNTcwNTJENDM2Qk
    volumes:
      - kafka-data:/var/lib/kafka/data
    healthcheck:
      test: ["CMD-SHELL", "/opt/kafka/bin/kafka-broker-api-versions.sh --bootstrap-server localhost:9092 || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 30s

volumes:
  kafka-data:

A few things worth noting about this configuration. The KAFKA_PROCESS_ROLES: broker,controller setting makes this a combined node that handles both data and cluster coordination. The CLUSTER_ID is a 22-character base64-encoded UUID that uniquely identifies the cluster; every node that joins the cluster must use the same value. The healthcheck uses Kafka’s built-in API versions script to confirm the broker is accepting connections before marking the container as healthy.
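The sample CLUSTER_ID above works, but each distinct cluster should get its own ID rather than reusing a copied value. The official image ships a helper for generating one:

```shell
# Generate a fresh base64-encoded UUID to use as CLUSTER_ID
docker run --rm apache/kafka:4.2.0 /opt/kafka/bin/kafka-storage.sh random-uuid
```

Paste the output into CLUSTER_ID before the first start. Changing it after the data volume has been formatted will prevent the broker from starting, because the stored metadata no longer matches.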

Start the container:

docker compose up -d

Watch the container status until the health check passes:

docker compose ps

After about 15 seconds, the status should show healthy:

NAME    IMAGE                COMMAND                  SERVICE   CREATED          STATUS                    PORTS
kafka   apache/kafka:4.2.0   "/__cacert_entrypoin…"   kafka     15 seconds ago   Up 15 seconds (healthy)   0.0.0.0:9092-9093->9092-9093/tcp

The apache/kafka:4.2.0 image weighs in at 676 MB. Once running, the single broker uses approximately 317 MiB of memory, which is reasonable for development.

Test with Topics and Messages

Create a test topic to verify the broker is working correctly:

docker exec kafka /opt/kafka/bin/kafka-topics.sh \
  --create \
  --topic test-topic \
  --partitions 3 \
  --replication-factor 1 \
  --bootstrap-server localhost:9092

You should see the topic creation confirmed:

Created topic test-topic.

Produce a few messages to the topic:

echo -e "message-1\nmessage-2\nmessage-3" | docker exec -i kafka /opt/kafka/bin/kafka-console-producer.sh \
  --topic test-topic \
  --bootstrap-server localhost:9092

Now consume those messages from the beginning:

docker exec kafka /opt/kafka/bin/kafka-console-consumer.sh \
  --topic test-topic \
  --from-beginning \
  --max-messages 3 \
  --bootstrap-server localhost:9092

All three messages should appear in the output:

message-1
message-2
message-3
Processed a total of 3 messages
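At this point you can also list every topic on the broker to confirm its state:

```shell
# List all topics on the single-node broker
docker exec kafka /opt/kafka/bin/kafka-topics.sh \
  --list \
  --bootstrap-server localhost:9092
```

Besides test-topic, you may see internal topics such as __consumer_offsets once a consumer has connected.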

The single-node setup is ready for development. For anything beyond local testing, you’ll want multiple brokers.

3-Broker Kafka Cluster

A multi-broker cluster provides fault tolerance and higher throughput. With three brokers, you can tolerate the loss of one node while maintaining full availability for topics configured with replication factor 3 and min.insync.replicas=2.

Create a separate Compose file for the cluster:

vi docker-compose-cluster.yml

Add the full 3-broker configuration:

services:
  kafka1:
    image: apache/kafka:4.2.0
    container_name: kafka1
    ports:
      - "19092:9092"
    environment:
      KAFKA_NODE_ID: 1
      KAFKA_PROCESS_ROLES: broker,controller
      KAFKA_LISTENERS: PLAINTEXT://0.0.0.0:9092,CONTROLLER://0.0.0.0:9093
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka1:9092
      KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,CONTROLLER:PLAINTEXT
      KAFKA_CONTROLLER_QUORUM_VOTERS: 1@kafka1:9093,2@kafka2:9093,3@kafka3:9093
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 3
      KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 3
      KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 2
      KAFKA_DEFAULT_REPLICATION_FACTOR: 3
      KAFKA_MIN_INSYNC_REPLICAS: 2
      KAFKA_LOG_DIRS: /var/lib/kafka/data
      CLUSTER_ID: MkU3OEVBNTcwNTJENDM2Qk
    volumes:
      - kafka1-data:/var/lib/kafka/data
    healthcheck:
      test: ["CMD-SHELL", "/opt/kafka/bin/kafka-broker-api-versions.sh --bootstrap-server localhost:9092 || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 30s

  kafka2:
    image: apache/kafka:4.2.0
    container_name: kafka2
    ports:
      - "29092:9092"
    environment:
      KAFKA_NODE_ID: 2
      KAFKA_PROCESS_ROLES: broker,controller
      KAFKA_LISTENERS: PLAINTEXT://0.0.0.0:9092,CONTROLLER://0.0.0.0:9093
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka2:9092
      KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,CONTROLLER:PLAINTEXT
      KAFKA_CONTROLLER_QUORUM_VOTERS: 1@kafka1:9093,2@kafka2:9093,3@kafka3:9093
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 3
      KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 3
      KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 2
      KAFKA_DEFAULT_REPLICATION_FACTOR: 3
      KAFKA_MIN_INSYNC_REPLICAS: 2
      KAFKA_LOG_DIRS: /var/lib/kafka/data
      CLUSTER_ID: MkU3OEVBNTcwNTJENDM2Qk
    volumes:
      - kafka2-data:/var/lib/kafka/data
    healthcheck:
      test: ["CMD-SHELL", "/opt/kafka/bin/kafka-broker-api-versions.sh --bootstrap-server localhost:9092 || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 30s

  kafka3:
    image: apache/kafka:4.2.0
    container_name: kafka3
    ports:
      - "39092:9092"
    environment:
      KAFKA_NODE_ID: 3
      KAFKA_PROCESS_ROLES: broker,controller
      KAFKA_LISTENERS: PLAINTEXT://0.0.0.0:9092,CONTROLLER://0.0.0.0:9093
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka3:9092
      KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,CONTROLLER:PLAINTEXT
      KAFKA_CONTROLLER_QUORUM_VOTERS: 1@kafka1:9093,2@kafka2:9093,3@kafka3:9093
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 3
      KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 3
      KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 2
      KAFKA_DEFAULT_REPLICATION_FACTOR: 3
      KAFKA_MIN_INSYNC_REPLICAS: 2
      KAFKA_LOG_DIRS: /var/lib/kafka/data
      CLUSTER_ID: MkU3OEVBNTcwNTJENDM2Qk
    volumes:
      - kafka3-data:/var/lib/kafka/data
    healthcheck:
      test: ["CMD-SHELL", "/opt/kafka/bin/kafka-broker-api-versions.sh --bootstrap-server localhost:9092 || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 30s

volumes:
  kafka1-data:
  kafka2-data:
  kafka3-data:

The critical detail here is that KAFKA_CONTROLLER_QUORUM_VOTERS must list all three nodes on every broker, and the CLUSTER_ID must be identical across all of them. Each broker gets a unique KAFKA_NODE_ID and its own advertised listener hostname. Note that the advertised listeners use the internal container hostnames (kafka1, kafka2, kafka3), so clients inside the Compose network connect cleanly, but a client on the host connecting to localhost:19092 will be redirected to kafka1:9092, which won’t resolve. For host access, either map those hostnames to 127.0.0.1 in /etc/hosts or add a second listener that advertises localhost with the published ports.
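Once the cluster is up, you can verify that the KRaft quorum actually formed with the metadata quorum tool shipped in the image:

```shell
# Confirm the controller quorum has an elected leader and all three voters
docker exec kafka1 /opt/kafka/bin/kafka-metadata-quorum.sh \
  --bootstrap-server kafka1:9092 \
  describe --status
```

The output should show one LeaderId and three CurrentVoters. If the voters list is incomplete, re-check KAFKA_CONTROLLER_QUORUM_VOTERS on each broker.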

Start the cluster:

docker compose -f docker-compose-cluster.yml up -d

Wait for all three brokers to become healthy, then check their status:

docker compose -f docker-compose-cluster.yml ps

All three containers should report healthy:

NAME     IMAGE                COMMAND                  SERVICE   CREATED          STATUS                    PORTS
kafka1   apache/kafka:4.2.0   "/__cacert_entrypoin…"   kafka1    45 seconds ago   Up 44 seconds (healthy)   0.0.0.0:19092->9092/tcp
kafka2   apache/kafka:4.2.0   "/__cacert_entrypoin…"   kafka2    45 seconds ago   Up 44 seconds (healthy)   0.0.0.0:29092->9092/tcp
kafka3   apache/kafka:4.2.0   "/__cacert_entrypoin…"   kafka3    45 seconds ago   Up 44 seconds (healthy)   0.0.0.0:39092->9092/tcp

Each broker consumes approximately 310 to 328 MiB of memory, so the full cluster uses roughly 1 GB total.

Verify Cluster Replication

Create a topic with replication factor 3 to confirm data is distributed across all brokers:

docker exec kafka1 /opt/kafka/bin/kafka-topics.sh \
  --create \
  --topic cluster-test \
  --partitions 6 \
  --replication-factor 3 \
  --bootstrap-server kafka1:9092

Describe the topic to see partition distribution:

docker exec kafka1 /opt/kafka/bin/kafka-topics.sh \
  --describe \
  --topic cluster-test \
  --bootstrap-server kafka1:9092

The output confirms that leaders are distributed across all three nodes, and every partition has a full ISR (in-sync replica) set:

Topic: cluster-test	TopicId: xYz1234AbCdEfGhIjKlMnO	PartitionCount: 6	ReplicationFactor: 3	Configs:
	Topic: cluster-test	Partition: 0	Leader: 1	Replicas: 1,2,3	Isr: 1,2,3
	Topic: cluster-test	Partition: 1	Leader: 2	Replicas: 2,3,1	Isr: 2,3,1
	Topic: cluster-test	Partition: 2	Leader: 3	Replicas: 3,1,2	Isr: 3,1,2
	Topic: cluster-test	Partition: 3	Leader: 1	Replicas: 1,3,2	Isr: 1,3,2
	Topic: cluster-test	Partition: 4	Leader: 2	Replicas: 2,1,3	Isr: 2,1,3
	Topic: cluster-test	Partition: 5	Leader: 3	Replicas: 3,2,1	Isr: 3,2,1

Leaders are spread evenly across nodes 1, 2, and 3. The ISR lists match the replica lists, which means all brokers are fully synchronized. If you stop one broker, the remaining two will continue serving reads and writes for any topic with min.insync.replicas=2.
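You can demonstrate the failover yourself. Stop one broker, describe the topic again, and watch the ISR lists shrink to the two survivors while leadership moves off the stopped node:

```shell
# Take one broker offline
docker stop kafka2

# Leaders previously on node 2 move to nodes 1 and 3; ISR drops node 2
docker exec kafka1 /opt/kafka/bin/kafka-topics.sh \
  --describe \
  --topic cluster-test \
  --bootstrap-server kafka1:9092

# Bring it back; it rejoins the ISR after catching up
docker start kafka2
```

With min.insync.replicas=2, producers using acks=all keep working through this exercise; losing a second broker would halt writes.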

Connecting Applications to Kafka

The real test of any Kafka deployment is connecting application code. Here’s a Python example using the kafka-python-ng library (a maintained fork of the original kafka-python).

Install the client library:

pip install kafka-python-ng

Producer Example

This producer sends messages to the test-topic topic on the single-node instance:

from kafka import KafkaProducer
import json
import time

producer = KafkaProducer(
    bootstrap_servers='localhost:9092',
    value_serializer=lambda v: json.dumps(v).encode('utf-8'),
    acks='all',
    retries=3
)

start = time.time()
for i in range(10000):
    producer.send('test-topic', {'event': f'test-{i}', 'timestamp': time.time()})

producer.flush()
elapsed = time.time() - start
print(f"Sent 10000 messages in {elapsed:.2f}s ({10000/elapsed:.0f} msg/s)")

On our test setup, this producer achieved approximately 12,954 messages per second with acks=all and JSON serialization.
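Producer throughput depends heavily on batching. A hypothetical tuning sketch for kafka-python-ng follows; the parameter names are real KafkaProducer options, but the values are illustrative starting points, not benchmarked optima:

```python
# Illustrative batching settings for kafka-python-ng's KafkaProducer.
# Values are example starting points, not measured optima.
producer_config = {
    "bootstrap_servers": "localhost:9092",
    "acks": "all",
    "linger_ms": 10,             # wait up to 10 ms to fill a batch (default 0)
    "batch_size": 64 * 1024,     # 64 KiB batches instead of the 16 KiB default
    "compression_type": "gzip",  # trade CPU for network and disk savings
}
# producer = KafkaProducer(**producer_config,
#                          value_serializer=lambda v: json.dumps(v).encode("utf-8"))
```

Larger batches and a nonzero linger generally raise throughput at the cost of a few milliseconds of latency; measure against your own workload.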

Consumer Example

The consumer reads from the same topic:

from kafka import KafkaConsumer
import json
import time

consumer = KafkaConsumer(
    'test-topic',
    bootstrap_servers='localhost:9092',
    auto_offset_reset='earliest',
    group_id='test-group',
    value_deserializer=lambda m: json.loads(m.decode('utf-8')),
    consumer_timeout_ms=5000
)

count = 0
start = time.time()
for message in consumer:
    count += 1

elapsed = time.time() - start
print(f"Consumed {count} messages in {elapsed:.2f}s ({count/elapsed:.0f} msg/s)")
consumer.close()

Consumer throughput measured 1,484 messages per second with JSON deserialization. The gap between producer and consumer throughput is expected here: the producer batches sends asynchronously, while this consumer deserializes and processes each message one at a time in a single Python loop and periodically commits offsets.
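For processing-heavy consumers you typically want manual commits, so offsets only advance after work completes. A sketch follows; the deserializer mirrors the one used above, and the commented-out loop assumes a live broker:

```python
import json

def deserialize(raw: bytes) -> dict:
    """Mirror of the consumer's value_deserializer above."""
    return json.loads(raw.decode("utf-8"))

consumer_config = {
    "bootstrap_servers": "localhost:9092",
    "group_id": "test-group",
    "auto_offset_reset": "earliest",
    "enable_auto_commit": False,  # commit only after successful processing
}
# consumer = KafkaConsumer('test-topic', **consumer_config,
#                          value_deserializer=deserialize)
# for message in consumer:
#     handle(message.value)   # your processing step
#     consumer.commit()       # advance offsets only after success
```

With auto-commit disabled, a consumer that crashes mid-batch re-reads the uncommitted messages on restart, giving at-least-once delivery instead of silently skipping work.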

Adding Schema Registry (Optional)

For production workloads where multiple teams produce and consume messages, a Schema Registry enforces data contracts. The Confluent Schema Registry works with the official Kafka image without any compatibility issues.

Add this service to your existing docker-compose.yml (single-node) or docker-compose-cluster.yml:

  schema-registry:
    image: confluentinc/cp-schema-registry:7.9.0
    container_name: schema-registry
    ports:
      - "8081:8081"
    environment:
      SCHEMA_REGISTRY_HOST_NAME: schema-registry
      SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS: kafka:9092
      SCHEMA_REGISTRY_LISTENERS: http://0.0.0.0:8081
    depends_on:
      kafka:
        condition: service_healthy

For the cluster setup, change SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS to kafka1:9092,kafka2:9092,kafka3:9092 and update the depends_on block to reference kafka1.

After starting the updated Compose stack, verify the registry is running:

curl -s http://localhost:8081/subjects | python3 -m json.tool

An empty list ([]) confirms the registry is up and connected to Kafka. Register a test schema:

curl -s -X POST http://localhost:8081/subjects/test-value/versions \
  -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  -d '{"schema": "{\"type\": \"record\", \"name\": \"Test\", \"fields\": [{\"name\": \"id\", \"type\": \"int\"}, {\"name\": \"name\", \"type\": \"string\"}]}"}'

The response includes the schema ID:

{"id":1}

Retrieve it back to confirm persistence:

curl -s http://localhost:8081/subjects/test-value/versions/1 | python3 -m json.tool

The registry stores schemas in an internal Kafka topic (_schemas), so they survive container restarts as long as the Kafka data volume persists.
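You can also register schemas programmatically. Here is a minimal stdlib sketch that assumes the registry at localhost:8081; the register() helper is hypothetical and only defined, not called:

```python
import json
import urllib.request

def build_payload(avro_schema: dict) -> bytes:
    """Schema Registry expects the Avro schema as an escaped JSON string."""
    return json.dumps({"schema": json.dumps(avro_schema)}).encode("utf-8")

def register(subject: str, avro_schema: dict,
             base_url: str = "http://localhost:8081") -> int:
    """POST a schema version; returns the registry-assigned schema ID."""
    req = urllib.request.Request(
        f"{base_url}/subjects/{subject}/versions",
        data=build_payload(avro_schema),
        headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["id"]

test_schema = {
    "type": "record", "name": "Test",
    "fields": [{"name": "id", "type": "int"},
               {"name": "name", "type": "string"}],
}
# schema_id = register("test-value", test_schema)
```

The double json.dumps in build_payload is deliberate: the schema travels as a string field inside the request body, exactly as in the curl example above.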

Monitoring with JMX

Kafka exposes detailed metrics through JMX (Java Management Extensions). To make these accessible from outside the container, add the following environment variables to your Kafka service:

    environment:
      KAFKA_JMX_PORT: 9999
      KAFKA_JMX_HOSTNAME: localhost
      KAFKA_JMX_OPTS: "-Dcom.sun.management.jmxremote=true -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false"

Also expose port 9999 in your ports section:

    ports:
      - "9092:9092"
      - "9093:9093"
      - "9999:9999"

The standard monitoring stack for Kafka in production is Prometheus with the JMX Exporter agent, feeding dashboards in Grafana. The JMX Exporter runs as a Java agent inside the Kafka JVM and exposes metrics in Prometheus format on an HTTP endpoint. For a complete walkthrough of that setup, see Monitor Kafka with Prometheus and Grafana.

Key metrics to watch include kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec for throughput, kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions for replication health, and kafka.controller:type=KafkaController,name=ActiveControllerCount to verify exactly one controller is active in the cluster.
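To spot-check the endpoint before wiring up Prometheus, any JMX client works. With a local JDK installed:

```shell
# Connect the JDK's bundled jconsole to the broker's exposed JMX port
jconsole localhost:9999
```

Browse the MBeans tab to confirm the kafka.server and kafka.controller domains are visible; if the connection fails, re-check that port 9999 is published and KAFKA_JMX_HOSTNAME matches how you connect.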

Production Considerations

Running Kafka in Docker for development is straightforward. Moving to production requires attention to several areas that directly affect reliability and performance.

JVM Heap Sizing. Kafka’s default heap is often too small for sustained workloads. Set KAFKA_HEAP_OPTS: "-Xmx2g -Xms2g" as a starting point for production brokers. The JVM also needs headroom for page cache (which Kafka relies on heavily for read performance), so a broker with 2 GB heap should have at least 4 GB total RAM allocated to the container.

Data Directory Strategy. Docker named volumes work fine for development, but production deployments benefit from bind mounts to specific host directories. This gives you direct control over the filesystem, makes backups simpler, and avoids the overhead of Docker’s storage driver. Replace the volumes section with something like ./data/kafka1:/var/lib/kafka/data and ensure the host directory has appropriate permissions.

Log Retention. Kafka stores messages on disk until retention limits are hit. By default, data is kept for 7 days (log.retention.hours=168) with no size-based cap, and segment files roll at 1 GB. These defaults may not suit your workload. Tune them with KAFKA_LOG_RETENTION_HOURS (how long to keep data) and KAFKA_LOG_SEGMENT_BYTES (how large each segment file grows before rolling). For high-throughput topics, you might also want to set KAFKA_LOG_RETENTION_BYTES to cap total disk usage per partition.
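As a sketch, those knobs map onto the Compose environment block like this (the values are illustrative, not recommendations):

```yaml
    environment:
      KAFKA_LOG_RETENTION_HOURS: 48          # keep data for 2 days
      KAFKA_LOG_RETENTION_BYTES: 10737418240 # cap each partition at 10 GiB
      KAFKA_LOG_SEGMENT_BYTES: 536870912     # roll segment files at 512 MiB
```

Retention is enforced per partition and only on closed segments, so actual disk usage can briefly exceed the byte cap by up to one segment.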

Networking. The Docker bridge network adds a small latency overhead that matters at scale. For production deployments on dedicated hosts, consider network_mode: host to eliminate the network translation layer. With host networking, set KAFKA_ADVERTISED_LISTENERS to the host’s actual IP (for example, PLAINTEXT://10.0.1.50:9092). Bridge mode is perfectly fine for development and staging environments.

Resource Limits. Without explicit limits, a misbehaving broker can consume all available host resources. Add resource constraints to each Kafka service in your Compose file:

    deploy:
      resources:
        limits:
          cpus: '2.0'
          memory: 4G
        reservations:
          cpus: '1.0'
          memory: 2G

This ensures each broker gets at least 1 CPU and 2 GB, but cannot exceed 2 CPUs and 4 GB. Adjust these based on your actual throughput requirements.

Backups. Kafka’s data directory (/var/lib/kafka/data) contains both the message logs and the KRaft metadata. For consistent backups, stop the broker (or use filesystem snapshots if your storage supports it), then copy the entire data directory. On a running cluster with replication factor 3, you can take one broker offline at a time for backup without affecting availability. Never back up a running broker’s data directory without stopping it first or using an atomic snapshot, because partial copies can corrupt the log.
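A minimal cold-backup sketch for the single-node volume follows. The alpine image is just a convenient tar carrier, and the volume name assumes Compose's default project prefix from the ~/kafka-docker directory; verify yours with docker volume ls:

```shell
# Stop the broker so the log segments are quiescent
docker compose stop kafka

# Archive the named volume to the current directory
docker run --rm \
  -v kafka-docker_kafka-data:/data:ro \
  -v "$(pwd)":/backup \
  alpine tar czf "/backup/kafka-data-$(date +%F).tar.gz" -C /data .

# Bring the broker back up
docker compose start kafka
```

Restoring is the reverse: extract the archive into an empty volume before starting the broker, keeping the same CLUSTER_ID and node ID.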

Kafka 4.2.0 in KRaft mode with Docker Compose gives you a clean, self-contained deployment with no external dependencies. The single-node config is ready to use for development in under a minute. The 3-broker cluster provides genuine fault tolerance with automatic leader election and data replication. Both configurations are version-pinned and reproducible, which is the whole point of containerized infrastructure. For a performance comparison between Kafka and Redpanda on identical hardware, see our Kafka vs Redpanda benchmarks.
