In our previous guide, we saw how to install Apache Kafka on CentOS 8. Staying in the same lane, we are now going to install it on Ubuntu 20.04 (Focal Fossa). We covered a lot of definitions there, but let us provide them here once more to save you time and add a little convenience.
Apache Kafka is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications. Source: Apache Kafka.
Let us break down what all of that means step by step. An event records the fact that “something occurred” in your business and was recorded digitally.
Event streaming is the practice of capturing data in real-time from event sources (producers) like databases, sensors, mobile devices, cloud services, and software applications in the form of streams of events; storing these event streams durably for later retrieval; manipulating, processing, and reacting to the event streams in real-time as well as retrospectively; and routing the event streams to different destination technologies (consumers) as needed. Source: Apache Kafka.
The producer is the program/application or entity that sends data to the Kafka cluster. The consumer sits on the other side and receives data from the Kafka cluster. The Kafka cluster can consist of one or more Kafka brokers which sit on different servers.
Defining other terms you will encounter
- Topic: A topic is a common name used to store and publish a particular stream of data. For example, if you wish to store all the data about a page being clicked, you can give the topic a name such as “PageClicks”.
- Partition: Every topic is split into partitions (“baskets”). The number of partitions must be specified when a topic is created, but it can be increased later as the need arises. Each message stored in a partition gets an incremental ID known as its offset.
- Kafka Broker: Every server with Kafka installed on it is known as a broker. A broker holds several topics along with their partitions.
- Zookeeper: Zookeeper manages Kafka’s cluster state and configuration.
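To make the offset idea concrete, here is a toy sketch in shell. It models a partition as a plain text file and an offset as the zero-based position of a line in that file. The helper name append_message and the file layout are invented for illustration; this is not how Kafka actually stores data on disk.

```shell
# Toy model only: one text file per partition, one line per message.
# append_message is a hypothetical helper, not part of Kafka.
append_message() (
  topic_dir=$1
  partition=$2
  message=$3
  file="$topic_dir/partition-$partition.log"
  mkdir -p "$topic_dir"
  touch "$file"
  # The offset of the new message equals the number of messages already stored.
  offset=$(( $(wc -l < "$file") ))
  printf '%s\n' "$message" >> "$file"
  echo "$offset"
)
```

Calling append_message /tmp/demo-topic 0 "hello" twice against a fresh directory prints 0 and then 1, while the first message sent to partition 1 starts again at 0, because offsets are counted per partition.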
Apache Kafka Use Cases
The following are some of the applications where you can take advantage of Apache Kafka:
- Message Broking: In comparison to most messaging systems, Kafka has better throughput, built-in partitioning, replication, and fault tolerance, which makes it a good solution for large-scale message processing applications.
- Website Activity Tracking
- Log Aggregation: Kafka abstracts away the details of files and gives a cleaner abstraction of log or event data as a stream of messages.
- Stream Processing: Capturing data in real-time from event sources, storing these event streams durably for later retrieval, and routing the event streams to different destination technologies as needed.
- Event Sourcing: This is a style of application design where state changes are logged as a time-ordered sequence of records.
- Commit Log: Kafka can serve as a kind of external commit-log for a distributed system. The log helps replicate data between nodes and acts as a re-syncing mechanism for failed nodes to restore their data.
- Metrics: This involves aggregating statistics from distributed applications to produce centralized feeds of operational data.
Installing Apache Kafka on Ubuntu 20.04
Apache Kafka requires Java to run, so we shall first prepare our server and get every prerequisite installed.
Step 1: Preparing your Ubuntu Server
Update your fresh Ubuntu 20.04 server and get Java installed as illustrated below.
sudo apt update && sudo apt upgrade
sudo apt install default-jre wget git unzip -y
sudo apt install default-jdk -y
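Before moving on, it can help to confirm that the tools installed above are actually on your PATH. The sketch below is one way to do that; missing_tools is a hypothetical helper written for this guide, not an apt or Kafka command.

```shell
# missing_tools prints each named command that cannot be found on PATH.
# It is a hypothetical helper for this guide.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}
```

Running missing_tools java javac wget git unzip should print nothing when every prerequisite is in place. You can also run java -version to see which Java runtime was installed.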
Step 2: Fetch Kafka on Ubuntu 20.04
Once Java is installed, let us fetch the Kafka binaries. Head over to Downloads, look for the latest release, and find the files listed under Binary downloads. Click on the one that is recommended by Kafka and you will be redirected to a page that has a link you can use to fetch it.
cd ~
wget https://downloads.apache.org/kafka/2.6.0/kafka_2.13-2.6.0.tgz
sudo mkdir /usr/local/kafka-server && cd /usr/local/kafka-server
sudo tar -xvzf ~/kafka_2.13-2.6.0.tgz --strip-components=1
Because of the --strip-components=1 flag, the archive’s top-level directory is dropped and its contents are extracted directly into /usr/local/kafka-server/.
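A quick sanity check after extraction is to confirm that the scripts and configuration files used in the following steps ended up where the unit files expect them. The sketch below assumes the /usr/local/kafka-server location used in this guide; check_kafka_layout is a hypothetical helper, not something shipped with Kafka.

```shell
# check_kafka_layout is a hypothetical helper: it reports any of the scripts
# and config files used later in this guide that are missing under a directory.
check_kafka_layout() (
  dir=$1
  status=0
  for path in bin/kafka-server-start.sh bin/kafka-server-stop.sh \
              bin/zookeeper-server-start.sh bin/zookeeper-server-stop.sh \
              config/server.properties config/zookeeper.properties; do
    if [ ! -e "$dir/$path" ]; then
      echo "missing: $dir/$path"
      status=1
    fi
  done
  # Non-zero exit status signals that at least one file was not found.
  exit "$status"
)
```

Running check_kafka_layout /usr/local/kafka-server prints nothing and returns success when the extraction went as expected.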
Step 3: Create Kafka and Zookeeper Systemd Unit Files
Systemd unit files for Kafka and Zookeeper make common service actions such as starting, stopping, and restarting much easier. They also keep Kafka consistent with how other services on the server are managed.
Let us begin with Zookeeper service:
$ sudo vim /etc/systemd/system/zookeeper.service

[Unit]
Description=Apache Zookeeper Server
Requires=network.target remote-fs.target
After=network.target remote-fs.target

[Service]
Type=simple
ExecStart=/usr/local/kafka-server/bin/zookeeper-server-start.sh /usr/local/kafka-server/config/zookeeper.properties
ExecStop=/usr/local/kafka-server/bin/zookeeper-server-stop.sh
Restart=on-abnormal

[Install]
WantedBy=multi-user.target
Next is the Kafka service. Make sure JAVA_HOME is set correctly for your system, or Kafka will not start.
$ sudo vim /etc/systemd/system/kafka.service

[Unit]
Description=Apache Kafka Server
Documentation=http://kafka.apache.org/documentation.html
Requires=zookeeper.service
After=zookeeper.service

[Service]
Type=simple
Environment="JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64"
ExecStart=/usr/local/kafka-server/bin/kafka-server-start.sh /usr/local/kafka-server/config/server.properties
ExecStop=/usr/local/kafka-server/bin/kafka-server-stop.sh
Restart=on-abnormal

[Install]
WantedBy=multi-user.target
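If you are unsure what JAVA_HOME should be on your server, one common approach is to resolve the real location of the java binary and strip the trailing /bin/java from it. The sketch below assumes java is on your PATH and that readlink -f is available, as it is on Ubuntu; java_home_from_binary is a hypothetical helper name.

```shell
# java_home_from_binary strips a trailing /bin/java from a resolved java path.
# It is a hypothetical helper for this guide.
java_home_from_binary() {
  printf '%s\n' "${1%/bin/java}"
}

# Usage on the server (readlink -f follows the /usr/bin/java symlinks):
#   java_home_from_binary "$(readlink -f "$(command -v java)")"
```

If the printed directory differs from the JAVA_HOME value in the unit file, update the Environment line to match it.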
After you are done adding the configurations, reload the systemd daemon to apply changes and then start the services. You can check their status as well.
sudo systemctl daemon-reload
sudo systemctl enable --now zookeeper
sudo systemctl enable --now kafka
sudo systemctl status kafka zookeeper
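The status output can be long, so a compact check is sometimes handy. The sketch below relies on systemctl is-active, which exits with a non-zero status when a unit is not active; inactive_units is a hypothetical helper name.

```shell
# inactive_units prints each systemd unit that systemctl does not report as active.
# It is a hypothetical helper for this guide.
inactive_units() {
  for unit in "$@"; do
    systemctl is-active --quiet "$unit" || echo "$unit"
  done
}
```

Running inactive_units zookeeper kafka prints nothing when both services are up. If a unit is listed, sudo journalctl -u followed by the unit name shows its recent log output.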
Step 4: Install Cluster Manager for Apache Kafka (CMAK) | Kafka Manager
CMAK (previously known as Kafka Manager) is an open-source tool for managing Apache Kafka clusters, developed by Yahoo.
cd ~
git clone https://github.com/yahoo/CMAK.git
Step 5: Configure CMAK on Ubuntu 20.04
The minimum configuration is the Zookeeper hosts that CMAK (previously known as Kafka Manager) will use to store its state. This setting is found in the application.conf file in the conf directory. Change cmak.zkhosts="my.zookeeper.host.com:2181" to match your environment. You can also specify multiple Zookeeper hosts by comma-delimiting them, like so: cmak.zkhosts="my.zookeeper.host.com:2181,other.zookeeper.host.com:2181". The host names can be IP addresses too.
$ vim ~/CMAK/conf/application.conf

cmak.zkhosts="localhost:2181"
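When there are several Zookeeper hosts, it is easy to misplace a comma or a quote. The sketch below builds the setting from a list of host names, assuming every host listens on port 2181 as in the examples above; zkhosts_line is a hypothetical helper, not part of CMAK.

```shell
# zkhosts_line is a hypothetical helper that joins host names into the
# comma-delimited cmak.zkhosts setting, appending port 2181 to each host.
zkhosts_line() (
  port=2181
  joined=""
  for host in "$@"; do
    # Add a comma separator only when something has already been joined.
    joined="${joined:+$joined,}$host:$port"
  done
  printf 'cmak.zkhosts="%s"\n' "$joined"
)
```

For example, zkhosts_line my.zookeeper.host.com other.zookeeper.host.com prints cmak.zkhosts="my.zookeeper.host.com:2181,other.zookeeper.host.com:2181", which can be pasted into application.conf.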
After you are done adding your Zookeeper hosts, the command below will create a zip file which can be used to deploy the application. You should see a lot of output on your terminal as files are downloaded and compiled. Give it time to finish, because the build takes a while.
cd ~/CMAK/
./sbt clean dist
When all is done, you should see a message like below:
[info] Your package is ready in /home/tech/CMAK/target/universal/cmak-<version>.zip
Change into the directory where the zip file is located and unzip it, replacing <version> with the version shown in your build output:
$ cd ~/CMAK/target/universal
$ unzip cmak-<version>.zip
$ cd cmak-<version>
Step 5 (continued): Starting the Service and Accessing It
After extracting the generated zip file and changing into the extracted directory as done above, you can run the service like this (replace <version> with the version from your build output):
$ cd ~/CMAK/target/universal/cmak-<version>
$ bin/cmak
By default, it will choose port 9000, so open your favorite browser and point it to http://ip-or-domain-name-of-server:9000. In case your firewall is running, kindly allow the port to be accessed externally.
sudo ufw allow 9000
You should see an interface as shown below once everything is okay:
You will immediately notice that there is no cluster available when we first get into the interface as shown above. Therefore, we shall proceed to create a new cluster. Click on the “Cluster” drop-down list and then choose “Add Cluster”.
You will be presented with a page as shown below. Fill in the form with the details being requested (Cluster Name, Zookeeper Hosts etc). In case you have several Zookeeper Hosts, add them delimited by a comma. You can fill in the other details depending on your needs.
Sample filled fields
Once the form is filled in to your satisfaction, scroll down and click “Save”.
Step 6: Adding Sample Topic
Apache Kafka provides multiple shell scripts to work with. Let us first create a sample topic called “ComputingForGeeksTopic” with a single partition and a single replica. Open a new terminal, leaving CMAK running, and issue the command below:
cd /usr/local/kafka-server
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic ComputingForGeeksTopic

Created topic ComputingForGeeksTopic.
Confirm that the topic now appears in the CMAK interface. To do this, while in your cluster, click Topic > List.
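You can also confirm the topic from the terminal. The sketch below wraps the --list and --describe options of kafka-topics.sh, using the same --zookeeper flag as the Kafka 2.6.0 command above (newer Kafka releases expect --bootstrap-server instead). The KAFKA_HOME and ZK variables and the two function names are assumptions made for this example.

```shell
# Assumed install location from Step 2; override KAFKA_HOME if yours differs.
: "${KAFKA_HOME:=/usr/local/kafka-server}"
# Assumed Zookeeper address, matching the create command in this guide.
ZK="${ZK:-localhost:2181}"

# list_topics and describe_topic are thin, hypothetical wrappers around kafka-topics.sh.
list_topics() {
  "$KAFKA_HOME/bin/kafka-topics.sh" --list --zookeeper "$ZK"
}

describe_topic() {
  "$KAFKA_HOME/bin/kafka-topics.sh" --describe --zookeeper "$ZK" --topic "$1"
}
```

After defining these functions, list_topics should include ComputingForGeeksTopic in its output, and describe_topic ComputingForGeeksTopic shows the partition and replica details for the topic.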
Step 7: Create Topic in CMAK interface
Another, simpler way of creating a topic is via the CMAK web interface. Click on the “Topic” drop-down list and then click “Create”. You will be required to input the details of the new topic (Replication Factor, Partitions and others). Fill in the form, then click “Create” at the bottom of the page.
You will see a message that your topic was created, along with a link to view it. Click on the link to view your new topic.
And there it is:
Apache Kafka is now installed on your Ubuntu 20.04 server. Note that it is also possible to install Kafka on multiple servers to create a cluster. Thank you for visiting and staying tuned till the end. We appreciate the support that you continue to give us.
Find out more about Apache Kafka
Find out more about Cluster Manager for Apache Kafka