Set Up an Elasticsearch Cluster on CentOS / Ubuntu With Ansible

Elasticsearch is an open-source, RESTful, distributed search and analytics engine that provides near real-time full-text search. It is built on Apache Lucene and is freely available under the Apache 2 license. In this article, we will install an Elasticsearch cluster on CentOS 8/7 and Ubuntu 20.04/18.04 using the Ansible automation tool.

This tutorial shows Linux users how to install and configure a highly available, multi-node Elasticsearch cluster on CentOS 8 / CentOS 7 and Ubuntu 20.04/18.04 systems. Common uses of Elasticsearch include log analytics, search engines, full-text search, business analytics, and security intelligence.

In this setup, we will install an Elasticsearch 7.x cluster with an Ansible role. The role we use is the official Elasticsearch project, and its variables let you adapt the installation to your environment.

Elasticsearch Node Types

There are two common types of Elasticsearch nodes:

Master nodes: Responsible for cluster-wide operations, such as managing indices and allocating data shards to data nodes.
Data nodes: Hold the actual shards of indexed data and handle all CRUD, search, and aggregation operations. They consume more CPU, memory, and I/O than master nodes.

Setup Requirements

Before you begin, you’ll need at least three CentOS 8/7 or Ubuntu 20.04/18.04 servers, installed and updated. You also need root access or a user with sudo privileges on each server. My setup is based on the following node layout.

Server name	Specs	Server role
elk-master-01	16 GB RAM, 8 vCPUs	Master
elk-master-02	16 GB RAM, 8 vCPUs	Master
elk-master-03	16 GB RAM, 8 vCPUs	Master
elk-data01	32 GB RAM, 16 vCPUs	Data
elk-data02	32 GB RAM, 16 vCPUs	Data
elk-data03	32 GB RAM, 16 vCPUs	Data

NOTE:

For small environments, you can use a node for both data and master operations.

Storage Considerations (Optional)

For data nodes, plan the storage layout with scalability in mind. In my lab, each data node has a 500 GB disk mounted under /data. It was configured with the commands below.

WARNING: Don’t copy and run these commands as they are. They are only a reference point, and the device names will differ on your servers.

sudo parted -s -a optimal -- /dev/sdb mklabel gpt
sudo parted -s -a optimal -- /dev/sdb mkpart primary 0% 100%
sudo parted -s -- /dev/sdb align-check optimal 1
sudo pvcreate /dev/sdb1
sudo vgcreate vg0 /dev/sdb1
sudo lvcreate -n lv01 -l+100%FREE vg0
sudo mkfs.xfs /dev/mapper/vg0-lv01
sudo mkdir -p /data
echo "/dev/mapper/vg0-lv01 /data xfs defaults 0 0" | sudo tee -a /etc/fstab
sudo mount -a
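
Running the tee -a line more than once leaves duplicate entries in /etc/fstab. The sketch below is one possible guard, not part of the original setup: the function names fstab_has_mount and add_fstab_entry and the FSTAB variable are made up for illustration. It appends the entry only when the mount point is not listed yet, and it needs to run as root to write to the real file.

```shell
# Append an fstab entry only if the mount point is not already listed.
# FSTAB defaults to /etc/fstab; point it at a scratch file to try this out safely.
FSTAB="${FSTAB:-/etc/fstab}"

# fstab_has_mount <mount-point>: succeeds if a non-comment line uses that mount point.
fstab_has_mount() {
  awk -v mp="$1" '$1 !~ /^#/ && $2 == mp { found = 1 } END { exit !found }' "$FSTAB"
}

# add_fstab_entry <device> <mount-point> <fstype>
add_fstab_entry() {
  if fstab_has_mount "$2"; then
    echo "$2 already present in $FSTAB, skipping"
  else
    echo "$1 $2 $3 defaults 0 0" >> "$FSTAB"
  fi
}
```

With the lab layout above, the call would be add_fstab_entry /dev/mapper/vg0-lv01 /data xfs, followed by sudo mount -a.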

Step 1: Install Ansible on Workstation

We will use Ansible to set up the Elasticsearch cluster. Make sure Ansible is installed on the workstation you will run the deployment from.

On Fedora:

sudo dnf install ansible

On CentOS:

sudo yum -y install epel-release
sudo yum install ansible

RHEL 7 / RHEL 8:

### RHEL 8 ###
sudo subscription-manager repos --enable ansible-2.9-for-rhel-8-x86_64-rpms
sudo yum install ansible

### RHEL 7 ###
sudo subscription-manager repos --enable rhel-7-server-ansible-2.9-rpms
sudo yum install ansible

Ubuntu:

sudo apt update
sudo apt install software-properties-common
sudo apt-add-repository --yes --update ppa:ansible/ansible
sudo apt install ansible

For any other distribution, refer to the official Ansible installation guide.

Confirm that Ansible is installed on your machine by querying the version.

$ ansible --version
ansible 2.9.27
  config file = /etc/ansible/ansible.cfg
  configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/dist-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.17 (default, Jul  1 2022, 15:56:32) [GCC 7.5.0]

Step 2: Import Elasticsearch ansible role

After installing Ansible, import the Elasticsearch Ansible role to your local system using ansible-galaxy.

$ ansible-galaxy install elastic.elasticsearch
- downloading role 'elasticsearch', owned by elastic
- downloading role from https://github.com/elastic/ansible-elasticsearch/archive/v7.17.0.tar.gz
- extracting elastic.elasticsearch to /root/.ansible/roles/elastic.elasticsearch
- elastic.elasticsearch (v7.17.0) was installed successfully

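The output shows which release of the role was downloaded (v7.17.0 here). To install a specific release, append it to the role name, for example elastic.elasticsearch,v7.17.0. Check the role's releases page for a release that matches the Elasticsearch version you want to install.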

The role will be added to the ~/.ansible/roles directory.

$ ls ~/.ansible/roles
elastic.elasticsearch
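
If you want repeatable installs, one option is to pin the role release in a requirements file instead of typing it on the command line. This is a minimal sketch: the file name requirements.yml is a convention rather than something this guide requires, and v7.17.0 is simply the release shown in the output above.

```shell
# Write a requirements file that pins the role to a single release.
# Adjust the version to the release you actually want.
cat > requirements.yml <<'EOF'
---
- src: elastic.elasticsearch
  version: v7.17.0
EOF

# Then install from it (needs network access to Ansible Galaxy):
#   ansible-galaxy install -r requirements.yml
```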

Add the Elasticsearch cluster hosts to your SSH client configuration.

vim ~/.ssh/config

This is what my additional configuration looks like. Update it to fit your environment.

# Elasticsearch master nodes
Host elk-master01
  Hostname 192.168.10.2
  User root
Host elk-master02
  Hostname 192.168.10.3
  User root
Host elk-master03
  Hostname 192.168.10.4
  User root

# Elasticsearch worker nodes
Host elk-data01
  Hostname 192.168.10.2
  User root
Host elk-data02
  Hostname 192.168.10.3
  User root
Host elk-data03
  Hostname 192.168.10.4
  User root

Ensure you’ve copied your SSH public key to all machines.

### Master nodes ###
for host in elk-master0{1..3}; do ssh-copy-id "$host"; done

### Worker nodes ###
for host in elk-data0{1..3}; do ssh-copy-id "$host"; done

Confirm you can log in over SSH without being asked for a password.

$ ssh elk-master01
[root@elk-master-01 ~]#
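
To test every host in one pass rather than logging in to each one, a small loop can help. The sketch below is illustrative: check_ssh is a made-up helper name, and the 5 second timeout is an arbitrary choice. BatchMode=yes makes ssh fail instead of prompting, so a host that still wants a password shows up as FAIL.

```shell
# check_ssh <host>...: report which hosts accept key-based logins.
# BatchMode=yes disables password prompts; ConnectTimeout bounds the wait per host.
check_ssh() {
  rc=0
  for host in "$@"; do
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true 2>/dev/null; then
      echo "OK   $host"
    else
      echo "FAIL $host"
      rc=1
    fi
  done
  return "$rc"
}
```

For the hosts in this guide, it could be called as check_ssh elk-master0{1..3} elk-data0{1..3}. A non-zero exit status means at least one host failed.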

If your private SSH key has a passphrase, add the key to ssh-agent so you are not prompted for each machine.

$ eval "$(ssh-agent -s)" && ssh-add
Enter passphrase for /var/home/jkmutai/.ssh/id_rsa: 
Identity added: /var/home/jkmutai/.ssh/id_rsa (/var/home/jkmutai/.ssh/id_rsa)

Step 3: Create Elasticsearch Playbook and Run

Now that all the pre-requisites are configured, let’s create a Playbook file for deployment.

vim elk.yml

Mine has the contents below.

- hosts: elk-master-nodes
  roles:
    - role: elastic.elasticsearch
  vars:
    es_enable_xpack: false
    es_data_dirs:
      - "/data/elasticsearch/data"
    es_log_dir: "/data/elasticsearch/logs"
    es_java_install: true
    es_heap_size: "1g"
    es_config:
      cluster.name: "elk-cluster"
      cluster.initial_master_nodes: "192.168.10.2:9300,192.168.10.3:9300,192.168.10.4:9300"
      discovery.seed_hosts: "192.168.10.2:9300,192.168.10.3:9300,192.168.10.4:9300"
      http.port: 9200
      node.data: false
      node.master: true
      bootstrap.memory_lock: false
      network.host: '0.0.0.0'
    es_plugins:
      - plugin: ingest-attachment

- hosts: elk-data-nodes
  roles:
    - role: elastic.elasticsearch
  vars:
    es_enable_xpack: false
    es_data_dirs:
      - "/data/elasticsearch/data"
    es_log_dir: "/data/elasticsearch/logs"
    es_java_install: true
    es_config:
      cluster.name: "elk-cluster"
      cluster.initial_master_nodes: "192.168.10.2:9300,192.168.10.3:9300,192.168.10.4:9300"
      discovery.seed_hosts: "192.168.10.2:9300,192.168.10.3:9300,192.168.10.4:9300"
      http.port: 9200
      node.data: true
      node.master: false
      bootstrap.memory_lock: false
      network.host: '0.0.0.0'
    es_plugins:
      - plugin: ingest-attachment

Key notes:

Master nodes have node.master set to true and node.data set to false.
Data nodes have node.data set to true and node.master set to false.
The es_enable_xpack variable is set to false to install the open-source edition of Elasticsearch.
cluster.initial_master_nodes and discovery.seed_hosts point to the master nodes.
/data/elasticsearch/data is where Elasticsearch data shards will be stored. A partition separate from the OS installation is recommended for performance and scalability.
/data/elasticsearch/logs is where Elasticsearch logs will be stored.
The directories are created automatically by an Ansible task. You only need to ensure that /data is the mount point of the data store you want Elasticsearch to use.
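
The playbook sets es_heap_size to "1g" for the master nodes, which is small for servers with 16 GB of RAM. A widely cited rule of thumb is to give the JVM heap about half of the machine's memory while staying below roughly 32 GB. The helper below only does that arithmetic. The name suggest_heap_gb and the cap of 31 are assumptions made for this sketch, so check the result against your own workload before putting it into es_heap_size.

```shell
# suggest_heap_gb <ram-gb>: half of RAM, at least 1, capped at 31.
# The cap of 31 is an assumed safe value below the ~32 GB compressed-pointer limit.
suggest_heap_gb() {
  heap=$(( $1 / 2 ))
  [ "$heap" -gt 31 ] && heap=31
  [ "$heap" -lt 1 ] && heap=1
  echo "${heap}g"
}
```

For the servers in this guide, suggest_heap_gb 16 prints 8g for the master nodes and suggest_heap_gb 32 prints 16g for the data nodes.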

For more customization options, check the project’s documentation on GitHub.

Create inventory file

Create a new inventory file.

$ vim hosts
[elk-master-nodes]
elk-master01
elk-master02
elk-master03

[elk-data-nodes]
elk-data01
elk-data02
elk-data03
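
Before a real run, it can be worth checking that the playbook parses and that each play targets the hosts you expect. The sketch below wraps two standard ansible-playbook options, --syntax-check and --list-hosts, neither of which changes anything on the servers. The function name preflight and the ANSIBLE_PLAYBOOK override variable are made up for this example.

```shell
# Command to run; override ANSIBLE_PLAYBOOK if the binary lives elsewhere.
ANSIBLE_PLAYBOOK="${ANSIBLE_PLAYBOOK:-ansible-playbook}"

# preflight <inventory> <playbook>: parse the playbook, then list the hosts
# each play would target. Stops at the first step that fails.
preflight() {
  "$ANSIBLE_PLAYBOOK" -i "$1" "$2" --syntax-check &&
    "$ANSIBLE_PLAYBOOK" -i "$1" "$2" --list-hosts
}
```

With the file names used in this guide, the call is preflight hosts elk.yml.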

When everything is set, run the playbook.

ansible-playbook -i hosts elk.yml

The execution should start. Be patient, as it can take several minutes.

PLAY [elk-master-nodes] ********************************************************************************************************************************

TASK [Gathering Facts] *********************************************************************************************************************************
ok: [elk-master02]
ok: [elk-master01]
ok: [elk-master03]

TASK [elastic.elasticsearch : set_fact] ****************************************************************************************************************
ok: [elk-master02]
ok: [elk-master01]
ok: [elk-master03]

TASK [elastic.elasticsearch : os-specific vars] ********************************************************************************************************
ok: [elk-master01]
ok: [elk-master02]
ok: [elk-master03]
.......

A successful Ansible run ends with output similar to the following.

PLAY RECAP *********************************************************************************************************************************************
elk-data01                 : ok=38   changed=10   unreachable=0    failed=0    skipped=119  rescued=0    ignored=0   
elk-data02                 : ok=38   changed=10   unreachable=0    failed=0    skipped=118  rescued=0    ignored=0   
elk-data03                 : ok=38   changed=10   unreachable=0    failed=0    skipped=118  rescued=0    ignored=0   
elk-master01               : ok=38   changed=10   unreachable=0    failed=0    skipped=119  rescued=0    ignored=0   
elk-master02               : ok=38   changed=10   unreachable=0    failed=0    skipped=118  rescued=0    ignored=0   
elk-master03               : ok=38   changed=10   unreachable=0    failed=0    skipped=118  rescued=0    ignored=0
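
ansible-playbook already exits with a non-zero status when a host fails, but if you keep the output in a log file, a short filter can pull out the hosts that need attention. This is a sketch under the assumption that the recap lines look like the ones above. The function name recap_failures is invented for the example.

```shell
# recap_failures: read ansible-playbook output on stdin and print every host
# whose PLAY RECAP line reports failed or unreachable tasks.
# Exit status is 1 if any such host was found, 0 otherwise.
recap_failures() {
  awk '
    /^PLAY RECAP/ { in_recap = 1; next }
    in_recap && NF {
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^(failed|unreachable)=[1-9]/) { print $1; bad = 1; break }
      }
    }
    END { exit bad + 0 }
  '
}
```

One way to use it is to save the run with ansible-playbook -i hosts elk.yml | tee run.log and then call recap_failures < run.log.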

Step 4: Confirm Elasticsearch Cluster installation

Log in to one of the master nodes.

ssh elk-master01

Check the cluster health status.

$ curl 'http://localhost:9200/_cluster/health?pretty'
{
  "cluster_name" : "elk-cluster",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 6,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 0,
  "active_shards" : 0,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}
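
Right after deployment the cluster may need a moment to reach green. The cluster health API accepts wait_for_status and timeout parameters, so the request can wait on the server side instead of being retried by hand. The helpers below are an illustrative sketch: health_status and wait_for_green are made-up names, the 30 second timeout is arbitrary, and the status is extracted with sed rather than a JSON parser to avoid extra dependencies.

```shell
# health_status: read a _cluster/health JSON document on stdin and print its status.
health_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1
}

# wait_for_green [base-url]: ask Elasticsearch to hold the request until the
# cluster is green or 30s have passed, then report the status it returned.
wait_for_green() {
  url="${1:-http://localhost:9200}"
  status="$(curl -s "$url/_cluster/health?wait_for_status=green&timeout=30s" | health_status)"
  echo "cluster status: ${status:-unknown}"
  [ "$status" = "green" ]
}
```

Run wait_for_green on a cluster node, or pass another base URL as the first argument. The exit status is non-zero if the cluster did not report green.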

Check which node is the elected master.

$ curl -XGET 'http://localhost:9200/_cat/master'
G9X__pPXScqACWO6YzGx3Q 95.216.167.173 95.216.167.173 elk-master01

View all nodes and their roles:

$ curl -XGET 'http://localhost:9200/_cat/nodes'
192.168.10.4   7 47 1 0.02 0.03 0.02 di - elk-data03
192.168.10.2  10 34 1 0.00 0.02 0.02 im * elk-master01
192.168.10.4  13 33 1 0.00 0.01 0.02 im - elk-master03
192.168.10.3  14 33 1 0.00 0.01 0.02 im - elk-master02
192.168.10.3   7 47 1 0.00 0.03 0.03 di - elk-data02
192.168.10.2   6 47 1 0.00 0.02 0.02 di - elk-data01
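
The listing above can also be summarised automatically, for example to confirm that the expected number of data and master-eligible nodes joined. This sketch assumes the default _cat/nodes column layout shown above, where the node role letters are in column 8, the elected-master marker in column 9, and the node name in column 10. The function name count_roles is invented for the example.

```shell
# count_roles: read default `_cat/nodes` output on stdin and print how many
# nodes hold data (role contains "d"), how many are master-eligible (role
# contains "m"), and which node is the elected master (marked with "*").
count_roles() {
  awk '
    $8 ~ /d/ { data++ }
    $8 ~ /m/ { eligible++ }
    $9 == "*" { elected = $10 }
    END { printf "data=%d master_eligible=%d elected=%s\n", data, eligible, elected }
  '
}
```

On a cluster node it could be used as curl -s 'http://localhost:9200/_cat/nodes' | count_roles. For the listing above it prints data=3 master_eligible=3 elected=elk-master01.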

As the output confirms, you now have a clean Elasticsearch cluster running on CentOS 8/7 or Ubuntu 20.04/18.04.