
Welcome to our guide on setting up dynamic provisioning of Persistent Volumes using GlusterFS and Heketi for your Kubernetes / OpenShift clusters. GlusterFS is a free and open-source scalable network filesystem suitable for data-intensive tasks such as cloud storage and media streaming, and it runs on common off-the-shelf hardware. In my setup, I’ve opted to deploy GlusterFS as a hyper-converged service on the Kubernetes nodes, which unlocks dynamically provisioned, persistent GlusterFS volumes in Kubernetes.

We’ll use the gluster-kubernetes project, which gives Kubernetes administrators a mechanism to easily deploy GlusterFS as a native storage service onto an existing Kubernetes cluster. Here, GlusterFS is managed and orchestrated like any other app in Kubernetes. heketi is a RESTful volume management interface for GlusterFS that lets you create and manage Gluster volumes through its API.

Infrastructure Requirements

Below are the basic requirements for the setup.

  • There must be at least three nodes
  • Each node must have at least one raw block device attached for use by heketi
  • Each node must have the following ports opened for GlusterFS communications: 2222 for GlusterFS pod’s sshd, 24007 for GlusterFS Daemon, 24008 for GlusterFS Management, 49152 to 49251 for each brick created on the host.
  • The following kernel modules must be loaded:
  1. dm_snapshot
  2. dm_mirror
  3. dm_thin_pool
  • Each node requires that the mount.glusterfs command is available.
  • The GlusterFS client version installed on the nodes should be as close as possible to the server version.

Step 1: Setup Kubernetes / OpenShift Cluster

This setup assumes you have a running Kubernetes / OpenShift(OKD) cluster. Refer to our guides on how to quickly spin up a cluster for test/production use.

Deploy Production Ready Kubernetes Cluster with Ansible & Kubespray

How to setup 3 node Kubernetes Cluster on Ubuntu 18.04 with Weave Net CNI

Step 2: Install the GlusterFS client on all nodes / Configure the firewall

If you’re using a Red Hat based Linux distribution, install the glusterfs-fuse package, which provides the mount.glusterfs command.

sudo yum -y install glusterfs-fuse

For Ubuntu / Debian:

sudo apt-get install glusterfs-client

Load all the required kernel modules:

for i in dm_snapshot dm_mirror dm_thin_pool; do sudo modprobe $i; done

Check if the modules are loaded.

$ sudo lsmod |  egrep 'dm_snapshot|dm_mirror|dm_thin_pool'
dm_thin_pool           66358  0 
dm_persistent_data     75269  1 dm_thin_pool
dm_bio_prison          18209  1 dm_thin_pool
dm_mirror              22289  0 
dm_region_hash         20813  1 dm_mirror
dm_log                 18411  2 dm_region_hash,dm_mirror
dm_snapshot            39103  0 
dm_bufio               28014  2 dm_persistent_data,dm_snapshot
dm_mod                124461  5 dm_log,dm_mirror,dm_bufio,dm_thin_pool,dm_snapshot
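
To make sure these modules are also loaded after a reboot, you can optionally declare them in a systemd modules-load.d drop-in (a minimal sketch; the glusterfs.conf file name is an arbitrary choice):

cat <<EOF | sudo tee /etc/modules-load.d/glusterfs.conf
dm_snapshot
dm_mirror
dm_thin_pool
EOF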

Check version installed.

$ glusterfs --version
glusterfs 3.12.2

Also open the required ports on the firewall – CentOS / RHEL / Fedora:

for i in 2222 24007 24008 49152-49251; do
  sudo firewall-cmd --add-port=${i}/tcp --permanent
done
sudo firewall-cmd --reload
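
If your nodes run Ubuntu / Debian with ufw as the active firewall, an equivalent rule set might look like this (a sketch under that assumption):

for i in 2222 24007 24008 49152:49251; do
  sudo ufw allow ${i}/tcp
done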

Step 3: Check Kubernetes Cluster status

Verify the Kubernetes installation by making sure all nodes in the cluster are Ready:

$ kubectl  get nodes
NAME       STATUS   ROLES    AGE    VERSION
master01   Ready    master   146m   v1.15.3
worker01   Ready    <none>   146m   v1.15.3
worker02   Ready    <none>   146m   v1.15.3
worker03   Ready    <none>   146m   v1.15.3

To view the exact version of Kubernetes running, use:

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.3", GitCommit:"2d3c76f9091b6bec110a5e63777c332469e0cba2", GitTreeState:"clean", BuildDate:"2019-08-19T11:05:50Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}

Step 4: Add secondary raw disks to your nodes

Each node must have at least one raw block device attached for use by heketi. I’ve added two virtual disks of 50 GB each to my Kubernetes worker nodes.

[worker01 ~]$ lsblk 
NAME   MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vda    253:0    0  20G  0 disk 
└─vda1 253:1    0  20G  0 part /
vdc    253:32   0   50G 0 disk 
vdd    253:48   0   50G 0 disk 

[worker02 ~]$ lsblk 
NAME   MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vda    253:0    0  20G  0 disk 
└─vda1 253:1    0  20G  0 part /
vdc    253:32   0   50G 0 disk 
vdd    253:48   0   50G 0 disk 

[worker03 ~]$ lsblk 
NAME   MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vda    253:0    0  20G  0 disk 
└─vda1 253:1    0  20G  0 part /
vdc    253:32   0   50G 0 disk 
vdd    253:48   0   50G 0 disk 

Step 5: Create a topology file

You must provide the GlusterFS cluster topology information, which describes the nodes in the cluster and the block devices attached to them for use by heketi.

Since I’m running all operations from the Kubernetes master node, let’s pull gluster-kubernetes from GitHub.

sudo yum -y install git vim
git clone https://github.com/gluster/gluster-kubernetes.git

Copy and edit the topology information template.

cd gluster-kubernetes/deploy/
cp topology.json.sample topology.json

This is what I have in my configuration.

{
  "clusters": [
    {
      "nodes": [
        {
          "node": {
            "hostnames": {
              "manage": [
                "worker01"
              ],
              "storage": [
                "10.10.1.193"
              ]
            },
            "zone": 1
          },
          "devices": [
            "/dev/vdc",
            "/dev/vdd"
          ]
        },
        {
          "node": {
            "hostnames": {
              "manage": [
                "worker02"
              ],
              "storage": [
                "10.10.1.167"
              ]
            },
            "zone": 1
          },
          "devices": [
            "/dev/vdc",
            "/dev/vdd"
          ]
        },
        {
          "node": {
            "hostnames": {
              "manage": [
                "worker03"
              ],
              "storage": [
                "10.10.1.178"
              ]
            },
            "zone": 1
          },
          "devices": [
            "/dev/vdc",
            "/dev/vdd"
          ]
        }
      ]
    }
  ]
}

When creating your own topology file:

  • Make sure the topology file only lists block devices intended for heketi’s use. heketi needs access to whole block devices (e.g. /dev/vdc, /dev/vdd) which it will partition and format; see the check after this list.
  • The hostnames array is a bit misleading. manage should be a list of hostnames for the node, but storage should be a list of IP addresses on the node for backend storage communications.
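
heketi expects the listed devices to be raw, with no partition table or filesystem signatures. You can optionally inspect the disks and clear any leftover signatures before deploying (a sketch; wipefs -a destroys existing data, so only run it against disks dedicated to heketi):

# Inspect the devices for existing filesystems or partitions
lsblk -f /dev/vdc /dev/vdd

# Clear any leftover signatures (destructive!)
sudo wipefs -a /dev/vdc /dev/vdd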

Step 6: Run the deployment script

With the topology file created, you are ready to run the gk-deploy script from a machine with administrative access to your Kubernetes cluster. If not running from the master node, copy the Kubernetes configuration file to ~/.kube/config.
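
For example, on a kubeadm or Kubespray provisioned cluster the admin kubeconfig typically lives at /etc/kubernetes/admin.conf on the master (the path is an assumption; adjust it for your installer):

mkdir -p ~/.kube
scp root@master01:/etc/kubernetes/admin.conf ~/.kube/config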

Familiarize yourself with the options available.

./gk-deploy -h

Common options:

-g, --deploy-gluster: Deploy GlusterFS pods on the nodes in the topology that contain brick devices
--ssh-user USER: User to use for SSH commands to GlusterFS nodes. Non-root users must have sudo permissions on the nodes. Default is 'root'
--admin-key ADMIN_KEY: Secret string for the heketi admin user. This is a required argument.
--user-key USER_KEY: Secret string for general heketi users. This is a required argument.
-l LOG_FILE, --log-file LOG_FILE: Save all output to the specified file.
-v, --verbose: Verbose output

Run the command below to start the deployment of GlusterFS/Heketi, replacing MyUserStrongKey and MyAdminStrongKey with your own key values.

./gk-deploy -g \
 --user-key MyUserStrongKey \
 --admin-key MyAdminStrongKey \
 -l /tmp/heketi_deployment.log \
 -v topology.json

Press the Y key to accept the installation.

Do you wish to proceed with deployment?

[Y]es, [N]o? [Default: Y]: Y

If the deployment was successful, you should get a message like this:

heketi is now running and accessible via http://10.233.108.5:8080

Pods, services, and endpoints are created automatically upon successful deployment. GlusterFS and heketi should now be installed and ready to go.

$ kubectl  get deployments
NAME     READY   UP-TO-DATE   AVAILABLE   AGE
heketi   1/1     1            1           75m

$ kubectl get pods
NAME                     READY   STATUS    RESTARTS   AGE
glusterfs-44jvh          1/1     Running   0          110m
glusterfs-j56df          1/1     Running   0          110m
glusterfs-lttb5          1/1     Running   0          110m
heketi-b4b94d59d-bqmpz   1/1     Running   0          76m


$ kubectl  get services
NAME                       TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
heketi                     ClusterIP   10.233.42.58    <none>        8080/TCP   76m
heketi-storage-endpoints   ClusterIP   10.233.41.189   <none>        1/TCP      76m
kubernetes                 ClusterIP   10.233.0.1      <none>        443/TCP    127m

$ kubectl get endpoints
NAME                       ENDPOINTS                                   AGE
heketi                     10.233.108.5:8080                           76m
heketi-storage-endpoints   10.10.1.167:1,10.10.1.178:1,10.10.1.193:1   77m
kubernetes                 10.10.1.119:6443                            127m
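
Optionally, you can confirm that the Gluster peers see each other by running gluster peer status inside one of the GlusterFS pods (using one of the pod names from the output above; each node should list the other two as connected peers):

kubectl exec -it glusterfs-44jvh -- gluster peer status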

Step 7: Install and use heketi-cli to interact with GlusterFS

The heketi-cli is used to interact with GlusterFS deployed on the Kubernetes cluster. Download the latest release and place the binary in your PATH.

curl -s https://api.github.com/repos/heketi/heketi/releases/latest \
  | grep browser_download_url \
  | grep linux.amd64 \
  | cut -d '"' -f 4 \
  | wget -qi -

Extract the downloaded archive files – they contain both the client and the server.

for i in heketi*.tar.gz; do tar xvf "$i"; done

Copy heketi-cli to the /usr/local/bin directory.

sudo cp ./heketi-client/bin/heketi-cli /usr/local/bin

You should be able to get the heketi-cli version as any user logged in to the server.

$ heketi-cli --version
heketi-cli v9.0.0

You can set the HEKETI_CLI_SERVER environment variable so that heketi-cli reads it directly.

export HEKETI_CLI_SERVER=$(kubectl get svc/heketi --template 'http://{{.spec.clusterIP}}:{{(index .spec.ports 0).port}}')

Confirm variable value:

$ echo $HEKETI_CLI_SERVER
http://10.233.108.5:8080
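
You can also confirm the heketi API is reachable from this machine; the /hello endpoint returns a short greeting when the server is up:

curl $HEKETI_CLI_SERVER/hello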

Query cluster details:

$ heketi-cli cluster list --user admin --secret  MyAdminStrongKey
Clusters:
Id:88ed1913182f880ab5eb22ca2f904615 [file][block]

$ heketi-cli cluster info 88ed1913182f880ab5eb22ca2f904615
Cluster id: 88ed1913182f880ab5eb22ca2f904615
Nodes:
1efe9a69341b50b00a0b15f6e7d8c797
2d48f05c7d7d8d1e9f4b4963ef8362e3
cf5753b191eca0b67aa48687c08d4e12
Volumes:
e06893fc6e4f5fa23994432a40877889
Block: true

File: true

If you save the heketi admin username and key as environment variables, you don’t need to pass these options:

$ export HEKETI_CLI_USER=admin
$ export HEKETI_CLI_KEY=MyAdminStrongKey
$ heketi-cli cluster list
Clusters:
Id:5c94db92049afc5ec53455d88f55f6bb [file][block]

$ heketi-cli cluster info 5c94db92049afc5ec53455d88f55f6bb
Cluster id: 5c94db92049afc5ec53455d88f55f6bb
Nodes:
3bd2d62ea6b8b8c87ca45037c7080804
a795092bad48ed91be962c6a351cbf1b
e98fd47bb4811f7c8adaeb572ca8823c
Volumes:
119c23455c894c33e968a1047b474af2
Block: true

File: true

$  heketi-cli node list
Id:75b2696a9e142e6900ee9fd2d1eb56b6     Cluster:23800e4b6bdeebaec4f6c45b17cabf55
Id:9ca47f98eaa60f0e734ab628897160fc     Cluster:23800e4b6bdeebaec4f6c45b17cabf55
Id:c43023282eef0f10d4109c68bcdf0f9d     Cluster:23800e4b6bdeebaec4f6c45b17cabf55

View topology info:

$ heketi-cli topology info

Cluster Id: 698754cfaf9642b451c4671f96c46a0b

    File:  true
    Block: true

    Volumes:


    Nodes:

        Node Id: 39e8fb3b09ccfe47d1d3f2d8e8b426c8
        State: online
        Cluster Id: 698754cfaf9642b451c4671f96c46a0b
        Zone: 1
        Management Hostnames: worker03
        Storage Hostnames: 10.10.1.178
        Devices:

        Node Id: b9c3ac6737d27843ea0ce69a366de48c
        State: online
        Cluster Id: 698754cfaf9642b451c4671f96c46a0b
        Zone: 1
        Management Hostnames: worker01
        Storage Hostnames: 10.10.1.193
        Devices:

        Node Id: c94636a003af0ca82e7be6962149869b
        State: online
        Cluster Id: 698754cfaf9642b451c4671f96c46a0b
        Zone: 1
        Management Hostnames: worker02
        Storage Hostnames: 10.10.1.167
        Devices:

Create a StorageClass for dynamic provisioning.

$ vim gluster-storage-class.yaml 
---
apiVersion: storage.k8s.io/v1beta1
kind: StorageClass
metadata:
  name: glusterfs-storage
provisioner: kubernetes.io/glusterfs
parameters:
  resturl: "http://10.233.108.5:8080"
  restuser: "admin"
  restuserkey: "MyAdminStrongKey"

$ kubectl create -f gluster-storage-class.yaml
storageclass.storage.k8s.io/glusterfs-storage created

$ kubectl  get storageclass
NAME                PROVISIONER               AGE
glusterfs-storage   kubernetes.io/glusterfs   18s

$ kubectl  describe storageclass.storage.k8s.io/glusterfs-storage
Name:                  glusterfs-storage
IsDefaultClass:        No
Annotations:           <none>
Provisioner:           kubernetes.io/glusterfs
Parameters:            resturl=http://10.233.108.5:8080,restuser=admin,restuserkey=MyAdminStrongKey
AllowVolumeExpansion:  <unset>
MountOptions:          <none>
ReclaimPolicy:         Delete
VolumeBindingMode:     Immediate
Events:                <none>
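
Storing the admin key in plain text in the StorageClass is fine for testing, but the kubernetes.io/glusterfs provisioner also accepts secretName/secretNamespace parameters so the key can live in a Kubernetes Secret instead of restuserkey. A sketch under those assumptions (the names heketi-admin-secret and glusterfs-storage-secret are arbitrary):

---
apiVersion: v1
kind: Secret
metadata:
  name: heketi-admin-secret
  namespace: default
type: kubernetes.io/glusterfs
stringData:
  # heketi admin key in plain text; Kubernetes stores it base64-encoded
  key: MyAdminStrongKey
---
apiVersion: storage.k8s.io/v1beta1
kind: StorageClass
metadata:
  name: glusterfs-storage-secret
provisioner: kubernetes.io/glusterfs
parameters:
  resturl: "http://10.233.108.5:8080"
  restuser: "admin"
  secretNamespace: "default"
  secretName: "heketi-admin-secret"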

Create a PVC.

$ cat gluster-pvc.yaml
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: glusterpvc01
  annotations:
    volume.beta.kubernetes.io/storage-class: glusterfs-storage
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi

$ kubectl create -f gluster-pvc.yaml
persistentvolumeclaim/glusterpvc01 created

Where:

  • glusterfs-storage is the Kubernetes Storage Class annotation and the name of the Storage Class.
  • 1Gi is the amount of storage requested.
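
Before moving on, you can do a quick sanity check that the claim binds and that a pod can mount it (a minimal sketch; the pod name and busybox image are arbitrary choices):

kubectl get pvc glusterpvc01

cat <<EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: gluster-test-pod
spec:
  containers:
    - name: app
      image: busybox
      command: ["sleep", "3600"]
      volumeMounts:
        - name: gluster-vol
          mountPath: /mnt/gluster
  volumes:
    - name: gluster-vol
      persistentVolumeClaim:
        claimName: glusterpvc01
EOF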

To learn how to use dynamic provisioning in your deployments, check out the Hello World with GlusterFS Dynamic Provisioning guide.