Deploying an on-premise Kubernetes cluster with StorPool Storage and benchmarking performance


A few words about the project

-------------------------------------
 ITGix is proud to partner with StorPool Storage on a project delivering a live on-premise Kubernetes cluster that uses StorPool software as persistent storage.
 The project aims to demonstrate StorPool Storage’s high compatibility, which allows it to run on the same computational nodes as Kubernetes, and its significant performance advantage over a comparable storage solution. This document provides an overview of the installation of both systems and a brief performance comparison with Ceph Storage.

Containers?

-------------------------------------
 We can’t talk about Kubernetes without first mentioning containers. Containers are single units that hold an application’s executable binaries, libraries, and sometimes source code. This provides reliability - a container ensures that an application can run on any system, since it ships with everything the application needs. Like virtual machines, containers provide a dedicated environment for an application to run; unlike virtual machines, however, they do not run a full operating system of their own, which greatly reduces resource demands. Containers offer a modern, resource-efficient way of reliably running applications.
 Besides not carrying a full operating system with its own kernel, containers are also stateless. A container is deployed and started from an image that defines it and its contents. Any changes made to a running container are lost once it is stopped; if it is restarted, it starts again in the state defined by the same image. To persist any data written in or by a container, the container has to be integrated with a dedicated storage system. Containers running a CMS or a database require dedicated storage for persistent data. Orchestrators like Kubernetes offer a stateful solution by providing persistent storage through CSI (Container Storage Interface) drivers and StorageClasses.

What is Kubernetes exactly?

-------------------------------------
 Kubernetes is an orchestrator responsible for running and managing containers in a single environment. It operates as a cluster, following a Master-Worker architecture. All configuration regarding container management is done on the Master’s side through the Kubernetes API - typically with the kubectl command-line tool - while the Worker side is responsible for running the containerized applications. Configuration on the Master’s side is written in YAML documents. YAML is a human-readable data-serialization language. In Kubernetes, YAML documents describe the desired state of the different components in the cluster and are applied after being edited in order for changes to take effect.
 Containers in Kubernetes run inside logical units called pods that live on the Worker side. Kubernetes creates and monitors pods and, depending on the configuration, will replicate pods on the same worker node or another, create a new pod to replace one that has been deleted, and so on.
 Kubernetes also offers a solution for container storage, called persistent storage. A separate storage system can be integrated with a Kubernetes cluster through CSI drivers, allowing containerized applications to store their data.
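 As an illustrative sketch, a minimal pod definition in YAML might look like the following (the pod and image names here are hypothetical examples, not part of this deployment):

```yaml
# Minimal pod manifest - hypothetical names, for illustration only
apiVersion: v1
kind: Pod
metadata:
  name: demo-pod
spec:
  containers:
    - name: demo-app
      image: nginx:1.21        # any container image would do
      ports:
        - containerPort: 80
```

Saved as a file, such a document would be applied on the Master with kubectl apply -f demo-pod.yaml, after which Kubernetes schedules the pod on a worker node.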

What is StorPool?

-------------------------------------
 StorPool is fast, distributed block storage software that runs on standard hardware and is intended to provide an efficient and effective solution for storing large amounts of data. It distributes and replicates data to eliminate single points of failure, while also being highly scalable.
 StorPool doesn’t need dedicated nodes and can run on computation servers - in this case, Kubernetes and StorPool will run on the same machines. This compatibility is a big advantage over other storage solutions: Kubernetes has very specific requirements regarding system resource allocation, so the usual best practice is to run it on its own dedicated nodes, apart from any other software, including distributed storage systems.

Planning

-------------------------------------
 Cloud providers offer various managed ways to run Kubernetes following the Master-Worker model. This project, however, focuses on deploying a fully functional on-premise cluster with a deployment tool called Kubespray.

Hardware

-------------------------------------
 The cluster will consist of 1 virtual machine acting as the Master and 3 physical machines acting as worker nodes. The workers will host any containers deployed in pods in the cluster, while the master will hold all configuration.
Master:
k8s-poc-master
Workers:
k8s-poc-node4
k8s-poc-node5
k8s-poc-node6
Specifications of each worker node are listed below along with details regarding RAM allocation on each node to each software component, network hardware differences, and total physical block devices for the whole cluster.
CPUs: Xeon E5-2620 0 @ 2.00GHz
RAM: 64 GB (16 GB for StorPool, 5 GB for K8s manager VM, 6 GB for OS & users, 34 GB for K8s pods)
OS: CentOS 7
ETH: 2 x 10 Gb (2 nodes with Mellanox MT27500, 1 with Intel 82599ES)
SSD: 2 x Server Grade Intel SSD DC S3500 240 GB (6 SSDs for the whole cluster)

Deploying Kubernetes with Kubespray

-------------------------------------
 Kubespray relies on Ansible - an automation tool used to install applications and perform tasks on remote machines. Ansible scripts can be run from any host that has network and SSH access to the machines you wish to run automated tasks on. Kubespray is essentially a collection of Ansible roles (scripts) intended to install and deploy on-premise multi-node Kubernetes clusters.
 Ansible installs Kubernetes on the desired nodes following a purpose-built inventory file that lists the machines that will become Kubernetes nodes, along with their role in the cluster.
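 As a sketch, an inventory describing this cluster’s nodes might look roughly like the following. The IP addresses are placeholders, and the group names follow recent Kubespray conventions and may differ between versions:

```yaml
# inventory/mycluster/hosts.yaml - sketch with placeholder addresses
all:
  hosts:
    k8s-poc-master:
      ansible_host: 10.0.0.10    # placeholder IP
    k8s-poc-node4:
      ansible_host: 10.0.0.14
    k8s-poc-node5:
      ansible_host: 10.0.0.15
    k8s-poc-node6:
      ansible_host: 10.0.0.16
  children:
    kube_control_plane:
      hosts:
        k8s-poc-master:
    kube_node:
      hosts:
        k8s-poc-node4:
        k8s-poc-node5:
        k8s-poc-node6:
    etcd:
      hosts:
        k8s-poc-master:
    k8s_cluster:
      children:
        kube_control_plane:
        kube_node:
```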
 Once the inventory file is filled with correct information and all dependencies on the remote machines are resolved, the following command will kickstart the installation and main configuration of the whole Kubernetes cluster for you:
ansible-playbook -i inventory/mycluster/hosts.yaml --become --become-user=root cluster.yml
 A detailed guide to preparing and troubleshooting Kubespray scripts needs an article of its own; this one provides an overview of the whole deployment process.
 Successful deployment can be verified with the kubectl get nodes command.

Installing StorPool

-------------------------------------
 Installation of StorPool software is done by StorPool’s team of engineers after meeting the requirements in the pre-installation checklist.

Integrating StorPool with Kubernetes

-------------------------------------
 Any changes and configuration made to a functional Kubernetes cluster, including storage integration, can be done by applying a YAML document on the Master node, as shown below.
kubectl apply -f config.yaml
 To integrate StorPool with Kubernetes, the StorPool Kubernetes module must be installed on all nodes, the storpool_mgmt service must be running on the Kubernetes master node, and the jq tool must be installed for parsing JSON output on the command line.
 Integration is done with a CSI driver provisioner, defined in a YAML document along with its ServiceAccount, ClusterRole, and ClusterRoleBinding resources. This document is applied with kubectl on the master host. A detailed guide to StorPool-Kubernetes integration can be found in StorPool’s official documentation.
 After successful integration, StorPool can be listed as an available storage class.
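 Once the storage class is available, workloads consume it through PersistentVolumeClaims. A minimal sketch, assuming the class is named storpool (the actual name is whatever kubectl get storageclass reports):

```yaml
# Sketch of a PersistentVolumeClaim backed by the StorPool StorageClass.
# The class name "storpool" and claim name are assumptions for illustration.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: demo-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: storpool
  resources:
    requests:
      storage: 10Gi
```

A pod then references the claim in its volumes section, and the CSI driver provisions a StorPool volume for it on demand.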

End result

-------------------------------------
 Completing all configuration tasks leaves us with a working live cluster consisting of 4 machines. Each machine hosts both the Kubernetes cluster and the StorPool storage cluster. The two systems work together: one runs and manages containerized applications, the other provides working storage for their data.
 Kubernetes ensures applications are always available, and StorPool ensures the same for their data.
The 4 machines are:
k8s-poc-master
k8s-poc-node4
k8s-poc-node5
k8s-poc-node6
k8s-poc-master acts as the Kubernetes master node. It is the only Master node in this cluster; the other 3 machines are worker nodes.
The master runs the Kubernetes API, used through kubectl to change cluster configuration, manage container deployments, set deployment rules, etc.
Containers get deployed in pods located on the worker nodes.
StorPool provides distributed persistent storage that allows containers running storage-dependent applications such as databases and CMS to safely store fault-tolerant data. Each worker node contributes 2 x Server Grade Intel SSD DC S3500 240 GB drives to the storage cluster.
Benchmark Testing StorPool with fio/dbench

-------------------------------------
 Benchmark testing of storage in Kubernetes in this project was done with fio and dbench. Fio (Flexible I/O Tester) is a free tool used to benchmark storage workloads; it reports statistics about sequential and random read/write throughput and latency. Dbench is an archive containing a Dockerfile that builds a fio container and a YAML file with instructions to be applied to Kubernetes.
 Fio runs in Kubernetes as a container and claims a 1000 GB persistent volume from the integrated persistent storage to test its I/O performance.
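 The dbench manifest is essentially a Kubernetes Job wrapping the fio container and mounting the claimed volume. A rough sketch, with image and claim names assumed rather than taken from the actual manifest:

```yaml
# Sketch of a dbench-style Job - image, env, and claim names are
# assumptions for illustration; consult the actual dbench YAML.
apiVersion: batch/v1
kind: Job
metadata:
  name: dbench
spec:
  template:
    spec:
      containers:
        - name: dbench
          image: dbench:latest          # assumed image name
          env:
            - name: DBENCH_MOUNTPOINT   # where fio writes its test files
              value: /data
          volumeMounts:
            - name: dbench-pv
              mountPath: /data
      restartPolicy: Never
      volumes:
        - name: dbench-pv
          persistentVolumeClaim:
            claimName: dbench-pv-claim  # PVC sized to 1000 GB in this test
  backoffLimit: 4
```

When the Job completes, the fio summary is read from the pod logs with kubectl logs.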
Fio testing for StorPool delivers the following results (raw output):
==================
= Dbench Summary =
==================
Random Read/Write IOPS: 166k/61.4k. BW: 2176MiB/s / 466MiB/s
Average Latency (usec) Read/Write: 279.06/219.43
Sequential Read/Write: 2223MiB/s / 487MiB/s
Mixed Random Read/Write IOPS: 80.5k/26.9k
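 The summary lines are plain text and easy to post-process. As an illustrative sketch (not part of the original test run), the random-read IOPS figure can be pulled out of a saved summary with standard shell tools:

```shell
# Reproduce the StorPool dbench summary above in a temporary file
cat > /tmp/dbench_summary.txt <<'EOF'
==================
= Dbench Summary =
==================
Random Read/Write IOPS: 166k/61.4k. BW: 2176MiB/s / 466MiB/s
Average Latency (usec) Read/Write: 279.06/219.43
Sequential Read/Write: 2223MiB/s / 487MiB/s
Mixed Random Read/Write IOPS: 80.5k/26.9k
EOF

# Split fields on spaces and slashes; on the "Random Read/Write IOPS" line
# field 5 is the random-read figure ("166k")
read_iops=$(awk -F'[ /]' '/^Random Read\/Write IOPS/ {print $5}' /tmp/dbench_summary.txt)
echo "$read_iops"
```

The same pattern extracts the other headline numbers when comparing runs across storage backends.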

Benchmark Testing Ceph with fio/dbench

-------------------------------------

 To compare the performance of StorPool with other distributed storage software, a Ceph cluster was also used in benchmark testing with fio and dbench. The Ceph cluster is hosted on 2 dedicated servers with the following specifications:

2 x Servers:
AMD Ryzen 7 1700X Octa Core
64 GB RAM
Each server with 2 x 512 GB NVMe SSD

 As shown, each Ceph server has two 512 GB NVMe SSDs, while in the StorPool cluster each node has 2 x Server Grade Intel SSD DC S3500 240 GB drives.

Benchmark testing was done for the Ceph block storage solution with RBD as well as with CephFS.

Raw output from these tests can be seen below:

CEPHFS

==================
= Dbench Summary =
==================
Random Read/Write IOPS: 16.2k/7379. BW: 185MiB/s / 106MiB/s
Average Latency (usec) Read/Write: 2645.18/
Sequential Read/Write: 203MiB/s / 106MiB/s
Mixed Random Read/Write IOPS: 9322/3077

CEPH RBD

==================
= Dbench Summary =
==================
Random Read/Write IOPS: 15.3k/7591. BW: 202MiB/s / 108MiB/s
Average Latency (usec) Read/Write: 2970.04/
Sequential Read/Write: 188MiB/s / 106MiB/s
Mixed Random Read/Write IOPS: 8819/2954

Comparing StorPool and Ceph Performance

-------------------------------------

 StorPool Storage shows a significant performance advantage over Ceph RBD block storage and CephFS in sequential read/write, random read/write, bandwidth, and latency. The benchmark results are summarized below:

                             StorPool          CephFS           Ceph RBD
Random R/W IOPS              166k / 61.4k      16.2k / 7379     15.3k / 7591
Random R/W BW (MiB/s)        2176 / 466        185 / 106        202 / 108
Avg Read Latency (usec)      279.06            2645.18          2970.04
Sequential R/W (MiB/s)       2223 / 487        203 / 106        188 / 106
Mixed Random R/W IOPS        80.5k / 26.9k     9322 / 3077      8819 / 2954


Conclusion

-------------------------

 Containers offer a modern solution for running applications in a secure and fault-tolerant environment without wasting valuable system resources. Kubernetes is a highly capable container orchestrator and works with storage software systems to provide persistent storage for containerized applications.

 StorPool is a software storage system running on standard hardware that provides fault tolerance and scalability for written data thanks to its distributed working model. It also proves to be highly compatible with Kubernetes, since it can run on the same computational nodes as Kubernetes - something that would hardly be possible with other storage solutions. In addition to high availability, scalability, and excellent compatibility, it also proves to be extremely fast compared to other popular distributed storage systems.

 ITGix and StorPool successfully teamed up to demonstrate strong operational skills combined with excellent software engineering, delivering a modern and efficient system that can serve as a reference model in the world of DevOps.