Monitoring OpenStack Pt.2

Kamen Tarlov
CEO & DevOps Lead
21.07.2017
Reading time: 4 mins.
Last Updated: 20.09.2023

In my previous blog post, I discussed how to monitor RabbitMQ, the central message queue of OpenStack. That is important, but the end goal of having a cloud is the instances running on top of the hosts. Most of you, and especially the infrastructure people who dig into monitoring, will already know which components matter most.

The reason to monitor is to plan capacity sensibly, which is routine work in a cloud environment where you have to spawn a large number of virtual machines or containers. Having the data available at a glance also makes it easier to increase reliability and uptime, plan your architecture better, and identify the bottlenecks of your setup.

My choice for cloud monitoring and DevOps, as I mentioned in the previous post, is Icinga2. I could explain in depth why and how it impressed me, but let's focus on monitoring virtual machines and KVM instances. First of all, I'll start with what is most important to monitor and define the main characteristics. Good preparation gives the best results!

HOST MONITORING

There are two perspectives to look from. The first, and probably the more interesting one, is the host operating system. From this angle, a KVM machine appears as a single process running on the host. Something like:

qemu     32381     1  9 Jul14 ?        14:41:20 /usr/libexec/qemu-kvm -name instance-000000c9 -S -machine pc-i440fx-rhel7.0.0

This qemu process represents the virtual machine, and its parameters define it. That helps, doesn't it? We need to monitor a process, and from there we can build the stats to watch:

- VSZ - virtual memory size of the process
- RSS - resident set size (physical memory) of the process
- percentage of CPU used by the process
- process existence (the instance name is the best filter for that)
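
As a rough illustration of the stats listed above, here is a minimal Python sketch that pulls them out of ps output. It assumes the output of ps -eo pid,vsz,rss,pcpu,args; the function name and the sample numbers are made up for the example, and this is not how check_procs works internally.

```python
import re

# Matches "-name instance-000000c9" and also the form
# "-name guest=instance-000000c9,debug-threads=on" (assumed to be
# what newer libvirt versions generate).
NAME_RE = re.compile(r"-name\s+(?:guest=)?([^\s,]+)")


def parse_qemu_processes(ps_output):
    """Return {instance_name: stats} for every qemu-kvm line in ps output.

    Expects the output of: ps -eo pid,vsz,rss,pcpu,args
    """
    instances = {}
    for line in ps_output.splitlines():
        fields = line.split(None, 4)
        if len(fields) < 5 or "qemu-kvm" not in fields[4]:
            continue  # header line or unrelated process
        match = NAME_RE.search(fields[4])
        if match is None:
            continue
        pid, vsz, rss, pcpu, _args = fields
        instances[match.group(1)] = {
            "pid": int(pid),
            "vsz_kib": int(vsz),        # ps reports VSZ in KiB
            "rss_kib": int(rss),        # ps reports RSS in KiB
            "cpu_percent": float(pcpu),
        }
    return instances


# Illustrative sample; the memory and CPU numbers are invented.
SAMPLE = """\
  PID    VSZ    RSS %CPU COMMAND
32381 9541236 4213456  9.0 /usr/libexec/qemu-kvm -name instance-000000c9 -S -machine pc-i440fx-rhel7.0.0
 1234  115440   3120  0.0 /usr/sbin/sshd -D
"""

if __name__ == "__main__":
    for name, stats in parse_qemu_processes(SAMPLE).items():
        print(name, stats)
```

A missing key in the returned dictionary is the "process existence" check: if an expected instance name is absent, its qemu process is not running on this host.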

The Nagios plugin that facilitates the job is check_procs.
The last item is not so easy, because you have to identify which machine corresponds to which instance ID. Usually you can match this in the libvirt.xml file inside the nova instances directory and get your instance name.
Another useful tool for checking the state of a machine is virsh. With virsh list you can get the state of each machine and look for crashed ones:

 Id    Name                           State
----------------------------------------------------
 39    instance-000000c8              running
 40    instance-000000c9              running
 41    instance-000000bb              running

Usually, the states of KVM machines are:

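running - the instance is running and operational
paused - the guest is frozen in memory and not executing (this is what virsh suspend and an OpenStack pause produce)
inactive - stopped; virsh reports this as shut off and only shows it with virsh list --all
crashed - the guest terminated abnormally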
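
A monitoring script could turn these states into a check result. The sketch below parses text in the format that virsh list prints and returns a Nagios-style exit code; the function names are hypothetical, and the last two sample rows are invented to show the crashed and shut off cases.

```python
def parse_virsh_list(output):
    """Return {domain_name: state} from `virsh list` style output."""
    states = {}
    for line in output.splitlines():
        # maxsplit=2 keeps two-word states such as "shut off" intact.
        fields = line.split(None, 2)
        # Skip the header, the dashed separator and blank lines.
        if len(fields) < 3 or fields[0] == "Id":
            continue
        _dom_id, name, state = fields
        states[name] = state.strip()
    return states


def check_crashed(states):
    """Return a Nagios-style (exit_code, message) pair."""
    crashed = sorted(n for n, s in states.items() if s == "crashed")
    if crashed:
        return 2, "CRITICAL - crashed: " + ", ".join(crashed)
    return 0, "OK - %d domains, none crashed" % len(states)


# First three rows are from the listing above; the last two are invented.
SAMPLE = """\
 Id    Name                           State
----------------------------------------------------
 39    instance-000000c8              running
 40    instance-000000c9              running
 41    instance-000000bb              running
 42    instance-000000aa              crashed
 -     instance-000000ab              shut off
"""

if __name__ == "__main__":
    code, message = check_crashed(parse_virsh_list(SAMPLE))
    print(code, message)
```

In a real check the text would come from running virsh on the compute node, and the exit code would be passed to sys.exit so that Nagios or Icinga2 can pick it up.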

The best approach is to search for crashed machines and then investigate why they crashed:
virsh list | grep crashed
Another shortcut is the check_libvirt Nagios plugin, which covers a lot of these checks.

The other aspect of monitoring is, of course, the view from inside the guest. Some of these parameters correlate with the host-side monitoring above, but there are still important differences:

- memory utilization of the server
- CPU load and I/O wait
- disk I/O
- disk usage

MEMORY

Let’s take them one by one. The first, which is quite important and not so easy to track, is memory utilization.

              total        used        free      shared  buff/cache   available
Mem:       74053676    19262360      377524     3720412    54413792    50583164
Swap:        978940      107552      871388

Most infrastructure people know the “free” command. What does the output mean? There are lots of numbers, and free memory is really small relative to the others. The reason is that Linux uses otherwise idle memory for buffers and the page cache to save a lot of disk I/O. So we count free and cached memory together as one metric. A nice plugin for that is check_mem.pl.

It works really well and returns performance data for nice graphs.
Example:

check_mem.pl -f -C -w 20 -c 10
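
To make the calculation concrete, here is a small Python sketch of the arithmetic, applied to the free output shown earlier. It is an approximation of the idea and not the plugin's actual code; the function names are hypothetical, and treating "at or below the threshold" as the alert condition is an assumption.

```python
def parse_free(output):
    """Return the Mem: row of `free` output as {column: KiB}."""
    lines = output.strip().splitlines()
    columns = lines[0].split()
    for line in lines[1:]:
        if line.startswith("Mem:"):
            values = [int(v) for v in line.split()[1:]]
            return dict(zip(columns, values))
    raise ValueError("no Mem: row found")


def free_percent(mem, count_caches=True):
    """Percentage of memory that is free, optionally counting caches."""
    free_kib = mem["free"]
    if count_caches:
        free_kib += mem["buff/cache"]
    return 100.0 * free_kib / mem["total"]


def check(percent_free, warn=20.0, crit=10.0):
    """Return a Nagios-style (exit_code, message) pair."""
    if percent_free <= crit:
        return 2, "CRITICAL - %.1f%% free" % percent_free
    if percent_free <= warn:
        return 1, "WARNING - %.1f%% free" % percent_free
    return 0, "OK - %.1f%% free" % percent_free


# The free output from this post.
FREE_OUTPUT = """\
              total        used        free      shared  buff/cache   available
Mem:       74053676    19262360      377524     3720412    54413792    50583164
Swap:        978940      107552      871388
"""

if __name__ == "__main__":
    mem = parse_free(FREE_OUTPUT)
    print(check(free_percent(mem, count_caches=True)))
    print(check(free_percent(mem, count_caches=False)))
```

With caches counted, this host is comfortably above both thresholds; without them, the same host would look critically low on memory, which is why the caches have to be part of the metric.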

CPU LOAD AND UTILIZATION

Another really good metric to get into your dashboard and alerting is CPU load. The load can be high for lots of reasons, such as iowait or a process exhausting the CPU.
You can use the check_load plugin that comes by default with the Nagios plugins.
If you would like to monitor iowait from the CPU stats, you can use the check_cpu_stats.sh plugin from Nagios Exchange. Make sure the sysstat package, which provides iostat, is installed on the target server. Here is an example:

check_cpu_stats.sh -w 20 -c 30
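
For readers who want to see where an iowait percentage comes from, the sketch below derives it from two samples of the aggregate cpu line in /proc/stat. This is an illustration in Python, not what check_cpu_stats.sh does (the plugin relies on iostat); the sample counters are invented, and it assumes the -w and -c thresholds are meant as iowait percentages.

```python
def cpu_times(stat_line):
    """Parse the aggregate "cpu" line of /proc/stat into a list of counters."""
    fields = stat_line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # Order: user nice system idle iowait irq softirq steal.
    # The trailing guest fields are skipped because guest time is
    # already included in user time.
    return [int(v) for v in fields[1:9]]


def iowait_percent(before, after):
    """Share of CPU time spent waiting for I/O between two samples."""
    deltas = [b - a for a, b in zip(cpu_times(before), cpu_times(after))]
    total = sum(deltas)
    if total == 0:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is iowait


# Invented samples, as if taken a few seconds apart.
BEFORE = "cpu  1000 0 500 8000 100 0 0 0 0 0"
AFTER = "cpu  1100 0 550 8700 250 0 0 0 0 0"

if __name__ == "__main__":
    value = iowait_percent(BEFORE, AFTER)
    status = "CRITICAL" if value >= 30 else "WARNING" if value >= 20 else "OK"
    print("%s - iowait %.1f%%" % (status, value))
```

On a live guest the two lines would be read from /proc/stat with a short sleep in between.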

DISK I/O


Here you can use the check_io plugin to find issues in the underlying storage. Together with the performance data, you gain even more statistics about how the storage performs.

check_io -d sda -w 40,400,400 -c 100,700,700
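
As a sketch of the kind of numbers such a check works with, the Python below computes transfers per second and read/write throughput for one device from two samples of /proc/diskstats. The sample counters are invented, the function names are hypothetical, and reading the three comma-separated thresholds above as tps, read and write rates is an assumption about the plugin.

```python
SECTOR_BYTES = 512  # /proc/diskstats counts sectors in 512-byte units


def disk_counters(diskstats, device):
    """Extract cumulative I/O counters for one device from /proc/diskstats."""
    for line in diskstats.splitlines():
        f = line.split()
        if len(f) >= 14 and f[2] == device:
            return {
                "ios": int(f[3]) + int(f[7]),    # reads + writes completed
                "read_sectors": int(f[5]),
                "written_sectors": int(f[9]),
            }
    raise KeyError(device)


def io_rates(before, after, interval_s):
    """Return (tps, read KiB/s, written KiB/s) between two samples."""
    tps = (after["ios"] - before["ios"]) / interval_s
    read_kib = (after["read_sectors"] - before["read_sectors"]) * SECTOR_BYTES / 1024
    written_kib = (after["written_sectors"] - before["written_sectors"]) * SECTOR_BYTES / 1024
    return tps, read_kib / interval_s, written_kib / interval_s


# Invented samples for sda, as if taken 10 seconds apart.
BEFORE = "   8       0 sda 1000 0 80000 500 2000 0 160000 900 0 700 1400"
AFTER = "   8       0 sda 1100 0 84000 550 2300 0 176000 1000 0 760 1550"

if __name__ == "__main__":
    rates = io_rates(disk_counters(BEFORE, "sda"), disk_counters(AFTER, "sda"), 10)
    print("tps=%.1f read=%.1f KiB/s write=%.1f KiB/s" % rates)
```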

DISK USAGE


Last but not least, you can verify the usage of all disks with check_disk.
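
For a quick look at the same metric without a plugin, a few lines of Python are enough. This is only a sketch: the thresholds are arbitrary example values expressed as used space, whereas check_disk is normally configured in terms of free space, and the function names are hypothetical.

```python
import shutil


def usage_percent(path):
    """Percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def classify(percent_used, warn=80.0, crit=90.0):
    """Return a Nagios-style (exit_code, message) pair."""
    if percent_used >= crit:
        return 2, "CRITICAL - %.1f%% used" % percent_used
    if percent_used >= warn:
        return 1, "WARNING - %.1f%% used" % percent_used
    return 0, "OK - %.1f%% used" % percent_used


if __name__ == "__main__":
    print(classify(usage_percent("/")))
```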


Well, that brings us to the end of this monitoring overview. If you are looking for any help with monitoring, feel free to contact us!
