
Monitoring OpenStack Part 1

Last year we focused on OpenStack and the projects behind it. We shifted our scope in that direction because of the rich features and flexibility it provides. But, as we know, with great power comes great responsibility.


Once you have installed and configured the cloud, you need to guarantee its availability, and this is where monitoring comes in. If you browse the web you will find numerous tools that fit the need: Zabbix, Nagios and Icinga are just a few of them. So it is easy to find one that will do the job.

My monitoring choice is Icinga/Icinga2, and here are the points that helped me choose it:

- Icinga2 supports Graphite natively
- RESTful API support
- Rich web experience
- Support of Nagios modules

Once the monitoring technology is selected, we can focus on the monitoring itself. OpenStack provides Infrastructure-as-a-Service through a variety of projects or services:

- Nova
- Keystone
- Glance
- Horizon
- Neutron
- Cinder
- Swift
... and many more.

Moreover, OpenStack services communicate with each other over AMQP. The AMQP broker in the default installation is RabbitMQ, but it can be replaced by Qpid if needed. Most components, such as Nova, Cinder and Neutron, use Remote Procedure Calls over the broker for tasks like creating new volumes and starting or stopping instances, and that makes the message broker the heart of the cloud. So monitoring RabbitMQ is like checking your heart rate regularly.

Once you have installed Icinga or Nagios, you can start looking for modules. Here is the Nagios module we use to monitor RabbitMQ:

https://github.com/nagios-plugins-rabbitmq/nagios-plugins-rabbitmq

Before you use it, you must enable the RabbitMQ management plugin, which provides the management console and the HTTP API the checks rely on:

rabbitmq-plugins enable rabbitmq_management
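Once the plugin is enabled, you can quickly confirm that the management API answers before wiring up any checks. Here is a minimal Python sketch of that idea. It assumes the API listens on port 15672 (the default on RabbitMQ 3.x) and uses the aliveness-test endpoint; the host name "controller" and the guest credentials are only placeholders, and on recent RabbitMQ versions the guest user is only allowed to connect from localhost.

```python
import base64
import json
import urllib.parse
import urllib.request


def aliveness_url(host, vhost="/", port=15672):
    """Build the management API URL for the aliveness test of one vhost."""
    # The vhost is a path segment, so "/" must be percent-encoded as %2F.
    return "http://{}:{}/api/aliveness-test/{}".format(
        host, port, urllib.parse.quote(vhost, safe=""))


def is_alive(host, user="guest", password="guest", vhost="/",
             port=15672, timeout=5):
    """Return True if the broker answers the aliveness test with status "ok"."""
    request = urllib.request.Request(aliveness_url(host, vhost, port))
    token = base64.b64encode("{}:{}".format(user, password).encode()).decode()
    request.add_header("Authorization", "Basic " + token)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return json.load(response).get("status") == "ok"
    except (OSError, ValueError):
        # Connection refused, HTTP error, timeout or a non-JSON body.
        return False


if __name__ == "__main__":
    # "controller" is a placeholder host name, not something from a real setup.
    print(aliveness_url("controller"))
    print("alive" if is_alive("controller") else "not reachable")
```

If this prints "not reachable", check that the plugin is really enabled and that the port and credentials match your installation before you blame the monitoring.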

The module has plenty of checks that help you get a basic overview of the state and consistency of your system. I'll review some of them so you can get a good idea of how the module works. You can find good explanations for each of them on the GitHub page as well.

- check_rabbitmq_aliveness - checks that messages can be sent and received, using the API
- check_rabbitmq_connections - gathers facts about connections via the management API
- check_rabbitmq_exchange - collects the rates of confirmed, published-in and published-out messages over a period of time - really useful for performance data monitoring
- check_rabbitmq_overview - collects the number of pending, ready and unacknowledged messages
- check_rabbitmq_queue - monitors individual queues
- check_rabbitmq_server - monitors the resource usage of the server
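All of these checks follow the usual Nagios plugin convention: the exit code tells Icinga or Nagios the state (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN), and the single output line may carry performance data after a pipe character. The sketch below only illustrates that convention in Python. It is not the code of the module itself, and the names evaluate and plugin_output are made up for the example.

```python
import sys

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(value, warning, critical):
    """Map a measured value to a state using simple upper thresholds."""
    if value is None:
        return UNKNOWN
    if value >= critical:
        return CRITICAL
    if value >= warning:
        return WARNING
    return OK


def plugin_output(name, label, value, warning, critical):
    """Return (exit_code, output_line) in the Nagios plugin format."""
    state = evaluate(value, warning, critical)
    if state == UNKNOWN:
        return state, "{} UNKNOWN - no value for {}".format(name, label)
    text = "{} {} - {}={}".format(name, LABELS[state], label, value)
    # Performance data: label=value;warn;crit
    perfdata = "{}={};{};{}".format(label, value, warning, critical)
    return state, text + " | " + perfdata


if __name__ == "__main__":
    # Hypothetical reading: 12 pending messages, warn at 10, critical at 50.
    code, line = plugin_output("RABBITMQ_QUEUE", "messages", 12, 10, 50)
    print(line)
    sys.exit(code)
```

The part after the pipe is what ends up in Graphite when Icinga2 forwards performance data, which is why the exchange and queue checks are so handy for trending.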

There are many more checks that you can look at on the GitHub page above. The module is developing really fast, so by the time you read this post there may be even more.

With these powerful checks you can set up your monitoring. Most of them are straightforward, and you can simply schedule them as periodic checks.

Here are some hints for exchanges that can be checked with check_rabbitmq_exchange:

+ EXCHANGE - nova
+ EXCHANGE - openstack
+ EXCHANGE - neutron
+ EXCHANGE - heat (if you use it regularly)
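If you prefer to generate the check command lines instead of typing them, a small helper like the following can do it for the exchanges listed above. This is only a sketch: the flags (-H, -u, -p, --vhost, --exchange) are the ones I remember from the module, so please confirm them with the --help output of your installed version. The host and credentials are placeholders.

```python
import shlex

# Exchanges taken from the hints above.
EXCHANGES = ["nova", "openstack", "neutron", "heat"]


def exchange_check_command(exchange, host, user, password, vhost="/",
                           plugin="check_rabbitmq_exchange"):
    """Build a shell-safe command line for one exchange check.

    The flag names are an assumption; verify them with `--help`.
    """
    argv = [plugin, "-H", host, "-u", user, "-p", password,
            "--vhost", vhost, "--exchange", exchange]
    return " ".join(shlex.quote(arg) for arg in argv)


if __name__ == "__main__":
    # "controller", "monitor" and "secret" are placeholder values.
    for name in EXCHANGES:
        print(exchange_check_command(name, "controller", "monitor", "secret"))
```

Using shlex.quote keeps the command safe to paste into a shell even if the password or vhost contains spaces or special characters.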

Here are some hints for monitoring the OpenStack queues:

Volume status and availability
+ QUEUE - cinder-volume
+ QUEUE - cinder-backup
+ QUEUE - cinder-scheduler
Nova status and availability
+ QUEUE - compute
+ QUEUE - conductor
+ QUEUE - engine 
Neutron - network status and availability
+ QUEUE - dhcp_agent
+ QUEUE - l3_agent
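One simple availability test on top of these queues is to check that each of them still has a consumer: a queue with no consumers usually means the service behind it is down. The sketch below assumes you already fetched the queue list from the management API (GET /api/queues returns a list of objects with "name" and "consumers" fields) and only shows the evaluation step. The function name and the sample data are made up for the example.

```python
# Queues taken from the hints above.
EXPECTED_QUEUES = [
    "cinder-volume", "cinder-backup", "cinder-scheduler",
    "compute", "conductor", "engine",
    "dhcp_agent", "l3_agent",
]


def queues_without_consumers(queues, expected=EXPECTED_QUEUES):
    """Return the expected queue names that are missing or have no consumer.

    `queues` is a list of dicts shaped like the items of GET /api/queues.
    """
    consumers = {q["name"]: q.get("consumers", 0) for q in queues}
    return [name for name in expected if consumers.get(name, 0) == 0]


if __name__ == "__main__":
    # Hypothetical sample, not real output from a cluster.
    sample = [
        {"name": "compute", "consumers": 2},
        {"name": "conductor", "consumers": 0},
    ]
    print(queues_without_consumers(sample, ["compute", "conductor", "engine"]))
```

A missing queue is reported the same way as a queue without consumers, because in both cases nobody is there to pick up the RPC messages.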

Last but not least, monitor the RabbitMQ process itself, along with the standard checks for system load, CPU and disk space on the controller node.

In the next part I'll try to explain a bit more about image and KVM monitoring.