Last year we focused on OpenStack and the projects behind it. We decided to emphasize it and shift our scope in that direction because of the rich features and flexibility it provides. But as we know, with great power comes great responsibility.
Once you are done with the installation and configuration of the cloud, you need to guarantee its availability, and this is where monitoring comes in. If you browse the web you will find numerous technologies that fit the need – Zabbix, Nagios, and Icinga are just a few of them. So it’s really easy to choose one that will do the job.
My choice for monitoring is Icinga2/Icinga, and here are some points that helped in choosing it:
– Icinga2 supports Graphite natively
– RESTful API support
– Rich web experience
– Support of Nagios modules
Once we have selected the monitoring technology, we can focus on the monitoring itself. OpenStack provides Infrastructure-as-a-Service through a variety of projects or services:
– Nova
– Keystone
– Glance
– Horizon
– Neutron
– Cinder
– Swift
… and many more.
Moreover, OpenStack services communicate over AMQP. The default installation uses RabbitMQ as the AMQP broker, but it can be replaced by Qpid if needed. Most components, such as Nova, Cinder, and Neutron, use Remote Procedure Calls over the broker for tasks like creating new volumes and starting or stopping instances, which makes the message broker the heart of the cloud. So monitoring RabbitMQ is like checking your heart rate regularly.
Once you have installed Icinga or Nagios, you can start looking for modules. Here you can find the Nagios module we use to monitor RabbitMQ.
Before you use it, you must enable the RabbitMQ management console:
rabbitmq-plugins enable rabbitmq_management
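With the management plugin enabled, the broker exposes an HTTP API that the checks rely on. As a rough illustration of what such a check reads, here is a minimal Python sketch that fetches `/api/overview` and extracts the broker-wide message counters. The host, port 15672, and the `guest`/`guest` credentials are assumptions based on RabbitMQ defaults, and the function names are made up for this example; adjust them to your installation.

```python
import base64
import json
import urllib.request


def fetch_overview(host="localhost", port=15672, user="guest", password="guest"):
    """Fetch /api/overview from the management API (needs a running broker).

    Host, port, and credentials are assumed defaults, not values from this post.
    """
    url = f"http://{host}:{port}/api/overview"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(request, timeout=5) as response:
        return json.load(response)


def summarize_overview(overview):
    """Pull the broker-wide message counters out of an overview payload."""
    totals = overview.get("queue_totals", {})
    return {
        "ready": totals.get("messages_ready", 0),
        "unacknowledged": totals.get("messages_unacknowledged", 0),
    }
```

On a controller node, `summarize_overview(fetch_overview())` would return the two counters for the whole broker. A freshly started broker with no queues may not report `queue_totals` counters at all, which is why the sketch falls back to zero.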
The module has plenty of checks that help you get a basic overview of the state and consistency of your system. I'll review some of them so you can get a good idea of how the module works. You can find good explanations for each of them on the GitHub page as well.
- check_rabbitmq_aliveness - checks that messages can be sent and received using the API
- check_rabbitmq_connections - gathers facts about connections via the management API
- check_rabbitmq_exchange - collects the rates of confirmed published and published-out messages over a period of time; really useful for performance data monitoring
- check_rabbitmq_overview - collects the counts of pending, ready, and unacknowledged messages
- check_rabbitmq_queue - for monitoring of queues
- check_rabbitmq_server - monitors the resource usage of the server
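All of these checks follow the usual Nagios plugin convention: a value is compared against a warning and a critical threshold, and the result is reported as exit code 0 (OK), 1 (WARNING), or 2 (CRITICAL), with optional performance data after a pipe character. The sketch below shows that convention in Python. It is not the module's own code, and the function names are invented for illustration.

```python
OK, WARNING, CRITICAL = 0, 1, 2
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}


def evaluate(value, warning, critical):
    """Map a measured value to a Nagios exit code (higher value is worse)."""
    if value >= critical:
        return CRITICAL
    if value >= warning:
        return WARNING
    return OK


def format_result(name, value, warning, critical):
    """Return (exit_code, output_line) in the Nagios plugin output style."""
    status = evaluate(value, warning, critical)
    # Text before the pipe is for humans; text after it is performance data
    # in the form label=value;warn;crit, which Graphite-style backends can graph.
    line = f"{name} {LABELS[status]} - {value} | {name}={value};{warning};{critical}"
    return status, line
```

For example, 120 ready messages with thresholds of 100 and 500 produce a WARNING with the line `messages_ready WARNING - 120 | messages_ready=120;100;500`.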
There are many more that you can look at on the GitHub page above. The module is developing really fast, so by the time you read this blog post there may be even more.
With these powerful checks you can set up your monitoring. Most of them are straightforward, and you can just schedule them as periodic checks.
Here are some hints for exchanges that can be checked with check_rabbitmq_exchange:
+ EXCHANGE - nova
+ EXCHANGE - openstack
+ EXCHANGE - neutron
+ EXCHANGE - heat (if you use it regularly)
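If you want to look at what the management API reports for these exchanges before wiring up the check, each exchange has its own endpoint under `/api/exchanges/<vhost>/<name>`. Here is a small Python sketch that builds those URLs. The default vhost `/`, the host, and port 15672 are assumptions, and `exchange_url` is a helper name made up for this example. Note that the vhost must be URL-encoded, so `/` becomes `%2F`.

```python
from urllib.parse import quote

# Exchange names taken from the hints above.
OPENSTACK_EXCHANGES = ["nova", "openstack", "neutron", "heat"]


def exchange_url(name, vhost="/", host="localhost", port=15672):
    """Build the management API URL for one exchange.

    safe="" forces "/" to be percent-encoded, which the API requires
    for the default vhost.
    """
    return (
        f"http://{host}:{port}/api/exchanges/"
        f"{quote(vhost, safe='')}/{quote(name, safe='')}"
    )


if __name__ == "__main__":
    for exchange in OPENSTACK_EXCHANGES:
        print(exchange_url(exchange))
```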
Here are some hints for monitoring queues of OpenStack:
Volume status and availability
+ QUEUE - cinder-volume
+ QUEUE - cinder-backup
+ QUEUE - cinder-scheduler
Nova status and availability
+ QUEUE - compute
+ QUEUE - conductor
+ QUEUE - engine
Neutron - network status and availability
+ QUEUE - dhcp_agent
+ QUEUE - l3_agent
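One way to read "status and availability" from a queue is to check that it exists and that something is consuming from it: a queue without consumers usually means the corresponding service is down. The Python sketch below applies that idea to a list of queue objects in the shape returned by the management API's `/api/queues` endpoint (each with a `name` and a `consumers` count). The grouping and the `find_problems` helper are illustrative assumptions, not part of the Nagios module.

```python
# Queue names grouped as in the hints above.
WATCHED_QUEUES = {
    "cinder": ["cinder-volume", "cinder-backup", "cinder-scheduler"],
    "nova": ["compute", "conductor", "engine"],
    "neutron": ["dhcp_agent", "l3_agent"],
}


def find_problems(queues):
    """Return a list of human-readable problems for the watched queues.

    `queues` is a list of dicts with at least "name" and "consumers" keys.
    """
    by_name = {queue["name"]: queue for queue in queues}
    problems = []
    for service, names in WATCHED_QUEUES.items():
        for name in names:
            queue = by_name.get(name)
            if queue is None:
                problems.append(f"{service}: queue {name} is missing")
            elif queue.get("consumers", 0) == 0:
                problems.append(f"{service}: queue {name} has no consumers")
    return problems
```

An empty result means every watched queue exists and has at least one consumer. Anything else can be turned into a WARNING or CRITICAL state in your monitoring.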
Last but not least, monitor the RabbitMQ process itself and add the standard monitors for system load, CPU, and disk space on the controller node.
Ready for part 2?