Monitoring ONe with Nagios?

Hello,

Do you monitor the status of OpenNebula with a monitoring tool like Nagios? If so, which parts of OpenNebula state do you watch and with which tool?

I would like to be informed about general problems which require sysadmin attention (hosts in Error state, images in Error state, scheduling problems due to resource exhaustion, etc.). I would like to have a single three-state value “ONe is OK/WARNING/CRITICAL”.

My language of choice is Perl, and I have discovered that the Net::OpenNebula CPAN module works for me, so I can write something myself. But I would like to look at existing solutions first.

Thanks,

-Yenya

Hi,

I wrote some checks a few years back, and there are also a few more that already existed:

My checks are primarily for Check_MK; I haven’t used plain Nagios for a very long time.

The code should be very easy to read for you anyway.

The other, API-based check is equally important. It seems to have been renamed, so I have no current info about it.

I found a copy of the original repo here:

One thing I didn’t answer (sorry) was the single tri-state thing.
I would advise against that, considering you have literally hundreds of components behind a ONE cluster.

It’s possible if you use one of the various business intelligence addons for Nagios, but even those require you to have all the single checks first and then build a large global green/yellow/red indicator out of them.

If you write your own check to have such a summary status, make sure you have a strong separation between WARN (look at it, things might still be working though) and CRIT (things are dead, drop your drink NOW).
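
To illustrate the WARN/CRIT separation, here is a rough sketch in Python of how a few counters could be folded into one state. The function name, the counters and the thresholds are my own assumptions for illustration; fetching the counters from OpenNebula is left out. Only the exit codes (0 OK, 1 WARNING, 2 CRITICAL) come from the Nagios plugin convention.

```python
# Nagios plugin exit codes for the three states used here.
OK, WARNING, CRITICAL = 0, 1, 2
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}


def summarize(hosts_in_error, hosts_total, images_in_error, pending_vms):
    """Fold a few counters into one (state, message) pair.

    Hypothetical helper: the rules below are illustrative assumptions,
    not recommendations. Adjust them to what "dead" means for you.
    """
    state = OK
    problems = []

    if hosts_total > 0 and hosts_in_error >= hosts_total:
        # Nothing left to run VMs on: this is the "drop your drink" case.
        state = max(state, CRITICAL)
        problems.append(f"all {hosts_total} hosts in Error state")
    elif hosts_in_error > 0:
        # Some capacity is gone, but the cluster may still be working.
        state = max(state, WARNING)
        problems.append(f"{hosts_in_error}/{hosts_total} hosts in Error state")

    if images_in_error > 0:
        state = max(state, WARNING)
        problems.append(f"{images_in_error} images in Error state")

    if pending_vms > 0:
        # Pending VMs may point to resource exhaustion, or just a busy scheduler.
        state = max(state, WARNING)
        problems.append(f"{pending_vms} VMs pending")

    detail = "; ".join(problems) if problems else "no problems found"
    return state, f"ONE {LABELS[state]} - {detail}"
```

A wrapper script would print the message and exit with the returned state, for example `state, msg = summarize(1, 3, 0, 0); print(msg); sys.exit(state)`.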

Hi @darkfader,

Thank you for the great check_mk scripts.
I just need some pointers on how to install them.

Any tips?

Hi,

Case 1: you already run check_mk or can use it.
-> Look at the check_mk documentation about plugins.
Generally, install the things from the plugins or local folders to /usr/lib/check_mk_agent/plugins or /usr/lib/check_mk_agent/local, respectively, on the monitored systems.
(Some checks are for the frontend, some for the nodes.)
The things in the checks folder need to go to local/share/check_mk/checks on your Check_MK server.

Case 2: you don’t run check_mk and want to stick with Nagios.
You’ll need to modify the checks so that their output matches the Nagios plugin standard. It’s not very hard once you have done it a few times.
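
As a rough idea of what that conversion involves, here is a sketch in Python. It assumes the check prints the classic Check_MK local-check line (`<status> <name> <perfdata or -> <text>`, with a name that contains no spaces) and turns it into a Nagios-style status line plus exit code. The function name and the sample lines are made up for illustration. Agent plugins that print `<<<section>>>` data are a different matter: for those you would have to port the logic from the checks folder instead.

```python
# Nagios plugin exit codes and their labels.
LABELS = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def local_check_to_nagios(line):
    """Convert one Check_MK local-check line to (exit_code, nagios_output).

    Hypothetical helper; assumes a service name without spaces.
    """
    parts = line.strip().split(None, 3)
    if len(parts) < 3:
        return 3, "UNKNOWN - cannot parse check output"

    status_field, name, perfdata = parts[:3]
    text = parts[3] if len(parts) > 3 else ""

    # Anything other than 0..3 (e.g. computed states) is reported as UNKNOWN.
    state = int(status_field) if status_field in ("0", "1", "2", "3") else 3

    output = f"{LABELS[state]} - {name}"
    if text:
        output += f": {text}"
    if perfdata != "-":
        # Check_MK separates metrics with "|", Nagios perfdata uses spaces.
        output += " | " + perfdata.replace("|", " ")
    return state, output
```

The calling script then prints the output on a single line and exits with the returned code, which is what Nagios evaluates.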

I would recommend case 1: it takes maybe half a day to have the most important parts of monitoring in place, or about 3 days if you know very little about Nagios/Check_MK yet.

Here’s some material I wrote for people who took my class on Check_MK (and OpenNebula) back in May:

http://confluence.wartungsfenster.de/display/Adminspace/OpenNebula+Monitoring+with+Check_MK
