OpenNebula Large Deployments question

Hi all, I have just tried a federation setup and found that it is not exactly what I need right now: in fact, it gives you two separate clouds, with synchronisation of user accounts and some other administrative resources between them.

I have some concerns about how many compute hosts can be managed by one core server. I want to grow horizontally, not vertically.

And I want to manage a huge number of nodes as one OpenNebula cloud. So I need some tool to parallelize the monitoring and management processes of the main daemon.

I need something like zabbix-proxy, but for OpenNebula. Are there any recommendations about it?

I found some options, but none of them fits well:

  • OpenNebula Federation - the most suitable option for this case, but formally it will be several different clouds, each with its own resources. There is no way to display all VMs without switching interfaces, or to use the same networks and datastores in two instances.
  • OpenNebula Hybrid Driver - has a lot of limitations.
  • Ganglia Monitoring Drivers - deprecated and replaced by the UDP-push system.

I’ll comment only on the monitoring part: for monitoring/alerting, Prometheus is trivial to set up and stable in operation.
It offers a feature for federating Prometheus servers,
which can be used, for example, to aggregate metrics from servers running in several data centers.
Each data center will have its own Prometheus server, and the selected metrics are made available to the server (or several, for HA alerting) that summarizes them, usually with Grafana as the graphical web UI. All these components actually work.
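
To make the federation idea a bit more concrete, here is a minimal sketch in Python of how the URL for a per-data-center server's `/federate` endpoint is built. That endpoint takes one or more `match[]` series selectors and returns only the matching series. The host name and the selectors below are made up for illustration. In a real setup the summarizing Prometheus server would normally scrape this endpoint through its own scrape configuration rather than through a script like this.

```python
from urllib.parse import urlencode


def build_federate_url(base_url: str, matchers: list[str]) -> str:
    """Build a /federate URL that selects series by the given matchers.

    Each matcher is a Prometheus series selector such as '{job="node"}'.
    They are passed as repeated `match[]` query parameters.
    """
    query = urlencode([("match[]", m) for m in matchers])
    return f"{base_url.rstrip('/')}/federate?{query}"


if __name__ == "__main__":
    # Hypothetical per-data-center server and selectors (assumptions).
    url = build_federate_url(
        "http://prometheus-dc1.example:9090",
        ['{job="node"}', '{__name__=~"job:.*"}'],
    )
    print(url)
```

Keeping the list of selectors short is the point: only the series that the summarizing server really needs for dashboards and alerting are pulled up from each data center.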

There seems to be only a very basic OpenNebula exporter, not actively developed, but since you aim at a large scale you could contribute one.
Someone also needs to write the official Ansible modules for OpenNebula …

On the other hand, Prometheus support for monitoring both the physical infrastructure and applications is quite good.

For an introduction to prometheus see
and for alertmanager


Hi @marcindulak,
Thanks for your answer. I read about Prometheus: it is quite an interesting monitoring system, and I will definitely try it next time. But for now we already have Zabbix, which works fine and has no problems at high scale.

My question is more about the OpenNebula core daemon and the operations it runs:
e.g. collecting data from hosts about machine states, deploying new VMs, running scripts to transfer images, that’s all.

How to scale these operations?

How many nodes are you planning? How many VMs?

Hi, we have no accurate forecasts,
but it would be nice to have about a thousand nodes per cluster and up to 10k VMs.