Home cluster: best practice to protect orchestrator availability?

florianoverkamp · May 17, 2021, 6:57am

Hi,

I’ve been tinkering with an OpenNebula cluster on top of Gluster shared storage in my home lab (read: on the cheap ;-)). The Gluster nodes (SBCs with an HDD and an SSD for cache) run in replica 3, and after some minor issues I think that setup is pretty robust. The orchestration setup (Sunstone etc.) runs in a VM on a plain vanilla KVM PC, and it controls one other PC as an OpenNebula node. Now I want to bring the first PC into the cluster as well, but that obviously means the orchestrator would then run inside the cluster it manages. That opens me up to all sorts of catch-22 scenarios if either that VM or its hypervisor fails.

So I’m looking for best practices: What is the best/simplest way to ensure the orchestrator gets revived in case of failure? Here are some options I’ve considered:

Full HA with a backup admin VM on the other PC, heartbeat and all that (feels like overkill; it’s OK if the node is gone for a few minutes)
Keeping the admin VM outside of the cluster (inefficient)
An emergency script to start the image from the KVM node directly, without going through OpenNebula (would work, but how?)
Bringing all the moving parts into a Docker swarm, maybe? (I have a three-node Docker swarm spread out over both PCs, and containers get revived pretty quickly)
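For the third option (starting the admin VM straight from a KVM node), here is a minimal, hypothetical sketch in Python. It assumes the front-end VM is itself an OpenNebula-managed VM whose system datastore lives on the shared Gluster volume, mounted at /var/lib/one/datastores on both PCs, and that the libvirt XML OpenNebula generated for it is still there as deployment.<N> in the VM's directory; check that layout on your own nodes first. The script only wraps `virsh create`, which starts a transient domain from an XML file. It prints the command by default and runs it only with --execute, because starting the VM while the original copy is still running somewhere would corrupt its disk.

```python
#!/usr/bin/env python3
"""Hypothetical emergency starter for the OpenNebula front-end VM.

Assumptions (adjust to your setup; this is not an OpenNebula tool):
- the system datastore is on the shared Gluster volume, mounted at
  /var/lib/one/datastores on every KVM node;
- OpenNebula left the VM's libvirt XML there as deployment.<N>.
Only run this when you are sure the VM is NOT running anywhere else.
"""
from __future__ import annotations

import argparse
import glob
import os
import subprocess

DATASTORES = "/var/lib/one/datastores"  # assumed mount point


def pick_latest(paths: list[str]) -> str:
    """Return the deployment.<N> path with the highest numeric N."""
    numbered = [p for p in paths if p.rsplit(".", 1)[-1].isdigit()]
    if not numbered:
        raise FileNotFoundError("no deployment.<N> file found")
    return max(numbered, key=lambda p: int(p.rsplit(".", 1)[-1]))


def virsh_create_cmd(xml_path: str, uri: str = "qemu:///system") -> list[str]:
    """Build the virsh call that starts a transient domain from an XML file."""
    return ["virsh", "--connect", uri, "create", xml_path]


def main() -> None:
    parser = argparse.ArgumentParser(description=__doc__.splitlines()[0])
    parser.add_argument("--ds", type=int, required=True, help="system datastore ID")
    parser.add_argument("--vm", type=int, required=True, help="OpenNebula VM ID")
    parser.add_argument("--execute", action="store_true",
                        help="actually start the VM (default: dry run)")
    args = parser.parse_args()

    # e.g. /var/lib/one/datastores/0/42/deployment.* (layout is an assumption)
    pattern = os.path.join(DATASTORES, str(args.ds), str(args.vm), "deployment.*")
    cmd = virsh_create_cmd(pick_latest(glob.glob(pattern)))
    print(" ".join(cmd))
    if args.execute:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    main()
```

Usage would be something like `python3 start_frontend.py --ds 0 --vm 42` to see the command, then the same with `--execute` (the script name and IDs are placeholders). Once OpenNebula is back, expect to check by hand that its view of that VM matches reality.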

Thanks for your input,
Florian

florianoverkamp · May 25, 2021, 2:57pm

Replying to self here

It occurred to me that maybe the management node (Sunstone etc.) could be set up on an SBC, a Raspberry Pi or similar. Has anyone here tried that before?

Florian

cgonzalez · May 26, 2021, 7:07am

Hi @florianoverkamp,

You can take a look at the Front-End High Availability setup: “OpenNebula Front-end HA” in the OpenNebula 6.0.2 documentation.

If you decide to add the Raspberry Pi, you could deploy an HA environment that takes care of both VM failures and hypervisor failures, with 3 front-end nodes (1 on the Raspberry Pi and 1 on each hypervisor node).
This way, if one of the nodes (either a front-end VM or an entire hypervisor node) fails, you still have at least two of them up, which is enough to keep the cluster running.
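The “two out of three is enough” point is the usual majority-quorum rule: OpenNebula’s front-end HA is built on a Raft-style consensus, where a leader can only be elected and changes committed while more than half of the servers are reachable. The small Python sketch below just does that arithmetic and adds a placement check; the function names and host labels are made up for illustration. It also shows why the Raspberry Pi matters: with only the two PCs, a third front-end would have to share a machine with another one, and losing that machine takes out two votes at once.

```python
"""Majority-quorum arithmetic, assuming a Raft-style front-end HA cluster."""
from __future__ import annotations

from collections import Counter


def quorum(servers: int) -> int:
    """Votes needed for a majority: floor(n / 2) + 1."""
    if servers < 1:
        raise ValueError("need at least one server")
    return servers // 2 + 1


def tolerated_failures(servers: int) -> int:
    """How many servers can fail while a majority is still reachable."""
    return servers - quorum(servers)


def survives_any_single_host_loss(placement: dict[str, str]) -> bool:
    """True if losing any one physical host still leaves a quorum of front-ends.

    placement maps front-end name -> physical machine it runs on.
    """
    total = len(placement)
    needed = quorum(total)
    per_host = Counter(placement.values())  # front-ends lost if that host dies
    return all(total - lost >= needed for lost in per_host.values())


if __name__ == "__main__":
    for n in range(1, 6):
        print(f"{n} front-end(s): quorum {quorum(n)}, tolerates {tolerated_failures(n)} failure(s)")
    # Hypothetical names: one front-end on the Pi, one on each PC.
    print(survives_any_single_host_loss({"fe1": "pi", "fe2": "pc1", "fe3": "pc2"}))  # True
    # Two front-ends on the same PC: losing pc1 leaves only one of three.
    print(survives_any_single_host_loss({"fe1": "pc1", "fe2": "pc1", "fe3": "pc2"}))  # False
```

Note that two front-ends tolerate no failures at all, so the Pi as a third, independent voter is what lets the setup survive the loss of either PC.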
