Host Failure hooks for High Availability

amindomao · December 26, 2017, 10:44am

Thanks for your reply )

I'll enable fencing; in fact, I've just enabled it. I have Dell servers with iDRAC.
But what if, for example, node2 is in a different DC and the connection between node2 and oned is lost, including the fencing network in its separate VLAN?
Is that just "bad architecture", or is there some other "magic" to deal with a duplicate VM?
Sorry for the newbie questions.

heathen · December 26, 2017, 11:28am

I can't imagine any magic here when it comes to ordinary software.

Ordinary OSes with ordinary filesystems are built to work with dedicated sets of data (dedicated drives). What would happen if you tried to write to the same HDD from two computers? Your data would be corrupted. If you recall the CAP theorem, ordinary software requires C (consistency). If you use special software built with this situation in mind, then it's a different story. But I assume you are talking about usual OSes.

So an HA setup with an odd number of ONE controllers plus fencing is the only option that gives you a hope (not a guarantee) of preventing split-brain data corruption.
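Why an odd count matters can be seen from simple majority-quorum arithmetic. This is a sketch assuming a plain majority vote among n voting nodes (as in Raft-style consensus); the function names are illustrative only. A majority can exist on at most one side of a partition, which is what blocks split-brain, and going from 3 to 4 nodes raises the quorum without tolerating any extra failure, while a 2|2 split leaves neither side able to act.

```python
def quorum_size(n: int) -> int:
    """Smallest strict majority of n voting nodes."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """How many nodes can be lost while a majority still remains."""
    return n - quorum_size(n)


def side_has_quorum(side: int, n: int) -> bool:
    """True if one side of a network partition holds a majority."""
    return side >= quorum_size(n)


for n in range(2, 8):
    print(f"{n} nodes: quorum={quorum_size(n)}, tolerates {tolerated_failures(n)} failure(s)")

# Two-way partitions: with 3 nodes one side always wins; with 4 a tie is possible.
print("3 nodes, 2|1 split:", side_has_quorum(2, 3), side_has_quorum(1, 3))
print("4 nodes, 2|2 split:", side_has_quorum(2, 4), side_has_quorum(2, 4))
```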

amindomao · December 26, 2017, 12:01pm

I'm not considering unusual software ), just CentOS, Ceph and ONE. I thought maybe there could be a mechanism to fence a zombie VM when it appears.

amindomao · December 26, 2017, 12:09pm

Good to know, by the way.

heathen · December 26, 2017, 12:54pm

The problem is that if a zombie VM appears, it's probably too late to do anything about it, because it will connect to the Ceph backend long before you find it.

AFAIK ONE doesn't have any mechanism for terminating unwanted zombie VMs. You could, for example, put a script on each host that monitors the connection to the ONE front-end and kills all VMs (and/or shuts the host down) when that connection is lost. There may be other options as well, but every solution depends on the particular task and the particular hardware/software architecture.
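A minimal sketch of such a per-host watchdog, assuming a KVM host managed through libvirt's `virsh`, and assuming that failing to open a TCP connection to oned's XML-RPC port (2633 by default) on several consecutive probes should trigger self-fencing. The host name, thresholds and function/class names are hypothetical; it only hard-stops running domains rather than shutting the host down.

```python
import socket
import subprocess
import time

# Hypothetical values: replace with your own front-end address and tuning.
FRONTEND_HOST = "one-frontend.example.com"
FRONTEND_PORT = 2633      # oned's default XML-RPC port
MAX_FAILURES = 3          # consecutive failed probes before fencing
PROBE_INTERVAL = 10.0     # seconds between probes


def frontend_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class FenceWatchdog:
    """Counts consecutive failed probes and decides when to self-fence."""

    def __init__(self, max_failures: int) -> None:
        self.max_failures = max_failures
        self.failures = 0

    def observe(self, reachable: bool) -> bool:
        """Record one probe result; return True once fencing is due."""
        self.failures = 0 if reachable else self.failures + 1
        return self.failures >= self.max_failures


def destroy_all_vms() -> None:
    """Hard-stop every running libvirt domain on this host."""
    out = subprocess.run(
        ["virsh", "list", "--name"],
        capture_output=True, text=True, check=True,
    ).stdout
    for name in (line.strip() for line in out.splitlines()):
        if name:
            subprocess.run(["virsh", "destroy", name], check=False)


def main() -> None:
    """Probe the front-end forever; fence this host once the threshold is hit."""
    dog = FenceWatchdog(MAX_FAILURES)
    while True:
        if dog.observe(frontend_reachable(FRONTEND_HOST, FRONTEND_PORT)):
            destroy_all_vms()
            break
        time.sleep(PROBE_INTERVAL)
```

You would call `main()` from a service on each hypervisor. Since it probes only the front-end, an outage of the front-end alone would also kill every VM, so the threshold (and possibly which targets are probed) needs careful thought for your architecture.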

P.S. In my previous post I meant an odd number of hosts, of course!

amindomao · December 27, 2017, 11:21am

OK, understood. I was thinking along much the same lines in general.
I've read about the Ceph exclusive-lock feature and thought it could help me protect an image in combination with ONE host hooks.

heathen · December 27, 2017, 11:43am

ONE uses it, but it unlocks images when a VM starts.

Imagine your host fails and you use hooks to restart VMs automatically on another host. If ONE were not able to unlock the VM's image, it would not be able to start the VM again.

amindomao · December 27, 2017, 5:10pm

I'm assuming the following scenario: the host fails, ONE runs the host hook and unlocks the image, and the VM migrates to another host. ONE then locks the image again to protect it from being corrupted by a revived zombie )

miljan · September 20, 2018, 9:31pm

I think this is because the host/node is down or rebooting, so it is not reachable over SSH.

pianziva · December 27, 2018, 7:28am

Hi,

Do I have to set up Corosync and Pacemaker to get HA on host failures?

FYI, at this point I have only set up OpenNebula with 5 nodes and shared GlusterFS storage.
