Host crash - how to handle it correctly?

Today a host in our ONe cluster crashed (a HW error, reportedly to be fixed by a pending BIOS upgrade), and I discovered that I don’t know the proper way to handle a host crash. How can I tell OpenNebula something like “this host has crashed and rebooted, try to recover/reschedule everything that was running on it”?

For crashed hosts it is not easy, because oned cannot possibly know whether the host is really down or just overloaded and lagging. For rebooted hosts, however, oned should be able to tell from the host uptime that everything previously running on that host is definitely gone.
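The uptime argument can be sketched as a small shell check. This is only an illustration: the function name `rebooted_since` and the idea of remembering a "last seen healthy" timestamp are assumptions, not something taken from OpenNebula.

```shell
#!/bin/sh
# Hypothetical helper: decide whether a host has rebooted since it was
# last known to be healthy.
#   $1 = epoch seconds when the host was last seen healthy (assumed known)
#   $2 = current epoch seconds, e.g. from `date +%s`
#   $3 = host uptime in whole seconds, e.g. from `cut -d. -f1 /proc/uptime`
# Exit status 0 means the host booted after it was last seen healthy, so
# nothing that was running on it before can still be alive.
rebooted_since() {
    last_seen=$1
    now=$2
    uptime_s=$3
    # Boot time is "now" minus how long the host has been up.
    boot_time=$((now - uptime_s))
    [ "$boot_time" -gt "$last_seen" ]
}
```

For example, `rebooted_since "$LAST_SEEN" "$(date +%s)" "$UPTIME"` would succeed only when the uptime is shorter than the time elapsed since the last healthy check, where `LAST_SEEN` and `UPTIME` are placeholders you would have to fill in yourself. Clock skew between the frontend and the host is ignored here.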

I tried to run “reschedule” on one VM and “undeploy/deploy” on another, but both VMs crashed on boot with I/O errors on /dev/sda. The problem was that QEMU apparently locks the Ceph RBD image while it is in use, and after the crash/reboot the lock remains in place, so the VMs were getting I/O errors on writes to their disks. Is OpenNebula supposed to handle this and remove the lock?

FWIW, I unlocked the RBD images the following way:

# Make sure no new VMs get scheduled onto a rebooted host:
onehost disable $CRASHED_HOST
# Verify that no VMs are running on that host.
# Get a list of VMs on that host which are in the UNKNOWN state,
# or, if the host is already rebooted, in the POWEROFF state
# and check logs which ones were indeed running at the time of crash.
# Get a list of images locked by that host:
rbd ls one | while read -r image
do
    # -F: match the address literally, so the dots are not regex wildcards
    rbd lock ls "one/$image" | grep -qF "$IP_OF_CRASHED_HOST:0" \
        && echo "$image"
done > /tmp/locked-images
# Remove the locks (verify that the output looks ok and then re-run
# with `echo` below removed):
while read -r image
do
    # Note: this takes the first lock listed for the image
    id="$(rbd lock ls "one/$image" --format json | jq -r '.[0].id')"
    locker="$(rbd lock ls "one/$image" --format json | jq -r '.[0].locker')"
    echo rbd lock rm "one/$image" "$id" "$locker"
done < /tmp/locked-images
# Restart the crashed VMs. I did this without rescheduling them
# to another host, because I wanted to test whether the BIOS
# upgrade helped.
onevm resume 1234,5678,1235,1236,...
onehost enable $CRASHED_HOST

But in my opinion ONe should be able to do this itself. So, what is the correct way of handling a crashed host? Thanks,

-Yenya

Very interesting topic. Host crashes and their consequences are quite tricky.

You can write your own script to determine whether the host has really crashed. You can then trigger that script when a host goes from RUNNING to ERROR (the monitoring should kick in) using Host state hooks.

The script you already have for unlocking the RBD images can then be chained after that check, so that it runs only when the host is deemed crashed.

It’s sort of like VM HA, which is a hook that triggers on host error. How the error is then handled is defined in the script of that hook.
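As a rough sketch of that chaining, the hook script could be a thin wrapper that runs a check first and the recovery only if the check confirms the crash. All names here (`on_host_error`, the check and recovery commands) are hypothetical, and how the host name reaches the script depends on your OpenNebula version and hook definition, so it is simply assumed to be passed as an argument.

```shell
#!/bin/sh
# Hypothetical hook body; none of these names come from OpenNebula.
# on_host_error HOST CHECK_CMD RECOVER_CMD
#   Runs "CHECK_CMD HOST". Only if that succeeds (host deemed crashed)
#   does it run "RECOVER_CMD HOST". Otherwise it leaves the host alone
#   and returns non-zero.
on_host_error() {
    host=$1
    check_cmd=$2
    recover_cmd=$3
    if "$check_cmd" "$host"; then
        "$recover_cmd" "$host"
    else
        echo "host $host not confirmed as crashed, doing nothing" >&2
        return 1
    fi
}
```

The check command would be your own crash-detection script and the recovery command the RBD unlock script from the question, ideally still in its dry-run form with `echo` until you trust the output.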