Today a host in our ONe cluster crashed (a HW error, reportedly to be fixed by a pending BIOS upgrade), and I discovered that I don’t know the proper way to handle a host crash. How can I tell OpenNebula something like “this host has crashed and rebooted, try to recover/reschedule everything that was running on it”?
For crashed hosts it is not easy, because oned cannot possibly know whether the host is really down or just overloaded and lagging. But for rebooted hosts, oned should be able to tell from the host uptime that it is safe to assume that everything previously running on that host is definitely gone.
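To illustrate the uptime argument, here is a minimal sketch of the check I have in mind. The function name `host_rebooted` and both of its inputs are hypothetical; this is not something oned provides, just the comparison it could make.

```shell
#!/bin/sh
# Hypothetical helper, not part of OpenNebula: decide whether a host has
# rebooted since the moment its VMs were last seen running.
#   $1 = host uptime in whole seconds (e.g. from /proc/uptime on the host)
#   $2 = seconds elapsed since the host was last monitored successfully
host_rebooted() {
    uptime_s=$1
    since_last_seen_s=$2
    # If the host has been up for less time than has passed since it was
    # last seen, it must have booted somewhere in between.
    [ "$uptime_s" -lt "$since_last_seen_s" ]
}

# Example: host up for 120 s, last monitored 600 s ago.
if host_rebooted 120 600; then
    echo "rebooted"
else
    echo "not rebooted"
fi
```

If the check succeeds, nothing that was running before the reboot can still be alive on that host, which is the case where automatic recovery looks safe to me.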
I tried to run “reschedule” on one VM, and “undeploy/deploy” on another, but both crashed on boot with I/O errors on /dev/sda. The problem was that QEMU apparently locks the Ceph RBD image while it is in use, and after the crash/reboot the lock remains in place. So the VMs were getting I/O errors on writes to their disks. Is OpenNebula supposed to handle this and remove the lock?
FWIW, I unlocked the RBD images the following way:
# Make sure no new VMs get scheduled onto a rebooted host:
onehost disable $CRASHED_HOST
# Verify that no VMs are running on that host.
# Get a list of VMs on that host which are in the UNKNOWN state,
# or, if the host is already rebooted, in the POWEROFF state
# and check the logs to see which ones were indeed running at the time of the crash.
# Get a list of images locked by that host:
rbd ls one | while read -r image
do
    # -F: match the address literally (the dots in the IP are not regex wildcards)
    rbd lock ls "one/$image" | grep -qF "$IP_OF_CRASHED_HOST:0" \
        && echo "$image"
done > /tmp/locked-images
# Remove the locks (verify that the output looks ok and then re-run
# with `echo` below removed):
while read -r image
do
    # Query the lock once and extract both fields from the same output
    lock="$(rbd lock ls "one/$image" --format json)"
    id="$(printf '%s' "$lock" | jq -r '.[0].id')"
    locker="$(printf '%s' "$lock" | jq -r '.[0].locker')"
    echo rbd lock rm "one/$image" "$id" "$locker"
done < /tmp/locked-images
# Restart the crashed VMs. I did this without rescheduling them
# to another host, because I wanted to test whether the BIOS
# upgrade helped.
onevm resume 1234,5678,1235,1236,...
onehost enable $CRASHED_HOST
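The "get a list of VMs on that host" step above is only described in comments. A rough sketch of how it could be scripted follows. The helper name `vms_to_check` is made up, and I am assuming that the VM list is available as CSV with the columns `ID,STAT,HOST` and that the states are abbreviated as `unkn` and `poff`; I have not verified this against every ONe version.

```shell
#!/bin/sh
# Hypothetical helper: read "ID,STAT,HOST" CSV (with a header line) on stdin
# and print the IDs of VMs on host $1 that are in the unkn or poff state.
# The column layout and state abbreviations are assumptions.
vms_to_check() {
    awk -F, -v host="$1" '
        NR > 1 && $3 == host && ($2 == "unkn" || $2 == "poff") { print $1 }
    '
}

# Self-contained demo with made-up sample data; prints 1234.
printf '%s\n' 'ID,STAT,HOST' '1234,unkn,node1' '5678,runn,node1' '1235,poff,node2' \
    | vms_to_check node1

# On a live cluster this would presumably be fed from something like
# (unverified):
#   onevm list --list ID,STAT,HOST --csv | vms_to_check "$CRASHED_HOST"
```

The resulting IDs still need to be checked against the logs by hand, since a VM in POWEROFF may have been powered off deliberately before the crash.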
But in my opinion ONe should be able to do this itself. So, what is the correct way of handling a crashed host? Thanks,
-Yenya