I had a problem this morning where the state of some VMs changed to POWEROFF, and I don’t understand why. I was not interacting with OpenNebula (I was sleeping), yet the VMs went through the SAVE_MIGRATE, PROLOG_MIGRATE and BOOT_MIGRATE states. The command “/var/tmp/one/vnm/802.1Q/clean” failed, and finally the VMs entered the POWEROFF state.
My problem comes from the polkitd error below on the host. It is also related to this post. I will try Vlastimil_Holer’s solution.
dec 19 01:43:29 x10 kernel: traps: polkitd[22331] general protection ip:7f8439aa27f2 sp:7fff04b9cb10 error:0 in libmozjs-17.0.so[7f8439966000+3af000]
dec 19 01:43:29 x10 libvirtd[1747]: error from service: CheckAuthorization: Message did not receive a reply (timeout by message bus)
dec 19 01:43:29 x10 libvirtd[1747]: End of file while reading data: Erreur d’entrée/sortie
I had a similar problem when preparing a migration script to migrate VMs between clusters. I made several mistakes there, so my VMs were counted on multiple hosts.
So if you see these two types of messages alternating repeatedly in the VM log:
[Z0][LCM][I]: VM found again by the drivers
[Z0][VM][I]: New LCM state is RUNNING
and
[Z0][LCM][I]: VM running but monitor state is POWEROFF
[Z0][VM][I]: New LCM state is SHUTDOWN_POWEROFF
[Z0][VM][I]: New state is POWEROFF
[Z0][VM][I]: New LCM state is LCM_INIT
together with a long placement history containing a lot of monitor operations, then you have exactly the same situation.
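For a quick check of whether a given VM log shows this flapping, a small helper along these lines might be enough. It is only a sketch: count_flaps is a hypothetical name, and it simply counts the two message strings quoted above in whatever log file you pass it.

```shell
# count_flaps (hypothetical helper): count the two alternating messages
# in a single VM log file.
# Usage: count_flaps <path-to-vm-log>
count_flaps() {
    log="$1"
    # grep -c prints the number of matching lines; "|| true" keeps the
    # function usable under "set -e" when there are no matches.
    found=$(grep -c 'VM found again by the drivers' "$log" || true)
    lost=$(grep -c 'VM running but monitor state is POWEROFF' "$log" || true)
    echo "found_again=$found poweroff_by_monitor=$lost"
}
```

For example, count_flaps /var/log/one/42.log, assuming the default log location on the frontend and a hypothetical VM ID of 42. If both counters are high and roughly equal, the VM is probably bouncing between RUNNING and POWEROFF as described.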
This is how I solved it.
Find the duplicated VMs:
# Count how many hosts report each VM ID; a count above 1 means a duplicate.
onehost list -x | xmlstarlet sel -t -v '/HOST_POOL/HOST/VMS/ID' -n | sort | uniq -c | sort | while read -r dup vm; do
    if [ "$dup" != "1" ]; then
        echo "vm $vm duplicated $dup times, on hosts:"
        # Print the IDs of all hosts that list this VM.
        onehost list -x | xmlstarlet sel -t -v "/HOST_POOL/HOST[VMS/ID/.=${vm}]/ID" -n
    fi
done