Problem with migrating VMs after host error hook launched

  1. It seems that there is a bug that causes the VM to change to the RUNNING state when disk monitoring information arrives. I’ve opened an issue:

https://dev.opennebula.org/issues/5331

  1. `EXECUTE SUCCESS 1 error:` means that the hook was executed successfully and wrote no error message (the `error:` field at the end of the line is empty).

  2. The VM changed from UNKNOWN to RUNNING before the scheduler picked it up for rescheduling, so the scheduler assumed the VM was really running and attempted a live migration. A VM in the UNKNOWN state is not migrated; it is redeployed. This was caused by the bug you’ve found (see the state-check note after the workaround below).

  3. We aim to fix the bug in version 5.4.1. Meanwhile, as a workaround, you can disable disk monitoring: comment out the following lines in /var/lib/one/remotes/tm/qcow2/monitor and run onehost sync --force on the frontend, as sketched below:
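The exact lines to comment out are not quoted in this thread, so the snippet below is only a sketch of the procedure: it assumes you edit the monitor script as oneadmin on the frontend and disable the disk-size reporting section by prefixing it with `#`. `onehost sync --force` then pushes the modified remote scripts to the hosts.

```sh
# Sketch only: the exact lines to disable are not quoted in this thread.
# As oneadmin on the frontend, edit the qcow2 TM monitor script and comment
# out (prefix with '#') the section that reports per-disk monitoring data.
sudo -u oneadmin vi /var/lib/one/remotes/tm/qcow2/monitor

# Push the modified remote scripts to all hosts, even if they already
# appear up to date.
sudo -u oneadmin onehost sync --force
```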
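For reference, the state transition described in point 2 can be observed with the standard onevm CLI (VM id 0 is just a placeholder):

```sh
# List all VMs with their short state; a VM on a failed host shows as "unkn".
onevm list

# Inspect the full STATE/LCM_STATE of one VM.
onevm show 0 | grep STATE
```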
