Host state ‘ERROR’ but still working?

Sometimes, when migrating “wild” VMs, the run_probes script fails and OpenNebula thinks the host is in the error state. This becomes a problem when I migrate many VMs and the run_probes script fails 3 times in a row: fencing is then triggered and my host gets rebooted.


Versions of the related components and OS (frontend, hypervisors, VMs):
OpenNebula 5.4.13
CentOS 7

Steps to reproduce:
Migrate VMs. While the migration is in progress (I don’t know the exact moment), run the run_probes script repeatedly; you will sometimes get the following error:

[root@ord-virt-004 ~]# /var/tmp/one/im/run_probes kvm /var/lib/one//datastores 4124 20 2 ord-virt-004
../../vmm/kvm/poll:403:in `xml_to_one': undefined method `text' for nil:NilClass (NoMethodError)
	from ../../vmm/kvm/poll:152:in `block in get_all_vm_info'
	from ../../vmm/kvm/poll:134:in `each'
	from ../../vmm/kvm/poll:134:in `get_all_vm_info'
	from /var/tmp/one/vmm/lib/poll_common.rb:99:in `print_all_vm_template'
	from ../../vmm/kvm/poll:531:in `<main>'
ERROR MESSAGE --8<------
Error executing poll.sh
ERROR MESSAGE ------>8--
ERROR MESSAGE --8<------
Error executing collectd-client_control.sh
ERROR MESSAGE ------>8--
ARCH=x86_64
MODELNAME="Intel(R) Xeon(R) Platinum 8168 CPU @ 2.70GHz"
HYPERVISOR=kvm
TOTALCPU=9600
CPUSPEED=2700
TOTALMEMORY=791014980
USEDMEMORY=78110216
FREEMEMORY=712904764
FREECPU=9312
USEDCPU=288
NETRX=221262088274
NETTX=120226307714
KVM_MACHINES="pc-i440fx-rhel7.5.0 pc pc-i440fx-rhel7.0.0 rhel6.3.0 rhel6.4.0 rhel6.0.0 pc-i440fx-rhel7.1.0 pc-i440fx-rhel7.2.0 pc-q35-rhel7.3.0 rhel6.5.0 pc-q35-rhel7.4.0 rhel6.6.0 rhel6.1.0 rhel6.2.0 pc-i440fx-rhel7.3.0 pc-i440fx-rhel7.4.0 pc-q35-rhel7.5.0 q35"
KVM_CPU_MODELS="486 pentium pentium2 pentium3 pentiumpro coreduo n270 core2duo qemu32 kvm32 cpu64-rhel5 cpu64-rhel6 kvm64 qemu64 Conroe Penryn Nehalem Nehalem-IBRS Westmere Westmere-IBRS SandyBridge SandyBridge-IBRS IvyBridge IvyBridge-IBRS Haswell-noTSX Haswell-noTSX-IBRS Haswell Haswell-IBRS Broadwell-noTSX Broadwell-noTSX-IBRS Broadwell Broadwell-IBRS Skylake-Client Skylake-Client-IBRS Skylake-Server Skylake-Server-IBRS athlon phenom Opteron_G1 Opteron_G2 Opteron_G3 Opteron_G4 Opteron_G5 EPYC EPYC-IBPB"
DS_LOCATION_USED_MB=3542
DS_LOCATION_TOTAL_MB=9952
DS_LOCATION_FREE_MB=5883
DS = [
  ID = 100,
  USED_MB = 3542,
  TOTAL_MB = 9952,
  FREE_MB = 5883
]
HOSTNAME=ord-virt-004
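
The backtrace shows `.text` being called on nil inside `xml_to_one`, which means an XML element the poll script expected was not there. A plausible cause, which I have not confirmed, is that the domain XML of a VM in the middle of a migration is empty or incomplete when the probe reads it. Below is a minimal sketch of a nil-safe lookup that skips such a VM instead of aborting the whole probe. It is not the OpenNebula code: `safe_text`, `vm_info`, and the sample XML strings are hypothetical names and data made up for illustration.

```ruby
require 'rexml/document'

# Hypothetical helper (not part of OpenNebula): returns the text of the
# first element matching xpath, or nil when the element is absent.
def safe_text(doc, xpath)
  element = doc.elements[xpath]
  element ? element.text : nil
end

# Hypothetical per-VM step: returns a hash with the values read from the
# domain XML, or nil when the XML is empty, malformed, or missing a field.
def vm_info(xml)
  return nil if xml.nil? || xml.strip.empty?

  doc  = REXML::Document.new(xml)
  name = safe_text(doc, 'domain/name')
  uuid = safe_text(doc, 'domain/uuid')
  return nil if name.nil? || uuid.nil?

  { name: name, uuid: uuid }
rescue REXML::ParseException
  nil
end

# Made-up sample input: one complete dump, one empty dump (domain gone),
# and one partial dump (uuid missing).
dumps = [
  '<domain><name>one-12</name><uuid>abc</uuid></domain>',
  '',
  '<domain><name>one-13</name></domain>'
]

# VMs whose XML cannot be read are dropped; the rest are still reported.
infos = dumps.map { |xml| vm_info(xml) }.compact
puts infos.inspect
```

With a guard like this, one unreadable VM would be left out of that monitoring cycle, while the host metrics and the other VMs would still be reported and the probe would not count as failed.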

Current results:
After 3 failed attempts, the fencing mechanism is triggered and the host is rebooted.

Expected results:
The host should not be flagged as in the error state while it is still working.