[SOLVED] Host err - unable to recover it


I’ve got a host that entered an ‘err’ state, and a disable/enable doesn’t bring it back to OK.

There’s an error-executing-probes error I can’t get rid of, and I don’t want to reboot this host: there are VMs running on it, and with the host in this ‘error state’ I don’t know what would happen to them.

This error-executing-probes is strange because information about capacities is still being retrieved. I don’t know how to find out what’s failing.

How can I debug this? I’m using OpenNebula 4.6.2 on CentOS 6.5 with KVM VMs.



So as not to leave this unanswered, here is the resolution:
I had to delete /var/tmp/one on the affected host and then run:

su - oneadmin
onehost sync <host_id> --force

I also deleted /etc/cron.daily/tmpwatch so that it does not clean files out of /var/tmp/one again.
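
For anyone who wants to script the first two steps, here is a rough sketch rather than a tested procedure. It assumes it is run as oneadmin on the front-end and that oneadmin can ssh to the node without a password. The names recover_host, run and DRY_RUN, and the example values node01 and 3, are made up for illustration. By default it only prints the commands, and the final onehost show is just an extra check that was not part of the original fix.

```shell
#!/bin/sh
# Sketch of the recovery steps from this thread (not an official procedure).
# DRY_RUN defaults to 1, so commands are only printed unless DRY_RUN=0 is set.

run() {
    # Print the command; execute it only when DRY_RUN is explicitly 0.
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

recover_host() {
    host_name=$1   # hostname of the affected node (hypothetical example value)
    host_id=$2     # host ID as shown by `onehost list`

    # 1. Remove the remotes directory on the affected host.
    #    Assumes passwordless ssh for oneadmin to the node.
    run ssh "oneadmin@${host_name}" rm -rf /var/tmp/one

    # 2. Push a fresh copy of the remotes from the front-end.
    run onehost sync "$host_id" --force

    # 3. Extra check, not in the original steps: look at the host state.
    run onehost show "$host_id"
}
```

Calling `recover_host node01 3` prints the three commands; running it again with DRY_RUN=0 in the environment actually executes them.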