'CLEANUP' after VM state is UNKNOWN?

Hi,

I’m seeing the following behavior, which I believe changed after I upgraded ONE from 4.8 to 4.10.
With version 4.8, whenever a VM went into the UNKNOWN state it was ‘found again’ on the next monitoring cycle and marked as RUNNING. (Here is an example from a VM that was running both before and after the upgrade.)
On version 4.8 (state changes UNKNOWN -> RUNNING, no action taken on the VM):

Sat Feb 7 08:12:10 2015 [Z0][LCM][I]: New VM state is UNKNOWN
Sat Feb 7 08:12:35 2015 [Z0][VMM][I]: VM found again, state is RUNNING
Sat Feb 7 09:08:35 2015 [Z0][LCM][I]: New VM state is UNKNOWN
Sat Feb 7 09:09:00 2015 [Z0][VMM][I]: VM found again, state is RUNNING
Sat Feb 7 10:05:05 2015 [Z0][LCM][I]: New VM state is UNKNOWN
Sat Feb 7 10:05:30 2015 [Z0][VMM][I]: VM found again, state is RUNNING
Sat Feb 7 11:01:30 2015 [Z0][LCM][I]: New VM state is UNKNOWN
Sat Feb 7 11:02:00 2015 [Z0][VMM][I]: VM found again, state is RUNNING
Sat Feb 7 11:58:05 2015 [Z0][LCM][I]: New VM state is UNKNOWN
Sat Feb 7 11:58:30 2015 [Z0][VMM][I]: VM found again, state is RUNNING

On version 4.10, when a VM goes UNKNOWN and then RUNNING again, a CLEANUP kicks in 90 seconds after ‘VM found again, state is RUNNING’. I checked many VMs and see the same behavior: UNKNOWN -> RUNNING -> CLEANUP.

Thu Mar 5 22:04:13 2015 [Z0][LCM][I]: New VM state is RUNNING
Thu Mar 5 22:58:07 2015 [Z0][LCM][I]: New VM state is UNKNOWN
Thu Mar 5 22:58:38 2015 [Z0][VMM][I]: VM found again, state is RUNNING
Thu Mar 5 23:00:08 2015 [Z0][LCM][I]: New VM state is CLEANUP.
Thu Mar 5 23:00:08 2015 [Z0][VMM][I]: Driver command for 54290 cancelled
Thu Mar 5 23:00:17 2015 [Z0][VMM][I]: error: failed to get domain 'one-54290'
Thu Mar 5 23:00:17 2015 [Z0][VMM][I]: error: Domain not found: no domain with matching name 'one-54290'
Thu Mar 5 23:00:17 2015 [Z0][VMM][I]: ExitCode: 0
Thu Mar 5 23:00:17 2015 [Z0][VMM][I]: Successfully execute virtualization driver operation: cancel.
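
To check other VM logs for the same pattern, a small script along these lines could measure the gap between ‘VM found again’ and a following CLEANUP. This is only a sketch: the regular expression is inferred from the log lines quoted above, and the function names (parse, cleanup_after_recovery) are made up for illustration, not part of ONE.

```python
import re
from datetime import datetime

# Line layout inferred from the excerpts above:
# "<weekday> <month> <day> <HH:MM:SS> <year> [Z0][MODULE][LEVEL]: message"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} \w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2} \d{4}) "
    r"\[Z\d+\]\[(?P<module>\w+)\]\[(?P<level>\w)\]: (?P<msg>.*)$"
)


def parse(line):
    """Return (timestamp, module, message), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    # Collapse repeated spaces so single-digit days parse the same way.
    ts = datetime.strptime(" ".join(m.group("ts").split()), "%a %b %d %H:%M:%S %Y")
    return ts, m.group("module"), m.group("msg")


def cleanup_after_recovery(lines):
    """Seconds between each 'VM found again' and the CLEANUP that follows it."""
    gaps = []
    found_at = None
    for line in lines:
        parsed = parse(line)
        if parsed is None:
            continue
        ts, _module, msg = parsed
        if msg.startswith("VM found again"):
            found_at = ts
        elif msg.startswith("New VM state is CLEANUP") and found_at is not None:
            gaps.append((ts - found_at).total_seconds())
            found_at = None
        elif msg.startswith("New VM state is"):
            # Any other state change ends the "just recovered" window.
            found_at = None
    return gaps


# The three relevant lines from the 4.10 log above.
SAMPLE = [
    "Thu Mar 5 22:58:07 2015 [Z0][LCM][I]: New VM state is UNKNOWN",
    "Thu Mar 5 22:58:38 2015 [Z0][VMM][I]: VM found again, state is RUNNING",
    "Thu Mar 5 23:00:08 2015 [Z0][LCM][I]: New VM state is CLEANUP.",
]

print(cleanup_after_recovery(SAMPLE))  # [90.0]
```

Run against the 4.8 excerpt, the same function returns an empty list, since no CLEANUP ever follows a recovery there.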

I checked oned.conf for any VM_HOOK; there is none for UNKNOWN. Just to be sure, I removed the VM_HOOK on FAILED, but it didn’t help. Is there a default VM_HOOK for UNKNOWN? It almost looks like ONE runs ‘delete --recreate’ when a VM enters the UNKNOWN state. Should I define a VM_HOOK on UNKNOWN that does nothing? Please help.

Thank you.

Hi

No, there is no default hook for UNKNOWN.

It seems that the cleanup is triggered by a resubmit action on the VM from
an external program, probably a hook. There should be more info in oned.log
about triggered hooks…
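
One way to narrow this down is to pull the oned.log lines written around the time of the CLEANUP and keep only those that mention a hook or a resubmit. The sketch below assumes that oned.log lines start with the same timestamp layout as the VM log; the default keywords are guesses rather than confirmed log strings, and lines_near is a made-up helper name.

```python
from datetime import datetime, timedelta

# Timestamp layout assumed to match the VM log, e.g. "Thu Mar 5 23:00:08 2015".
TS_FORMAT = "%a %b %d %H:%M:%S %Y"


def lines_near(lines, when, window_s=120, keywords=("hook", "resubmit")):
    """Return lines within window_s seconds of `when` that contain a keyword.

    Keywords must be lowercase; matching is case-insensitive. The defaults
    are assumptions, so adjust them to whatever oned.log actually prints.
    """
    lo = when - timedelta(seconds=window_s)
    hi = when + timedelta(seconds=window_s)
    hits = []
    for line in lines:
        # Everything before the first " [" is expected to be the timestamp.
        head, sep, _rest = line.partition(" [")
        if not sep:
            continue
        try:
            ts = datetime.strptime(" ".join(head.split()), TS_FORMAT)
        except ValueError:
            continue  # continuation line or unexpected layout
        if lo <= ts <= hi and any(k in line.lower() for k in keywords):
            hits.append(line.rstrip("\n"))
    return hits
```

For the VM above, this would be called with an open handle on oned.log (commonly /var/log/one/oned.log, though the path depends on the install) and datetime(2015, 3, 5, 23, 0, 8) as the reference time.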

Cheers

Thank you.

I’ve made a few changes in the oned.conf file:

  • switched HOST/VM monitoring from UDP_PUSH to TCP_PULL:
    For whatever reason UDP_PUSH gives too many monitoring errors; I believe this is what triggers UNKNOWN for VMs in the first place. I don’t know whether TCP_PULL will do better under the same workload and number of VMs…

  • changed the arguments in the HOST_HOOK definition from “$ID -r -f -p 2” to the default “$ID -r”

I will keep monitoring the VM logs and the oned.log file…

Thank you.