VM in wrong state. Should I update the DB manually to recover? How are VMs and their states monitored?

Hi,

Is there any documentation I can read about how VM state is monitored by OpenNebula (e.g. what command does it use)?
My issue is that I had to restart libvirtd on one of my nodes (service libvirt-bin restart). Ever since I did that, all the VMs on that node show the “POWEROFF” state in the OpenNebula dashboard even though those VMs are still running.

I’ve looked at oned.log. It appears that those VMs are not being monitored anymore (I don’t see any monitoring-related messages for those VMs). Is there a flag somewhere in the database that tells it to monitor them? Any information would be appreciated.

The last “successfully monitored” message is from just before I restarted libvirtd on the node that hosts these VMs.

Tue Mar 31 15:23:02 2015 [Z0][VMM][D]: VM 76 successfully monitored: STATE=a USEDCPU=0.5 USEDMEMORY=4194304 NETRX=24885223744 NETTX=2906850074

This is a similar situation to http://dev.opennebula.org/issues/776. Should I update the DB as described in issue #776?
BTW, I’m on OpenNebula 4.10.0, in which the issue should already be fixed.

virsh list, here:

https://github.com/OpenNebula/one/blob/master/src/vmm_mad/remotes/poll_xen_kvm.rb#L84

and then translated here:

https://github.com/OpenNebula/one/blob/master/src/vmm_mad/remotes/poll_xen_kvm.rb#L84

Note, however, that if a VM is not found in virsh list, it is considered to be
powered off.

Also, there is no “pre-selection” of which VMs are monitored; all of the
running VMs are sent back to oned.
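
To make the "not found means powered off" rule concrete, here is a minimal shell sketch. It is not the code in poll_xen_kvm.rb (that driver is Ruby); the function name domain_state is hypothetical, and the sketch assumes the usual three-column "Id Name State" layout of virsh list output.

```shell
# Hypothetical helper, not part of OpenNebula.
# Reads `virsh list --all` style output on stdin and prints the state of the
# domain named in $1. A domain that does not appear in the listing is
# reported as "poweroff", mirroring the rule described above.
domain_state() {
    awk -v name="$1" '
        $2 == name {
            # Drop the Id and Name columns; what remains is the state,
            # which may be two words (e.g. "shut off").
            $1 = ""; $2 = ""
            sub(/^ +/, "")
            print
            found = 1
        }
        END { if (!found) print "poweroff" }
    '
}

# Usage on the host (assumes virsh can reach the local libvirtd):
#   virsh --connect qemu:///system list --all | domain_state one-1028
```

If libvirtd is restarting at the moment the listing is taken, the listing can come back empty, and a check like this would report every VM on the host as powered off.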

Thanks ruben. Running “virsh list” shows all VMs in the running state. I tried restarting oned, but it still says those VMs are in the POWEROFF state. Is poll_xen_kvm.rb executed by collectd-client.rb on the host? What is the proper way to restart “collectd-client.rb”? Thanks in advance.

root@node-002:/home/soonthor# ps -ef |grep ruby
oneadmin  2392     1  0 Mar24 ?        00:01:23 ruby /var/tmp/one/im/kvm.d/collectd-client.rb kvm /var/lib/one//datastores 4124 20 6 node-002.dc1.xxx.com

same problem here (oned version 4.8.0):

% virsh list | grep 1028
49 one-1028 running

% onevm show 1028 | grep STATE
STATE : POWEROFF
LCM_STATE : LCM_INIT

Doing a resume just ended up in the same state, but I resolved it by doing a recover immediately after initiating the resume:

% onevm resume 1028 ; onevm recover --success 1028

% onevm show 1028 | grep STATE
STATE : ACTIVE
LCM_STATE : RUNNING
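
The check and workaround above could be wrapped up roughly like this. It is only a sketch: one_state and is_stale_poweroff are hypothetical helper names, and the parsing assumes the "STATE : VALUE" layout that onevm show prints in the output quoted above.

```shell
# Hypothetical helpers, not part of OpenNebula.

# Read `onevm show <id>` output on stdin and print the value of the STATE
# line (the LCM_STATE line is ignored because its key does not match).
one_state() {
    awk -F' *: *' '$1 == "STATE" { print $2 }'
}

# Succeed (exit status 0) only when OpenNebula reports POWEROFF while the
# hypervisor reports the same VM as running.
#   $1 = state according to OpenNebula
#   $2 = state according to virsh
is_stale_poweroff() {
    [ "$1" = "POWEROFF" ] && [ "$2" = "running" ]
}

# Usage on the front-end (left commented out because it changes VM state):
#   state=$(onevm show 1028 | one_state)
#   if is_stale_poweroff "$state" "running"; then
#       onevm resume 1028 ; onevm recover --success 1028
#   fi
```

Checking both sides first avoids running the resume/recover pair against a VM that really is powered off.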