Monitoring falsely reports VMs in "poff" state

I have been recovering my cloud from a hard power fail this morning. All the VMs on the nodes were
in "unkn" state. I executed onevm boot on them and they have all come back up. Looking at a single
host, the problem is the following:

  ID USER     GROUP    NAME            STAT UCPU UMEM HOST       TIME
1920 matyas   users    cce-sl6h        runn    0 1.9G fcl005 31d 19h54
1931 matyas   users    sw1790-sl6h     runn    0 1.9G fcl005 27d 19h50
2026 zvada    users    CLI_DynamicIP_S poff    0   0K fcl005 23d 19h53
2038 blin     users    CLI_DynamicIP_S poff    0   0K fcl005 21d 02h19
3246 tlevshin users    gratiaweb       poff    0   0K fcl005  9d 18h12
3255 oneadmin oneadmin deswn           poff    0   0K fcl005  8d 23h02
3264 oneadmin oneadmin deswn           poff    0   0K fcl005  8d 22h58

onevm list shows that there are 7 VMs on the host, 2 in runn and 5 in poff.
But in fact all 7 VMs are running:

[root@fcl005 ~]# virsh list
 Id    Name                           State
----------------------------------------------------
 1     one-1920                       running
 2     one-1931                       running
 3     one-2026                       running
 4     one-2038                       running
 5     one-3246                       running
 6     one-3255                       running
 7     one-3264                       running

I can log in to all of them, and they are all pingable.
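
In case it helps anyone else chasing the same mismatch, here is a rough sketch of how to see
the two views side by side (assuming the domains are named one-<vmid> as above, that the onevm
CLI is reachable from wherever you run this, and that the awk field positions match your output):

# Rough sketch: for each one-<vmid> domain libvirt knows about on this host,
# print the libvirt state next to the state OpenNebula reports for that VM.
for dom in $(virsh list --all | awk '/one-/ {print $2}'); do
    id=${dom#one-}
    libvirt_state=$(virsh domstate "$dom")
    one_state=$(onevm show "$id" | awk -F: '/^STATE/ {gsub(/ /, "", $2); print $2}')
    echo "$dom  libvirt=$libvirt_state  opennebula=$one_state"
done

In my case every line shows libvirt=running while OpenNebula reports POWEROFF for five of them.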

The monitoring Ruby script seems to be running OK and all remote files appear to be in order.

What might cause this? This node has been in this state for a couple of hours now, so I don't think
it is transient. How do I reset it?

I am running OpenNebula 4.8.

Steve Timm

I see this is related to bug 3212, which is supposedly fixed in OpenNebula 4.10.2.
I do not have the human resources available to upgrade to the latest release at the moment.
Any advice on how to reset a running OpenNebula 4.8.0 installation so that it correctly reflects
the state of the VMs? We need to get this fixed.

PS: this is most likely to happen when you run onevm boot on all the VMs on a large node at once, or
nearly at once. There is a transient state where libvirt reports the VM in poweroff state
just before it starts, and this can last longer than expected if you are booting a bunch of them at once.
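
If that is what triggers it, one way to reduce the risk might be to stagger the boots instead of
firing them all back to back, roughly like this (sketch only; column 5 is STAT and column 8 is HOST
in my onevm list output above, and the sleep interval is a guess to tune locally):

# Sketch: boot the unkn VMs on one node one at a time with a pause in between,
# rather than issuing all the onevm boot commands at once.
onevm list | awk -v host=fcl005 '$5 == "unkn" && $8 == host {print $1}' |
while read -r id; do
    onevm boot "$id"
    sleep 30   # arbitrary pause so libvirt is past the transient poweroff state
done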

Hi,

If you are on 4.8, I think the safest thing to do is to shut down the guests and then try the onevm resume command.
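
Roughly something like this for each of the poff VMs (a sketch, using VM 2026 from your listing and
the one-<id> domain naming; shut the guest down cleanly on the host, wait until it is really off,
then resume it from the front end):

# On the host (fcl005): ask the guest to shut down cleanly
virsh shutdown one-2026

# Wait until this reports "shut off"
virsh domstate one-2026

# Then, from the OpenNebula front end, bring it back:
onevm resume 2026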

PS: we've been working hard on master to improve these manual recovery situations. 4.14 will be able to handle your problem with a 'recover --success' action from Sunstone.
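
The same action should also be available from the CLI; from memory it will look roughly like this
(double check the exact syntax once 4.14 is out):

# 4.14+ only (not available in 4.8): tell OpenNebula the last operation
# succeeded so it re-syncs the VM state without touching the guest
onevm recover <vmid> --success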