I have been recovering my cloud from a hard power failure this morning. All the VMs on the nodes were
in “unkn” state. I executed onevm boot on them (essentially the loop at the end of this mail) and they have all come back up. Looking at a single node, the problem is the following:
  ID USER     GROUP    NAME            STAT UCPU UMEM HOST   TIME
1920 matyas   users    cce-sl6h        runn    0 1.9G fcl005 31d 19h54
1931 matyas   users    sw1790-sl6h     runn    0 1.9G fcl005 27d 19h50
2026 zvada    users    CLI_DynamicIP_S poff    0   0K fcl005 23d 19h53
2038 blin     users    CLI_DynamicIP_S poff    0   0K fcl005 21d 02h19
3246 tlevshin users    gratiaweb       poff    0   0K fcl005  9d 18h12
3255 oneadmin oneadmin deswn           poff    0   0K fcl005  8d 23h02
3264 oneadmin oneadmin deswn           poff    0   0K fcl005  8d 22h58
onevm list shows that there are 7 VMs on the host: 2 in runn and 5 in poff.
But in fact all 7 VMs are running:
[root@fcl005 ~]# virsh list
 Id    Name        State
----------------------------------
 1     one-1920    running
 2     one-1931    running
 3     one-2026    running
 4     one-2038    running
 5     one-3246    running
 6     one-3255    running
 7     one-3264    running
I can log into all of them and they are all pingable.
The monitoring Ruby script seems to be running OK, and all the remote files appear to be in order.
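To show the mismatch compactly, here is a quick cross-check (a rough sketch; it assumes the default onevm list layout above, with the VM ID in the first field and STAT in the fifth, plus the one-<ID> libvirt domain naming visible in the virsh output):

# for every VM OpenNebula thinks is powered off, ask libvirt directly
for id in $(onevm list | awk 'NR > 1 && $5 == "poff" {print $1}'); do
    echo -n "one-$id: "
    virsh domstate "one-$id"    # reports "running" for all five
done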
What might cause this? The node has been in this state for a couple of hours now, so I don’t think
it is transient. How do I reset it?
I am running OpenNebula 4.8.
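For reference, the boot pass I ran this morning was essentially this (a sketch from memory, assuming the same default onevm list layout with STAT in the fifth field):

# boot every VM left in "unkn" state after the power fail
for id in $(onevm list | awk 'NR > 1 && $5 == "unkn" {print $1}'); do
    onevm boot "$id"
done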