After upgrade to 5, one VM stuck in FAILED state

Hello,

I just ran into this problem. One of the VMs got stuck in the FAILED state after an upgrade from 4.10 to 5.0.2. Executing onevm recover --delete does nothing.

onevm list shows this:

446 admin oneadmin Automation Serv fail 0 0K 48d 18h48

In Sunstone I can see that LCM_STATE is LCM_INIT and the overall state is FAILED.
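(For reference, the same two state fields Sunstone shows can also be read with the standard CLI:)

```shell
# STATE and LCM_STATE for the stuck VM
onevm show 446 | grep -E '^(STATE|LCM_STATE)'
```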

Is there a way to delete this VM? Thank you!

recover --delete should do the job. Can you paste the relevant lines from
oned.log and the VM log file?
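(On a default installation those live under /var/log/one/ on the front-end; the per-VM log is named after the VM ID:)

```shell
# Daemon log, and the per-VM log for VM 446
tail -n 50 /var/log/one/oned.log
tail -n 100 /var/log/one/446.log
```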

Actually, I've already tried that. It does nothing either.

oned.log:

Req:8304 UID:2 VirtualMachineInfo invoked , 446
Req:8304 UID:2 VirtualMachineInfo result SUCCESS, "446<UID…"

It succeeded, but the VM is still there. Thank you!

And here is the VM log:

rm: cannot remove '/one/datastores/0/446': Directory not empty
Tue Oct 18 09:53:50 2016 [Z0][TM][I]: rm: cannot remove '/one/datastores/0/446/.nfs0000000103ee016000000001': Device or resource busy
[previous line repeated 8 more times]
Tue Oct 18 09:53:50 2016 [Z0][TM][E]: Error deleting /one/datastores/0/446
Tue Oct 18 09:53:50 2016 [Z0][TM][I]: ExitCode: 1
Tue Oct 18 09:53:50 2016 [Z0][TM][E]: Error executing image transfer script: Error deleting /one/datastores/0/446
Tue Oct 18 09:53:51 2016 [Z0][DiM][I]: New VM state is FAILED

It seems to be a problem with your NFS mount point. Try unmounting and
mounting it again; you can remove /one/datastores/0/446/ manually and
try again.

I cleaned all those paths manually and tried to recover the VM again. Nothing happened. Running recover --delete passes with success, but the VM is still there. Running recover --recreate, for example, freezes, and I have to restart the OpenNebula process. The image attached to the VM is marked used_pers, so I am unable to clone it. I can copy that image manually at the filesystem level, create a new image from it, and run the machine with a new template, but that one failed VM will still be there. Is there any possibility of removing that VM directly from the database? We're running SQLite here.

Finally, I solved it by manually editing one.db. I just edited the vm_pool table and changed the state column from 7 to 6. Then I ran onedb fsck and started OpenNebula again. The VM is gone and everything is well now!
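For anyone hitting the same dead end, the fix above can be sketched as follows. This assumes the default SQLite path /var/lib/one/one.db and a systemd setup (older installs may use "one stop"/"one start" instead); in the OpenNebula state enum, 7 is FAILED and 6 is DONE. Stop oned and back up the database before touching it:

```shell
# Stop OpenNebula before editing the database
systemctl stop opennebula

# Always keep a backup copy first
cp /var/lib/one/one.db /var/lib/one/one.db.bak

# Flip the stuck VM (ID 446) from state 7 (FAILED) to 6 (DONE)
sqlite3 /var/lib/one/one.db \
  "UPDATE vm_pool SET state = 6 WHERE oid = 446;"

# Let onedb repair any inconsistencies left behind
onedb fsck --sqlite /var/lib/one/one.db

systemctl start opennebula
```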