After upgrade to 5, one VM stuck in FAILED state

Hello,

I just ran into this problem. One of the VMs got stuck in the FAILED state after an upgrade from 4.10 to 5.0.2. Executing onevm recover --delete does nothing.

onevm list shows this:

446 admin oneadmin Automation Serv fail 0 0K 48d 18h48

In Sunstone I can see that LCM_STATE is LCM_INIT and the overall state is FAILED.
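(For reference, the same two state fields Sunstone shows can also be read with the standard CLI:)

```shell
# STATE and LCM_STATE for the stuck VM
onevm show 446 | grep -E '^(STATE|LCM_STATE)'
```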

Is there a way to delete this VM? Thank you!

recover --delete should do the job. Can you paste the relevant lines from
oned.log and the VM log file?
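(On a default installation those live under /var/log/one/ on the front-end; the per-VM log is named after the VM ID:)

```shell
# Daemon log, and the per-VM log for VM 446
tail -n 50 /var/log/one/oned.log
tail -n 100 /var/log/one/446.log
```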

Actually, I've already tried that. It does nothing either.

oned.log:

Req:8304 UID:2 VirtualMachineInfo invoked , 446
Req:8304 UID:2 VirtualMachineInfo result SUCCESS, "446<UID…"

It succeeded, but the VM is still there. Thank you!

And here is the VM log:

rm: cannot remove '/one/datastores/0/446': Directory not empty
Tue Oct 18 09:53:50 2016 [Z0][TM][I]: rm: cannot remove '/one/datastores/0/446/.nfs0000000103ee016000000001': Device or resource busy
[previous line repeated 8 more times]
Tue Oct 18 09:53:50 2016 [Z0][TM][E]: Error deleting /one/datastores/0/446
Tue Oct 18 09:53:50 2016 [Z0][TM][I]: ExitCode: 1
Tue Oct 18 09:53:50 2016 [Z0][TM][E]: Error executing image transfer script: Error deleting /one/datastores/0/446
Tue Oct 18 09:53:51 2016 [Z0][DiM][I]: New VM state is FAILED

It seems to be a problem with your NFS mount point. Try unmounting and
mounting it again; you can remove /one/datastores/0/446/ manually and
try again.

I cleaned all those paths manually and tried to recover the VM again. Nothing happened. Running recover --delete passes with success, but the VM is still there. Running recover --recreate, for example, freezes, and I have to restart the OpenNebula process. The image attached to the VM is marked used_pers, so I am unable to clone it. I can copy that image manually at the filesystem level, create a new image from it, and run the machine with a new template, but that one failed VM will still be there. Is there any possibility of removing that VM directly from the database? We're running SQLite here.

Finally, I solved it by manually editing one.db. I just edited the vm_pool table and changed the state column from 7 to 6. Then I ran onedb fsck and started OpenNebula again. The VM is gone and everything is well now!
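For anyone hitting the same dead end, the fix above can be sketched as follows. This assumes the default SQLite path /var/lib/one/one.db and a systemd setup (older installs may use "one stop"/"one start" instead); in the OpenNebula state enum, 7 is FAILED and 6 is DONE. Stop oned and back up the database before touching it:

```shell
# Stop OpenNebula before editing the database
systemctl stop opennebula

# Always keep a backup copy first
cp /var/lib/one/one.db /var/lib/one/one.db.bak

# Flip the stuck VM (ID 446) from state 7 (FAILED) to 6 (DONE)
sqlite3 /var/lib/one/one.db \
  "UPDATE vm_pool SET state = 6 WHERE oid = 446;"

# Let onedb repair any inconsistencies left behind
onedb fsck --sqlite /var/lib/one/one.db

systemctl start opennebula
```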