Safely remove VM folder

Hello,

after checking the VM list in Sunstone and comparing the ID lists on both servers (my cluster is formed by one server that acts as controller+KVM and another server that acts only as a KVM node), I have seen that there are several VM folders in "datastore 0" (on both nodes) that belong to VMs that no longer exist (I can't see them in Sunstone). So… can I safely delete those folders from the datastores (running a simple "rm $VM_ID"), or would a "delete" command break database consistency?

This problem is wasting a lot of space on both servers.

Thanks.

First check whether the VMs are in the DONE state. Sunstone only displays VMs in non-DONE states, but to be sure you can query the VM information with onevm show.
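
For example, something along these lines should work (9903 is just a placeholder ID):

# check the state of a VM directly against oned (9903 is a placeholder VM ID)
onevm show 9903 | grep -E '^(STATE|LCM_STATE)'

If it prints STATE : DONE, the VM has already finished its lifecycle on the OpenNebula side.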

I have rechecked two VMs that do not appear in Sunstone: both appear as "DONE", and in their "virtual machine history" the last (or only) action is "terminate". However, their folders still exist.
So, if those VMs appear as "DONE", can I safely remove their folders in "datastore 0" without any problem?

And, out of curiosity, why is this happening? Could it be a problem with my servers' configuration and/or the OpenNebula configuration?

Thanks.

Hello again,

my problem is that in /var/lib/one/datastores/0 there are some folders that belonged to VMs that have since been deleted. I have checked "onevm show $VM_ID" and it appears as DONE, so I suppose I can remove that folder. However, this behaviour keeps happening regularly; why?
These are the lines from "oned.log" for one VM whose folder remains in /var/lib/one/datastores/0 after a "Terminate":

oned.log:Wed Sep 14 14:13:39 2022 [Z0][ReM][D]: Req:8496 UID:1049 one.vm.info result SUCCESS, "<VM><ID>9903</ID><UI..."
oned.log:Wed Sep 14 14:13:39 2022 [Z0][ReM][D]: Req:7392 UID:1049 IP:127.0.0.1 one.vm.action invoked , "terminate-hard", 9903
oned.log:Wed Sep 14 14:13:39 2022 [Z0][DiM][D]: Terminating VM 9903
oned.log:Wed Sep 14 14:13:39 2022 [Z0][DiM][E]: Could not terminate VM 9903, wrong state SHUTDOWN_UNDEPLOY.
oned.log:Wed Sep 14 14:13:39 2022 [Z0][ReM][E]: Req:7392 UID:1049 one.vm.action result FAILURE [one.vm.action] Error performing action "terminate-hard": Could not terminate VM 9903, wrong state SHUTDOWN_UNDEPLOY.
oned.log:Wed Sep 14 14:13:44 2022 [Z0][IPM][D]: Message received: SHUTDOWN SUCCESS 9903 -
oned.log:Wed Sep 14 14:13:44 2022 [Z0][TrM][D]: Message received: TRANSFER SUCCESS 9903 -
oned.log:Wed Sep 14 14:13:47 2022 [Z0][ReM][D]: Req:112 UID:1049 IP:127.0.0.1 one.vm.info invoked , 9903, false
oned.log:Wed Sep 14 14:13:47 2022 [Z0][ReM][D]: Req:112 UID:1049 one.vm.info result SUCCESS, "<VM><ID>9903</ID><UI..."
oned.log:Wed Sep 14 14:13:47 2022 [Z0][ReM][D]: Req:176 UID:1049 IP:127.0.0.1 one.vm.action invoked , "terminate-hard", 9903
oned.log:Wed Sep 14 14:13:47 2022 [Z0][DiM][D]: Terminating VM 9903
oned.log:Wed Sep 14 14:13:47 2022 [Z0][ReM][D]: Req:176 UID:1049 one.vm.action result SUCCESS, 9903
oned.log:Wed Sep 14 14:13:47 2022 [Z0][ReM][D]: Req:7568 UID:1049 IP:127.0.0.1 one.vm.info invoked , 9903, false
oned.log:Wed Sep 14 14:13:47 2022 [Z0][ReM][D]: Req:7568 UID:1049 one.vm.info result SUCCESS, "<VM><ID>9903</ID><UI..."
oned.log:Wed Sep 14 14:13:47 2022 [Z0][ReM][D]: Req:688 UID:1049 IP:127.0.0.1 one.vm.monitoring invoked , 9903
oned.log:Wed Sep 14 14:13:47 2022 [Z0][ReM][D]: Req:4384 UID:1049 IP:127.0.0.1 one.vm.monitoring invoked , 9903
oned.log:Wed Sep 14 14:13:48 2022 [Z0][TrM][D]: Message received: TRANSFER SUCCESS 9903 -
oned.log:Wed Sep 14 14:13:48 2022 [Z0][ONE][E]: VM 9903 is not in host 15.
oned.log:Wed Sep 14 14:14:06 2022 [Z0][InM][D]: VM_STATE update from host: 15. VM id: 9903, state: POWEROFF

This VM was running on my second node (which acts only as a KVM node), so its /var/lib/one/datastores/0 is not datastore "0" on the main server. My OpenNebula cluster is composed of one server (acting as server + KVM node) and another server (acting only as a KVM node). I don't share the datastores via Ceph, NFS or similar, so my "Datastores" menu only shows datastores 0, 1 and 2 from the main server, and nothing from the second server. I think this configuration is correct, because when VMs are instantiated on my second server, the datastore is local to that second server.
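
In case it is relevant, this is roughly how I check that the system datastore really uses local transfers instead of a shared filesystem (the exact output format may differ between versions):

# show the transfer driver of the system datastore (ID 0)
onedatastore show 0 | grep -i TM_MAD

For a non-shared setup like mine it should report something like TM_MAD="ssh".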

I don't know if the problem could be this configuration, but there are some VMs whose folders are correctly removed from the system after "Terminate" (and this strange behaviour has also occurred with VMs created on the main server…).

Thanks a lot!!!

Hi again, @dclavijo @pczerny @ahuertas @cgonzalez

Over the last few days it has become normal to find the VM folder in /var/lib/one/datastores/0/ even though the VM has been deleted… I have noticed that this problem always happens on my second node (not the server, KVM only). For example, these are the log lines from a just-deleted VM… but its folder still exists.

one/oned.log:Fri Sep 23 09:29:33 2022 [Z0][ReM][D]: Req:5664 UID:0 IP:127.0.0.1 one.vm.info invoked , 9859, false
one/oned.log:Fri Sep 23 09:29:33 2022 [Z0][ReM][D]: Req:5664 UID:0 one.vm.info result SUCCESS, "<VM><ID>9859</ID><UI..."
one/oned.log:Fri Sep 23 09:29:35 2022 [Z0][ReM][D]: Req:1488 UID:0 IP:127.0.0.1 one.vm.info invoked , 9859, false
one/oned.log:Fri Sep 23 09:29:35 2022 [Z0][ReM][D]: Req:1488 UID:0 one.vm.info result SUCCESS, "<VM><ID>9859</ID><UI..."
one/oned.log:Fri Sep 23 09:32:12 2022 [Z0][ReM][D]: Req:7776 UID:0 IP:127.0.0.1 one.vm.info invoked , 9859, false
one/oned.log:Fri Sep 23 09:32:12 2022 [Z0][ReM][D]: Req:7776 UID:0 one.vm.info result SUCCESS, "<VM><ID>9859</ID><UI..."
one/oned.log:Fri Sep 23 09:32:12 2022 [Z0][ReM][D]: Req:320 UID:0 IP:127.0.0.1 one.vm.action invoked , "terminate-hard", 9859
one/oned.log:Fri Sep 23 09:32:12 2022 [Z0][DiM][D]: Terminating VM 9859
one/oned.log:Fri Sep 23 09:32:12 2022 [Z0][ReM][D]: Req:320 UID:0 one.vm.action result SUCCESS, 9859
one/oned.log:Fri Sep 23 09:32:12 2022 [Z0][ReM][D]: Req:6688 UID:0 IP:127.0.0.1 one.vm.info invoked , 9859, false
one/oned.log:Fri Sep 23 09:32:12 2022 [Z0][ReM][D]: Req:6688 UID:0 one.vm.info result SUCCESS, "<VM><ID>9859</ID><UI..."
one/oned.log:Fri Sep 23 09:32:12 2022 [Z0][TrM][D]: Message received: TRANSFER SUCCESS 9859 -
one/oned.log:Fri Sep 23 09:32:12 2022 [Z0][ONE][E]: VM 9859 is not in host 15.

Host 15 is my second node.

Why is this happening?

Thanks.

Once the VM is terminated, is the host capacity restored? That is, are the RAM and CPU resources the VM was using while it was running on the host available again?
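
For example, you could compare the host shares before and after the terminate with something like:

# quick check of the allocated CPU/MEM on host 15
onehost show 15 | grep -iA6 'HOST SHARES'

and see whether the allocated CPU and memory go back down.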

Hello @Daniel_Ruiz_Molina,
First, if the VM is in the DONE state, you can safely remove the VM folder in /var/lib/one/datastores/0/.
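
If there are many of them, a rough cleanup loop along these lines can help (only a sketch: run it as oneadmin on the node that holds the folders, it assumes the folder names are the VM IDs and the default onevm show text output, and it only prints candidates; replace echo with rm -rf once you are happy with the list):

cd /var/lib/one/datastores/0 || exit 1
for id in */; do
    id=${id%/}
    # ask oned for the VM state; folders whose VM is in DONE state are safe to delete
    state=$(onevm show "$id" 2>/dev/null | awk '/^STATE / {print $3; exit}')
    if [ "$state" = "DONE" ]; then
        echo "candidate for removal: $id"
    fi
done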

The tough part here is why it's happening. In the logs I can see you use the terminate-hard command; I suppose it's because a previous terminate (or undeploy) action hadn't finished and the VM was stuck in the SHUTDOWN_UNDEPLOY state.

You sent only the part of oned.log that shows the terminate-hard, but I think the error happened earlier, when you first called undeploy or terminate; the oned.log from that time might be more useful.
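
For example, grepping the full log for that VM ID should show the earlier calls too (assuming the default location /var/log/one/oned.log):

# all API actions and driver messages involving VM 9903
grep '9903' /var/log/one/oned.log | grep -E 'one\.vm\.action|\[DiM\]|\[TrM\]|\[LCM\]'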

Also, /var/log/one/vm_id.log might show some useful info.

Btw, the message in the log "VM 9903 is not in host 15." is not relevant: the VM was already removed from the host capacity by the first undeploy/terminate call, so it can't be removed a second time by terminate-hard.

Yes, both RAM and CPU resources have been restored, released and "returned" to the server.

Hello,

My OpenNebula cluster is designed for academic purposes. Students work through some courses in an OpenNebula environment that lets them create and remove VMs whenever they need to. During the night I run a script to undeploy all running VMs, because the cluster only has 168 CPUs and 1.2 TB of RAM to share, while there are more than 300 or 400 students (and sometimes the system has supported more than 600 running VMs…), so this undeploy script lets me restore and reassign all the shared CPU and RAM resources. So if a student leaves a VM (or more than one) running during the night, the script evaluates its load and then undeploys it.
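
Roughly, the nightly script boils down to a loop like this (heavily simplified sketch: it assumes the default onevm list column layout and VM names without spaces, and leaves out the load check I mentioned):

# undeploy every VM that is currently in the "runn" state
for id in $(onevm list | awk '$5 == "runn" {print $1}'); do
    onevm undeploy "$id"
done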

For all the VMs that have been terminated but whose folder still exists, there is NO /var/log/one/VM_ID.log file… is that normal?

The next lines are taken from "onevm show $VM_ID" for two of the "problem" VMs. This is only the "VIRTUAL MACHINE HISTORY" section:

SEQ UID  REQ   HOST         ACTION       DS           START        TIME     PROLOG
  0 944  8896  myhost       undeploy      0  09/18 10:30:13   0d 00h09m   0h00m02s
  1 944  1040  myhost       undeploy      0  09/18 10:46:59   0d 00h01m   0h00m00s
  2 944  608   myhost       undeploy      0  09/18 12:43:45   0d 01h00m   0h00m00s
  3 944  3040  myhost       undeploy      0  09/18 13:45:11   0d 00h01m   0h00m01s
  4 944  688   myhost       undeploy      0  09/18 20:00:20   0d 00h02m   0h00m00s
  5 944  7504  myhost       undeploy      0  09/20 10:41:42   0d 00h43m   0h00m00s
  6 944  224   myhost       undeploy      0  09/22 11:13:30   0d 00h00m   0h00m00s
  7 944  2096  myhost       terminate     0  09/22 12:11:04   2d 23h14m   0h00m00s
SEQ UID  REQ   HOST         ACTION       DS           START        TIME     PROLOG
  0 1039 1920  myhost       undeploy-h    0  09/14 15:38:53   0d 00h52m   0h00m02s
  1 1039 7072  myhost       undeploy-h    0  09/14 16:31:46   0d 00h09m   0h00m00s
  2 1039 8272  myhost       terminate     0  09/21 15:04:35   5d 05h48m   0h00m00s

Could it be a problem with permissions when executing "rm -rf VM_ID"? Or do you have any other suggestion?
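
For what it's worth, a check like the following on the node should show whether oneadmin (the user the drivers run as) can actually delete the folder; 9859 is just an example ID:

# who owns the leftover folder, and can oneadmin write inside it?
ls -ld /var/lib/one/datastores/0/9859
sudo -u oneadmin touch /var/lib/one/datastores/0/9859/.perm_test && echo "oneadmin can write here"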

Thanks a lot!

P.S.: because my OpenNebula cluster is designed for academic purposes, students create and remove ("terminate") VMs every day, and every day I find folders from VMs that no longer exist…

Hello,

it seems that after modifying the cloud view (YAML file) and the user view (YAML file), where I changed "VM.terminate: false" and "VM.terminate_hard: true", all deleted VMs now also have their folders removed. Maybe the problem was that, when a user clicked "Terminate", a "Terminate Hard" was being executed internally…
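
For reference, these are the kinds of lines involved (the view files are usually under /etc/one/sunstone-views/; the exact path may differ per installation):

# show the terminate-related actions enabled in the cloud and user views
grep -n 'VM.terminate' /etc/one/sunstone-views/cloud.yaml /etc/one/sunstone-views/user.yaml

After my change they show "VM.terminate: false" and "VM.terminate_hard: true".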

OK, perfect.