Unable to trace why some VMs finish as zombies

Hello,

From time to time, we have VMs that end up as zombies even after a terminate-hard:

Tue Jul 22 18:48:00 2025 [Z0][ReM][D]: Req:2032 UID:4 IP:127.0.0.1 one.vm.info invoked , 1550427
Tue Jul 22 18:48:00 2025 [Z0][ReM][D]: Req:2032 UID:4 one.vm.info result SUCCESS, "<VM><ID>1550427</ID>..."
Tue Jul 22 18:48:00 2025 [Z0][ReM][D]: Req:7168 UID:4 IP:127.0.0.1 one.vm.action invoked , "terminate-hard", 1550427
Tue Jul 22 18:48:00 2025 [Z0][DiM][D]: Terminating VM 1550427
Tue Jul 22 18:48:00 2025 [Z0][ReM][D]: Req:7168 UID:4 one.vm.action result SUCCESS, 1550427
Tue Jul 22 18:48:00 2025 [Z0][ReM][D]: Req:5296 UID:4 IP:127.0.0.1 one.vm.info invoked , 1550427
Tue Jul 22 18:48:00 2025 [Z0][ReM][D]: Req:5296 UID:4 one.vm.info result SUCCESS, "<VM><ID>1550427</ID>..."
Tue Jul 22 18:48:00 2025 [Z0][ReM][D]: Req:3904 UID:4 IP:127.0.0.1 one.vm.info invoked , 1550427
Tue Jul 22 18:48:00 2025 [Z0][ReM][D]: Req:3904 UID:4 one.vm.info result SUCCESS, "<VM><ID>1550427</ID>..."
Tue Jul 22 18:48:01 2025 [Z0][TrM][D]: Message received: TRANSFER SUCCESS 1550427 -
[…]
Tue Jul 22 18:48:50 2025 [Z0][InM][D]: VM_STATE update from host: 17. VM id: 1550427, state: RUNNING
[…]
Wed Jul 23 08:41:50 2025 [Z0][InM][D]: VM_STATE update from host: 17. VM id: 1550427, state: RUNNING

From the OpenNebula VM logs, everything looks OK:

Tue Jul 22 18:48:00 2025 [Z0][VM][I]: New LCM state is EPILOG
Tue Jul 22 18:48:01 2025 [Z0][VM][I]: New state is DONE
Tue Jul 22 18:48:01 2025 [Z0][VM][I]: New LCM state is LCM_INIT

Nothing in libvirtd or qemu logs, but the VM is still running on the host:

oneadmin@nebula84:~$ virsh -c qemu+tls://localhost/system list | grep one-1550427
 11269   one-1550427   running

with a deleted VM system directory:

oneadmin@nebula84:~$ ls datastores/0/1550427/
ls: cannot access 'datastores/0/1550427/': No such file or directory

Do you have any idea what to check to understand what happens?

Regards.

Maybe a VM state hook is running some operations behind the scenes? You could also inspect the VM's libvirt XML and the qemu process in the host's process list for more clues. It is also possible that the host is being controlled by another frontend.
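To spot such leftovers early, it may help to compare what libvirt reports on the host with what the frontend considers active. Below is a minimal Python sketch of that comparison, not an OpenNebula tool. It assumes the domains are named one-<VM ID>, as in the virsh output above, and that you collect the set of active VM IDs from the frontend yourself. The function names and the second VM ID (1550430) are made up for illustration.

```python
import re

# Domain names follow the "one-<VM ID>" pattern seen in the virsh output above.
DOMAIN_RE = re.compile(r"\bone-(\d+)\b")


def running_one_domains(virsh_list_output):
    """Return the VM IDs of one-<ID> domains reported as running by `virsh list`."""
    ids = set()
    for line in virsh_list_output.splitlines():
        match = DOMAIN_RE.search(line)
        # The state is the last column; only keep domains that are running.
        if match and line.split()[-1] == "running":
            ids.add(int(match.group(1)))
    return ids


def find_zombies(virsh_list_output, active_vm_ids):
    """Return IDs running on the host that the frontend no longer lists as active."""
    return sorted(running_one_domains(virsh_list_output) - set(active_vm_ids))


if __name__ == "__main__":
    # Sample text shaped like the virsh output in the question.
    # 1550430 is a hypothetical VM that the frontend still knows about.
    sample = (
        " 11269   one-1550427   running\n"
        " 11270   one-1550430   running\n"
    )
    print(find_zombies(sample, {1550430}))  # prints [1550427]
```

Running something like this periodically on each host would at least narrow down when a domain becomes orphaned, which you can then match against the oned.log timestamps.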
