Executive Summary: OpenNebula environment grown over 12 years, running OpenNebula v5.2. After moving hosts from CentOS7 to Alma9, initial domain creation fails on the Alma9 hosts. Live migration to them works!
Versions of the related components and OS (frontend, hypervisors, VMs):
CentOS7 with qemu-kvm on 30 Intel-CPU (Supermicro) hosts. Some hosts have been upgraded to Alma9 and are fully updated. The VMs are a mix of versions of FreeBSD, CentOS, Alma, Ubuntu, and Windows. The CentOS7 hosts have been operating properly for many years. Some Alma9 hosts are new, others are upgrades; all show the same behavior.
All datastores are NFSv4 with qcow2 images.
Steps to reproduce:
Instantiate a new VM (any OS) and attempt to deploy it to an Alma9 node.
Current results:
Domain creation fails with this in the log:
Wed Aug 6 13:04:41 2025 [Z0][VM][I]: New state is ACTIVE
Wed Aug 6 13:04:41 2025 [Z0][VM][I]: New LCM state is PROLOG
Wed Aug 6 13:04:43 2025 [Z0][VM][I]: New LCM state is BOOT
Wed Aug 6 13:04:43 2025 [Z0][VMM][I]: Generating deployment file: /var/lib/one/vms/1756/deployment.0
Wed Aug 6 13:04:43 2025 [Z0][VMM][I]: Successfully execute transfer manager driver operation: tm_context.
Wed Aug 6 13:04:43 2025 [Z0][VMM][I]: ExitCode: 0
Wed Aug 6 13:04:43 2025 [Z0][VMM][I]: Successfully execute network driver operation: pre.
Wed Aug 6 13:04:43 2025 [Z0][VMM][I]: Command execution fail: cat << EOT | /var/tmp/one/vmm/kvm/deploy '/var/lib/one//datastores/0/1756/deployment.0' '003-B01s02n2' 1756 003-B01s02n2
Wed Aug 6 13:04:43 2025 [Z0][VMM][I]: error: Failed to create domain from /var/lib/one//datastores/0/1756/deployment.0
Wed Aug 6 13:04:43 2025 [Z0][VMM][I]: error: Cannot access storage file '/var/lib/one//datastores/0/1756/disk.0' (as uid:9869, gid:9869): No such file or directory
Wed Aug 6 13:04:43 2025 [Z0][VMM][E]: Could not create domain from /var/lib/one//datastores/0/1756/deployment.0
Wed Aug 6 13:04:43 2025 [Z0][VMM][I]: ExitCode: 255
Wed Aug 6 13:04:43 2025 [Z0][VMM][I]: Failed to execute virtualization driver operation: deploy.
Wed Aug 6 13:04:43 2025 [Z0][VMM][E]: Error deploying virtual machine: Could not create domain from /var/lib/one//datastores/0/1756/deployment.0
Wed Aug 6 13:04:43 2025 [Z0][VM][I]: New LCM state is BOOT_FAILURE
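The key line in that log is the libvirt "Cannot access storage file" error. For comparing failures across several VMs or hosts, a small sketch like the one below could pull the path, uid/gid, and reason out of such lines. The regex is based only on the one error line shown above, and `parse_storage_error` is a hypothetical helper name, not part of OpenNebula or libvirt.

```python
import re
from typing import NamedTuple, Optional

# Pattern modelled on the single error line in the log above (an assumption:
# other libvirt versions may word this message differently).
STORAGE_ERROR = re.compile(
    r"Cannot access storage file '(?P<path>[^']+)' "
    r"\(as uid:(?P<uid>\d+), gid:(?P<gid>\d+)\): (?P<reason>.+)$"
)


class StorageError(NamedTuple):
    path: str
    uid: int
    gid: int
    reason: str


def parse_storage_error(line: str) -> Optional[StorageError]:
    """Return the parsed fields, or None if the line is not a storage error."""
    match = STORAGE_ERROR.search(line)
    if match is None:
        return None
    return StorageError(
        path=match["path"],
        uid=int(match["uid"]),
        gid=int(match["gid"]),
        reason=match["reason"].strip(),
    )
```

Feeding each line of a VM log through `parse_storage_error` and keeping the non-`None` results gives the list of files libvirt could not open and the identity it tried to open them as.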
The VM directory on datastore 0 is created with the proper symlink, but the actual disk image file is not created.
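To show exactly where the path stops resolving, a sketch along these lines could be run on the Alma9 host, ideally as the uid named in the error (9869), against /var/lib/one//datastores/0/1756/disk.0. It follows the symlink chain one hop at a time and reports the first component that is missing. `describe_path` is a hypothetical helper name; it uses only the Python standard library and makes no assumptions about OpenNebula internals.

```python
import os
from typing import List


def describe_path(path: str, max_hops: int = 16) -> List[str]:
    """Follow a symlink chain hop by hop and report each step."""
    report = []
    current = path
    for _ in range(max_hops):
        if os.path.islink(current):
            target = os.readlink(current)
            # Relative targets are resolved against the link's own directory.
            if not os.path.isabs(target):
                target = os.path.join(os.path.dirname(current), target)
            report.append(f"{current} -> {target}")
            current = target
        elif os.path.exists(current):
            size = os.path.getsize(current)
            report.append(f"{current}: exists ({size} bytes)")
            return report
        else:
            report.append(f"{current}: MISSING")
            return report
    report.append(f"{current}: gave up after {max_hops} hops")
    return report


if __name__ == "__main__":
    import sys

    # Usage: python3 describe_path.py <path-to-check>
    for step in describe_path(sys.argv[1]):
        print(step)
```

Running the same check on a CentOS7 host for a VM that deployed successfully would give a baseline to compare against.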
Identical VMs created on CentOS7 hosts can be live migrated to Alma9 hosts.
Expected results:
Domain should be created. When multiple instances are created from the same template and boot image and deployed across a variety of hosts, those on CentOS7 hosts work and those on Alma9 hosts do not.
Background:
This is a very old environment, initially set up in 2012 but not diligently maintained since 2017. I am attempting to rejuvenate it with a current OS and OpenNebula while it continues to host scores of active VMs.