Versions of the related components and OS (frontend, hypervisors, VMs):
Ubuntu 18.04, ONE 5.12, single host + frontend + local storage with qcow2 images
Steps to reproduce:
I have a single ONE 5.12 server that uses local RAID arrays as datastores, and it works fine.
I tried to migrate an image to a second server with identical hardware and storage, and I’ve also cloned the disk and template to the second server. For some reason, when the VM is deployed, ONE creates an “imageds” folder in /var/lib/one/datastores instead of using the proper datastore location.
As a result, the second server ends up with the deployment files and snapshots in the wrong location:
oneadmin@one01:~/datastores$ ls -l /var/lib/one/datastores/
total 16
drwxrwxr-x 2 oneadmin oneadmin 4096 May 9 09:32 0
drwxr-xr-x 2 oneadmin oneadmin 4096 Jan 3 13:51 1
lrwxrwxrwx 1 oneadmin oneadmin 16 Apr 25 13:03 105 -> /mnt/md5/imageds
lrwxrwxrwx 1 oneadmin oneadmin 17 Apr 25 13:04 106 -> /mnt/md5/systemds
lrwxrwxrwx 1 oneadmin oneadmin 16 Apr 25 13:06 107 -> /mnt/md2/imageds
lrwxrwxrwx 1 oneadmin oneadmin 17 Apr 25 13:06 108 -> /mnt/md2/systemds
drwxr-xr-x 2 oneadmin oneadmin 4096 Jan 3 13:51 2
drwxrwxr-x 3 oneadmin oneadmin 4096 May 9 11:44 imageds
oneadmin@one01:~/datastores$ tree imageds
imageds
└── 0de962b57e0e913907d9223919534188.snap
    ├── 0 -> /var/lib/one/datastores/imageds/0de962b57e0e913907d9223919534188
    └── 0de962b57e0e913907d9223919534188.snap -> /var/lib/one/datastores/imageds/0de962b57e0e913907d9223919534188.snap
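To spot stray entries like this quickly, a small check can list everything under the datastores directory and flag names that are not numeric datastore IDs. This is a minimal Python sketch, assuming that every legitimate entry is named after its datastore ID (as 0, 1, 2 and 105 to 108 are above); the function name audit_datastores_dir is only an illustration and is not part of OpenNebula.

```python
import os

# Assumption: every legitimate entry directly under the datastores directory is
# named after a numeric datastore ID, either as a plain directory or as a
# symlink to the RAID mount that backs it.
DEFAULT_DATASTORES_DIR = "/var/lib/one/datastores"


def audit_datastores_dir(path=DEFAULT_DATASTORES_DIR):
    """Return (ok_entries, unexpected_entries) for a datastores directory.

    ok_entries maps datastore ID -> symlink target (None for a plain directory).
    unexpected_entries lists names that are not numeric IDs, e.g. 'imageds'.
    """
    ok_entries = {}
    unexpected_entries = []
    for name in sorted(os.listdir(path)):
        full = os.path.join(path, name)
        if not name.isdigit():
            # Anything not named by a datastore ID is suspicious here.
            unexpected_entries.append(name)
            continue
        # os.path.islink also works for dangling links, so broken mounts show up too.
        ok_entries[int(name)] = os.readlink(full) if os.path.islink(full) else None
    return ok_entries, unexpected_entries
```

Run as oneadmin on the second server, audit_datastores_dir() should report imageds as the only unexpected entry, next to the four symlinked datastores.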
On the first server, by contrast, a deployment looks like this:
├── deployment.9
├── disk.0 -> /var/lib/one/datastores/102/76fd6a246d7f777efa2b0285d65b389e.snap/0
├── disk.0.snap -> /var/lib/one/datastores/102/76fd6a246d7f777efa2b0285d65b389e.snap
├── disk.1
├── disk.2 -> /var/lib/one/datastores/100/0de962b57e0e913907d9223919534188.snap/0
└── disk.2.snap -> /var/lib/one/datastores/100/0de962b57e0e913907d9223919534188.snap
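The difference is visible in the link targets: on the first server they point into a numbered datastore directory (100, 102), while on the second server they point into imageds. A hypothetical helper like the one below (a sketch; the regex assumes the .../datastores/<ID>/... layout seen in the working links) extracts the datastore ID from a link target and returns None when that path component is not numeric.

```python
import re
from typing import Optional

# Assumption: image files live under .../datastores/<numeric datastore ID>/...,
# as in the working server's links. Repeated slashes ("one//datastores") are tolerated.
_DS_COMPONENT = re.compile(r"/datastores/+([^/]+)/")


def datastore_id_of(link_target: str) -> Optional[int]:
    """Return the datastore ID a disk link points into, or None if the path
    component after 'datastores' is not a numeric ID (e.g. 'imageds')."""
    match = _DS_COMPONENT.search(link_target)
    if match is None or not match.group(1).isdigit():
        return None
    return int(match.group(1))
```

Applied to the links shown here, it yields 102 and 100 for the first server and None for the imageds targets on the second.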
OpenNebula expects the files in the correct datastore (DS 106), in the subfolder for VM 5:
Tue May 9 11:44:36 2023 [Z0][VMM][I]: Command execution fail: cat << EOT | /var/tmp/one/vmm/kvm/deploy '/var/lib/one//datastores/106/5/deployment.0' 'one01' 5 one01
Tue May 9 11:44:36 2023 [Z0][VMM][I]: error: Failed to create domain from /var/lib/one//datastores/106/5/deployment.0
Tue May 9 11:44:36 2023 [Z0][VMM][I]: error: Cannot access storage file '/var/lib/one//datastores/106/5/disk.0' (as uid:1000, gid:1000): No such file or directory
But it keeps placing them in a newly created “imageds” folder in /var/lib/one/datastores.
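The path the KVM deploy step fails on follows the pattern <datastores>/<system DS ID>/<VM ID>/disk.<N>. A minimal sketch for checking that path on the host, assuming this pattern (taken from the single log line above, not from the OpenNebula documentation), that also distinguishes a missing file from a dangling symlink; both function names are hypothetical:

```python
import os

# Assumption: base path copied verbatim from the deploy log in this post
# (the double slash is harmless; the kernel treats it as a single one).
DATASTORES_BASE = "/var/lib/one//datastores"


def expected_disk_path(system_ds_id: int, vm_id: int, disk_id: int,
                       base: str = DATASTORES_BASE) -> str:
    """Build the path the deployment file refers to for disk.<disk_id>."""
    return f"{base}/{system_ds_id}/{vm_id}/disk.{disk_id}"


def check_disk(path: str) -> str:
    """Describe whether the disk path exists and where it finally resolves."""
    if os.path.islink(path) and not os.path.exists(path):
        # The link itself exists, but its target does not.
        return f"dangling link: {path} -> {os.readlink(path)}"
    if not os.path.exists(path):
        return f"missing: {path}"
    return f"ok: {path} -> {os.path.realpath(path)}"
```

For this VM, check_disk(expected_disk_path(106, 5, 0)) would tell whether disk.0 was never created in DS 106 or was created as a link pointing somewhere that does not exist.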
Does anyone have an idea what could be configured wrong here?
Current results:
Datastores on server 1 (working OK):
ID  USER     GROUP    NAME       SIZE   AVA CLUSTERS IMAGES TYPE DS TM    STAT
103 oneadmin oneadmin SSD_SYSTEM 1.8T   44% 0        0      sys  -  qcow2 on
102 oneadmin oneadmin SSD_IMAGE  1.8T   44% 0        3      img  fs qcow2 on
101 oneadmin oneadmin SAS_SYSTEM 7.2T   38% 0        0      sys  -  qcow2 on
100 oneadmin oneadmin SAS_IMAGE  7.2T   38% 0        9      img  fs qcow2 on
2   oneadmin oneadmin files      915.3G 69% 0        0      fil  fs ssh   on
1   oneadmin oneadmin default    915.3G 69% 0        6      img  fs ssh   on
0   oneadmin oneadmin system     -      -   0        0      sys  -  ssh   on
Datastores on server 2 (deployment fails):
ID  USER     GROUP    NAME       SIZE   AVA CLUSTERS IMAGES TYPE DS TM    STAT
108 oneadmin oneadmin SAS_SYSTEM 7.2T   95% 0        0      sys  -  qcow2 on
107 oneadmin oneadmin SAS_IMAGE  7.2T   95% 0        0      img  fs qcow2 on
106 oneadmin oneadmin SSD_SYSTEM 1.8T   39% 0        0      sys  -  qcow2 on
105 oneadmin oneadmin SSD_IMAGE  1.8T   39% 0        1      img  fs qcow2 on
2   oneadmin oneadmin files      915.3G 94% 0        0      fil  fs ssh   on
1   oneadmin oneadmin default    915.3G 94% 0        0      img  fs ssh   on
0   oneadmin oneadmin system     -      -   0        0      sys  -  ssh   on
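Because the IDs differ between the two servers, the listings are easier to compare by NAME. A small sketch that parses listings like the two above (assuming whitespace-separated columns exactly as printed) and reports datastores whose TYPE, DS or TM columns differ; the function names are illustrative only:

```python
from typing import Dict, List

# Column order as printed in the listings in this post.
COLUMNS = ["ID", "USER", "GROUP", "NAME", "SIZE", "AVA", "CLUSTERS",
           "IMAGES", "TYPE", "DS", "TM", "STAT"]


def parse_listing(text: str) -> Dict[str, Dict[str, str]]:
    """Parse whitespace-separated listing rows into {NAME: {column: value}}."""
    rows = {}
    for line in text.strip().splitlines():
        fields = line.split()
        if not fields or fields[0] == "ID":  # skip blank and header lines
            continue
        row = dict(zip(COLUMNS, fields))
        rows[row["NAME"]] = row
    return rows


def config_differences(a: str, b: str, keys=("TYPE", "DS", "TM")) -> List[str]:
    """List datastores (matched by NAME) whose driver columns differ."""
    left, right = parse_listing(a), parse_listing(b)
    diffs = []
    for name in sorted(set(left) | set(right)):
        if name not in left or name not in right:
            diffs.append(f"{name}: only on one server")
            continue
        for key in keys:
            if left[name][key] != right[name][key]:
                diffs.append(f"{name}: {key} {left[name][key]} != {right[name][key]}")
    return diffs
```

Fed the two listings from this post, it returns an empty list, so these columns match on both servers. Attributes that the listing does not show (for example the datastore base path) would have to be compared separately.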
This is the failed deployment of a VM on the second server:
Tue May 9 11:44:33 2023 [Z0][VM][I]: New state is ACTIVE
Tue May 9 11:44:33 2023 [Z0][VM][I]: New LCM state is PROLOG
Tue May 9 11:44:34 2023 [Z0][VM][I]: New LCM state is BOOT
Tue May 9 11:44:34 2023 [Z0][VMM][I]: Generating deployment file: /var/lib/one/vms/5/deployment.0
Tue May 9 11:44:35 2023 [Z0][VMM][I]: Successfully execute transfer manager driver operation: tm_context.
Tue May 9 11:44:36 2023 [Z0][VMM][I]: ExitCode: 0
Tue May 9 11:44:36 2023 [Z0][VMM][I]: Successfully execute network driver operation: pre.
Tue May 9 11:44:36 2023 [Z0][VMM][I]: Command execution fail: cat << EOT | /var/tmp/one/vmm/kvm/deploy '/var/lib/one//datastores/106/5/deployment.0' 'one01' 5 one01
Tue May 9 11:44:36 2023 [Z0][VMM][I]: error: Failed to create domain from /var/lib/one//datastores/106/5/deployment.0
Tue May 9 11:44:36 2023 [Z0][VMM][I]: error: Cannot access storage file '/var/lib/one//datastores/106/5/disk.0' (as uid:1000, gid:1000): No such file or directory
Tue May 9 11:44:36 2023 [Z0][VMM][E]: Could not create domain from /var/lib/one//datastores/106/5/deployment.0
Tue May 9 11:44:36 2023 [Z0][VMM][I]: ExitCode: 255
Tue May 9 11:44:36 2023 [Z0][VMM][I]: ExitCode: 0
Tue May 9 11:44:36 2023 [Z0][VMM][I]: Successfully execute network driver operation: clean.
Tue May 9 11:44:36 2023 [Z0][VMM][I]: Failed to execute virtualization driver operation: deploy.
Tue May 9 11:44:36 2023 [Z0][VMM][E]: Error deploying virtual machine: Could not create domain from /var/lib/one//datastores/106/5/deployment.0
Tue May 9 11:44:36 2023 [Z0][VM][I]: New LCM state is BOOT_FAILURE
Expected results:
The KVM deployment should create symlinks into the proper datastore instead of creating a new folder in /var/lib/one/datastores. I can’t figure out what is different on the second server that results in the boot failure…
Any help is appreciated; if more info is needed, let me know!
The deployment file itself is created correctly and is readable and accessible, but the snapshots are created in "/var/lib/one/datastores/imageds" instead of the expected location, "/var/lib/one//datastores/106/5/disk.0".