ONE 5.12 - Help! KVM deploys in wrong datastore?

VURoland · May 9, 2023, 12:02pm

Versions of the related components and OS (frontend, hypervisors, VMs):
Ubuntu 18.04, ONE 5.12, single host + frontend + local storage with qcow2 images

Steps to reproduce:
I have a single ONE 5.12 server with local RAID arrays in use as datastores, which works fine.
I tried to migrate an image to a second server with identical hardware and storage, and I’ve also cloned the disk and template to the second server. For some odd reason, on VM deployment ONE creates an “imageds” folder in /var/lib/one/datastores and doesn’t use the proper datastore location for the deployment.

On the second server, I end up with the deployment files and snapshots in the wrong location. It looks like this:

oneadmin@one01:~/datastores$ ls -l /var/lib/one/datastores/
total 16
drwxrwxr-x 2 oneadmin oneadmin 4096 May  9 09:32 0
drwxr-xr-x 2 oneadmin oneadmin 4096 Jan  3 13:51 1
lrwxrwxrwx 1 oneadmin oneadmin   16 Apr 25 13:03 105 -> /mnt/md5/imageds
lrwxrwxrwx 1 oneadmin oneadmin   17 Apr 25 13:04 106 -> /mnt/md5/systemds
lrwxrwxrwx 1 oneadmin oneadmin   16 Apr 25 13:06 107 -> /mnt/md2/imageds
lrwxrwxrwx 1 oneadmin oneadmin   17 Apr 25 13:06 108 -> /mnt/md2/systemds
drwxr-xr-x 2 oneadmin oneadmin 4096 Jan  3 13:51 2
drwxrwxr-x 3 oneadmin oneadmin 4096 May  9 11:44 imageds
oneadmin@one01:~/datastores$ tree imageds
imageds
└── 0de962b57e0e913907d9223919534188.snap
    ├── 0 -> /var/lib/one/datastores/imageds/0de962b57e0e913907d9223919534188
    └── 0de962b57e0e913907d9223919534188.snap -> /var/lib/one/datastores/imageds/0de962b57e0e913907d9223919534188.snap

While on the first server, a deployment looks like this:

├── deployment.9
├── disk.0 -> /var/lib/one/datastores/102/76fd6a246d7f777efa2b0285d65b389e.snap/0
├── disk.0.snap -> /var/lib/one/datastores/102/76fd6a246d7f777efa2b0285d65b389e.snap
├── disk.1
├── disk.2 -> /var/lib/one/datastores/100/0de962b57e0e913907d9223919534188.snap/0
└── disk.2.snap -> /var/lib/one/datastores/100/0de962b57e0e913907d9223919534188.snap

The OpenNebula deployment expects the files in the correct datastore (DS 106), in the subfolder for VM 5:

Tue May  9 11:44:36 2023 [Z0][VMM][I]: Command execution fail: cat << EOT | /var/tmp/one/vmm/kvm/deploy '/var/lib/one//datastores/106/5/deployment.0' 'one01' 5 one01
Tue May  9 11:44:36 2023 [Z0][VMM][I]: error: Failed to create domain from /var/lib/one//datastores/106/5/deployment.0
Tue May  9 11:44:36 2023 [Z0][VMM][I]: error: Cannot access storage file '/var/lib/one//datastores/106/5/disk.0' (as uid:1000, gid:1000): No such file or directory

But it keeps deploying to a newly created “imageds” folder in /var/lib/one/datastores.
Does anyone have an idea what could be configured wrong here?

Current results:
Datastores on server 1 (working OK):

  ID USER     GROUP    NAME                                                                                             SIZE AVA CLUSTERS IMAGES TYPE DS      TM      STAT
 103 oneadmin oneadmin SSD_SYSTEM                                                                                       1.8T 44% 0             0 sys  -       qcow2   on  
 102 oneadmin oneadmin SSD_IMAGE                                                                                        1.8T 44% 0             3 img  fs      qcow2   on  
 101 oneadmin oneadmin SAS_SYSTEM                                                                                       7.2T 38% 0             0 sys  -       qcow2   on  
 100 oneadmin oneadmin SAS_IMAGE                                                                                        7.2T 38% 0             9 img  fs      qcow2   on  
   2 oneadmin oneadmin files                                                                                          915.3G 69% 0             0 fil  fs      ssh     on  
   1 oneadmin oneadmin default                                                                                        915.3G 69% 0             6 img  fs      ssh     on  
   0 oneadmin oneadmin system                                                                                              - -   0             0 sys  -       ssh     on

Datastores on server 2 (deployment fails):

  ID USER     GROUP    NAME                                                                                             SIZE AVA CLUSTERS IMAGES TYPE DS      TM      STAT
 108 oneadmin oneadmin SAS_SYSTEM                                                                                       7.2T 95% 0             0 sys  -       qcow2   on  
 107 oneadmin oneadmin SAS_IMAGE                                                                                        7.2T 95% 0             0 img  fs      qcow2   on  
 106 oneadmin oneadmin SSD_SYSTEM                                                                                       1.8T 39% 0             0 sys  -       qcow2   on  
 105 oneadmin oneadmin SSD_IMAGE                                                                                        1.8T 39% 0             1 img  fs      qcow2   on  
   2 oneadmin oneadmin files                                                                                          915.3G 94% 0             0 fil  fs      ssh     on  
   1 oneadmin oneadmin default                                                                                        915.3G 94% 0             0 img  fs      ssh     on  
   0 oneadmin oneadmin system                                                                                              - -   0             0 sys  -       ssh     on

This is the failed deployment of a VM on the second server:

Tue May  9 11:44:33 2023 [Z0][VM][I]: New state is ACTIVE
Tue May  9 11:44:33 2023 [Z0][VM][I]: New LCM state is PROLOG
Tue May  9 11:44:34 2023 [Z0][VM][I]: New LCM state is BOOT
Tue May  9 11:44:34 2023 [Z0][VMM][I]: Generating deployment file: /var/lib/one/vms/5/deployment.0
Tue May  9 11:44:35 2023 [Z0][VMM][I]: Successfully execute transfer manager driver operation: tm_context.
Tue May  9 11:44:36 2023 [Z0][VMM][I]: ExitCode: 0
Tue May  9 11:44:36 2023 [Z0][VMM][I]: Successfully execute network driver operation: pre.
Tue May  9 11:44:36 2023 [Z0][VMM][I]: Command execution fail: cat << EOT | /var/tmp/one/vmm/kvm/deploy '/var/lib/one//datastores/106/5/deployment.0' 'one01' 5 one01
Tue May  9 11:44:36 2023 [Z0][VMM][I]: error: Failed to create domain from /var/lib/one//datastores/106/5/deployment.0
Tue May  9 11:44:36 2023 [Z0][VMM][I]: error: Cannot access storage file '/var/lib/one//datastores/106/5/disk.0' (as uid:1000, gid:1000): No such file or directory
Tue May  9 11:44:36 2023 [Z0][VMM][E]: Could not create domain from /var/lib/one//datastores/106/5/deployment.0
Tue May  9 11:44:36 2023 [Z0][VMM][I]: ExitCode: 255
Tue May  9 11:44:36 2023 [Z0][VMM][I]: ExitCode: 0
Tue May  9 11:44:36 2023 [Z0][VMM][I]: Successfully execute network driver operation: clean.
Tue May  9 11:44:36 2023 [Z0][VMM][I]: Failed to execute virtualization driver operation: deploy.
Tue May  9 11:44:36 2023 [Z0][VMM][E]: Error deploying virtual machine: Could not create domain from /var/lib/one//datastores/106/5/deployment.0
Tue May  9 11:44:36 2023 [Z0][VM][I]: New LCM state is BOOT_FAILURE

Expected results:
The KVM deployment should create symlinks to the proper datastore, not a new folder in /var/lib/one/datastores. I can’t figure out what is different on the second server that results in the boot failure.
Any help is appreciated. If more info is needed, let me know!

The deployment file itself is created correctly, and it is readable and accessible, but the snapshots are created in "/var/lib/one/datastores/imageds" and not in the expected location, "/var/lib/one//datastores/106/5/disk.0".
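
A quick way to spot stray folders like “imageds” is to list every entry in the datastores directory whose name is not a plain number. This is only a minimal sketch: it assumes that every legitimate entry is named after a datastore ID (0, 1, 105, ...), and the function name is a hypothetical helper, not part of OpenNebula.

```shell
#!/bin/sh
# Print entries of a datastores directory whose names are not numeric IDs.
# Assumption: legitimate entries are named after a datastore ID (0, 1, 105, ...).
# $1 = directory to scan (defaults to /var/lib/one/datastores)
list_stray_datastore_entries() {
    dir="${1:-/var/lib/one/datastores}"
    for entry in "$dir"/*; do
        # Skip the unexpanded glob when the directory is empty;
        # -L keeps symlinks whose target is missing.
        [ -e "$entry" ] || [ -L "$entry" ] || continue
        name="${entry##*/}"
        case "$name" in
            # Any name containing a non-digit is reported as stray
            *[!0-9]*) printf '%s\n' "$name" ;;
        esac
    done
}
```

Calling `list_stray_datastore_entries /var/lib/one/datastores` on the second server from the listing above would be expected to report only the “imageds” folder, since 0, 1, 2 and 105 to 108 are all numeric.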

VURoland · May 9, 2023, 12:20pm

Here is the deployment file ONE generated.
(It is in the correct location: datastore 106, VM 6.)

cat /var/lib/one//datastores/106/6/deployment.0
<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
	<name>one-6</name>
	<title>backupdb</title>
	<vcpu><![CDATA[4]]></vcpu>
	<cputune>
		<shares>4096</shares>
	</cputune>
	<memory>12582912</memory>
	<os>
		<type arch='x86_64'>hvm</type>
	</os>
	<devices>
		<emulator><![CDATA[/usr/bin/qemu-system-x86_64]]></emulator>
		<disk type='file' device='disk'>
			<source file='/var/lib/one//datastores/106/6/disk.0'/>
			<target dev='vda' bus='virtio'/>
			<driver name='qemu' type='qcow2' cache='none'/>
		</disk>
		<disk type='file' device='cdrom'>
			<source file='/var/lib/one//datastores/106/6/disk.1'/>
			<target dev='hda' bus='ide'/>
			<readonly/>
			<driver name='qemu' type='raw'/>
		</disk>
		<interface type='bridge'>
			<source bridge='br192'/>
			<mac address='02:00:c0:a8:b2:ab'/>
			<target dev='one-6-0'/>
		</interface>
		<graphics type='vnc' listen='vnc-if' port='5906' passwd='RaNDoMpW'/>
		<input type='tablet' bus='usb'/>
	</devices>
	<features>
		<acpi/>
	</features>
	<metadata>
		<one:vm xmlns:one="http://opennebula.org/xmlns/libvirt/1.0">
			<one:system_datastore><![CDATA[/var/lib/one//datastores/106/6]]></one:system_datastore>
			<one:name><![CDATA[backupdb]]></one:name>
			<one:uname><![CDATA[oneadmin]]></one:uname>
			<one:uid>0</one:uid>
			<one:gname><![CDATA[oneadmin]]></one:gname>
			<one:gid>0</one:gid>
			<one:opennebula_version>5.12.0.4</one:opennebula_version>
			<one:stime>1683634435</one:stime>
			<one:deployment_time>1683634444</one:deployment_time>
		</one:vm>
	</metadata>
</domain>

VURoland · May 9, 2023, 3:08pm

UPDATE:
I tried deploying an Ubuntu image from the marketplace - no issue at all.
Could deploy normally, all contextualization worked, all files were placed in the correct datastores.

The only difference from the marketplace image is that I downloaded the failing image from a similar ONE hypervisor and then added it to the database of the second OpenNebula with:

oneimage create --type os --name backupdb-disk-1 --source /mnt/md5/imageds/0de962b57e0e913907d9223919534188 -d 105 --size 1000G

Whatever I try, I cannot get it to deploy as it’s supposed to, but another image works fine, which points to some sort of image issue rather than a datastore issue.
I’ll work around this by copying the content from the “weird image” to a new image, but if anyone knows what could cause this, please enlighten me.

Both are normal qcow2 v3 images, by the way. If anyone knows a good way to import a qcow2 image from one cluster to another, please let me know your preferred method! Thanks in advance.

VURoland · May 11, 2023, 10:14am

I probably found the issue:

Earlier I tried adding the disk with “oneimage create” using a wrong datastore number:

oneimage create --type os --name backupdb-disk-1 --source /mnt/md5/imageds/0de962b57e0e913907d9223919534188 -d 31 --size 1000G

But datastore “31” doesn’t exist (I thought it was the disk ID).
Then I ran the command again with the correct datastore ID, 105:

oneimage create --type os --name backupdb-disk-1 --source /mnt/md5/imageds/0de962b57e0e913907d9223919534188 -d 105 --size 1000G

But the datastore ID wasn’t updated from 31 to 105. So when I use this image to deploy, OpenNebula can’t find datastore 31 and just places the disk snapshot in /var/lib/one/datastores.

Workaround: I cloned the disk again and added it to the VM template. Now it starts OK with all files in the correct place => SOLVED
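
To check which datastore an image is actually registered in before deploying it, the image's XML description can be inspected. The following is a minimal sketch, assuming that `oneimage show -x` prints an image document containing a single `<DATASTORE_ID>` element. Both function names are hypothetical helpers, not part of OpenNebula.

```shell
#!/bin/sh
# Print the DATASTORE_ID found in an image's XML description (read from stdin).
# Assumption: the XML contains exactly one <DATASTORE_ID> element.
image_datastore_id() {
    sed -n 's:.*<DATASTORE_ID>\([0-9][0-9]*\)</DATASTORE_ID>.*:\1:p'
}

# Compare the image's recorded datastore with the expected one.
# $1 = expected datastore ID; the image XML is read from stdin.
# Returns 0 on a match and 1 on a mismatch or a missing ID.
check_image_datastore() {
    expected="$1"
    actual="$(image_datastore_id)"
    if [ "$actual" = "$expected" ]; then
        echo "OK: image is registered in datastore $actual"
    else
        echo "MISMATCH: image is registered in datastore '$actual', expected $expected"
        return 1
    fi
}
```

With the image's real ID in place of IMAGE_ID, `oneimage show -x IMAGE_ID | check_image_datastore 105` would show whether the image really ended up in datastore 105.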
