Unable to deploy/instantiate LXD app / Linux VM

New to OpenNebula, first post…
I have a brand-new LXD-only cluster: the Sunstone front-end is an Ubuntu 19.10 VM, and the two LXD nodes are physical machines (laptops) running Ubuntu 19.10. No KVM nodes.

I have a QNAP NAS mounted on all 3 nodes: 1x image datastore and 2x system datastores, in addition to the default local datastores.

I’ve downloaded images to the datastore and tried many ‘Apps’ with various Linux flavors.

I cannot instantiate any VMs at all; they all fail with errors similar to:

Fri May 1 13:15:53 2020 [Z0][VM][I]: New state is ACTIVE
Fri May 1 13:15:53 2020 [Z0][VM][I]: New LCM state is PROLOG
Fri May 1 13:18:46 2020 [Z0][VM][I]: New LCM state is BOOT
Fri May 1 13:18:46 2020 [Z0][VMM][I]: Generating deployment file: /var/lib/one/vms/18/deployment.0
Fri May 1 13:18:48 2020 [Z0][VMM][I]: Successfully execute transfer manager driver operation: tm_context.
Fri May 1 13:18:48 2020 [Z0][VMM][I]: Successfully execute network driver operation: pre.
Fri May 1 13:18:49 2020 [Z0][VMM][I]: Command execution fail: cat << EOT | /var/tmp/one/vmm/lxd/deploy '/var/lib/one//datastores/105/18/deployment.0' 'nebula2' 18 nebula2
Fri May 1 13:18:49 2020 [Z0][VMM][E]: deploy: Error: not found
Fri May 1 13:18:49 2020 [Z0][VMM][I]: /var/tmp/one/vmm/lxd/client.rb:102:in `wait': {"type"=>"sync", "status"=>"Success", "status_code"=>200, "operation"=>"", "error_code"=>0, "error"=>"", "metadata"=>{"id"=>"61354f88-f231-4bb2-af42-1b17c49908c0", "class"=>"task", "description"=>"Creating container", "created_at"=>"2020-05-01T13:18:49.272181732Z", "updated_at"=>"2020-05-01T13:18:49.272181732Z", "status"=>"Failure", "status_code"=>400, "resources"=>{"containers"=>["/1.0/containers/one-18"], "instances"=>["/1.0/instances/one-18"]}, "metadata"=>nil, "may_cancel"=>false, "err"=>"Invalid devices: Device validation failed \"context\": Missing source \"/mnt/onimages/105/18/mapper/disk.1\" for disk \"context\"", "location"=>"none"}} (LXDError)
Fri May 1 13:18:49 2020 [Z0][VMM][I]: from /var/tmp/one/vmm/lxd/container.rb:496:in `wait?'
Fri May 1 13:18:49 2020 [Z0][VMM][I]: from /var/tmp/one/vmm/lxd/container.rb:134:in `create'
Fri May 1 13:18:49 2020 [Z0][VMM][I]: from /var/tmp/one/vmm/lxd/deploy:52:in `<main>'
Fri May 1 13:18:49 2020 [Z0][VMM][I]: ExitCode: 1
Fri May 1 13:18:49 2020 [Z0][VMM][I]: Failed to execute virtualization driver operation: deploy.
Fri May 1 13:18:49 2020 [Z0][VMM][E]: Error deploying virtual machine
Fri May 1 13:18:49 2020 [Z0][VM][I]: New LCM state is BOOT_FAILURE

I can’t really make sense of the error; can anyone help?

Best regards
K
(The New Guy)

front-end hostname: nebula0
lxd nodes: nebula1, nebula2
The scheduler selected nebula2 for this VM instantiation.

The front-end node has /var/lib/one/vms/* subdirectories, including the deployment file /var/lib/one/vms/18/deployment.0.
nebula2 does not have /var/lib/one/vms at all.

There is no subdir called “mapper” in the datastore/VM-ID location.
Where does this path come from? From the marketplace app definition?

Hello Knut,

Did you install the opennebula-node-lxd package on lxd nodes?

Hello @sandude

Missing source "/mnt/onimages/105/18/mapper/disk.1"

The way the LXD driver works is: it maps disk files on the datastores (like the context ISO) into block devices, mounts those into directories (the path right here, for the context device), and then adds that directory as the source key of a disk entry in the container configuration.
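Roughly, for the context disk that amounts to something like the following (an illustrative sketch only; the real logic lives in the driver's mapper code, and the paths are copied from your error message):

losetup --find --show /mnt/onimages/105/18/disk.1      # map the disk file to a free /dev/loopN
mkdir -p /mnt/onimages/105/18/mapper/disk.1            # the mountpoint LXD is complaining about
mount /dev/loopN /mnt/onimages/105/18/mapper/disk.1    # mount the mapped device there

The error means LXD was handed that mapper directory as the disk source, but it was never created and mounted on the node.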

Can you show the output of lsblk on the node where the container is being deployed?

Yes, using these commands:
wget -q -O- https://downloads.opennebula.org/repo/repo.key | apt-key add -
echo "deb https://downloads.opennebula.org/repo/5.10/Ubuntu/19.10 stable opennebula" > /etc/apt/sources.list.d/opennebula.list
apt-get update
apt-get install opennebula-node-lxd
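For reference, I assume a quick way to confirm the package actually landed on each node is:

dpkg -l | grep opennebula-node-lxd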

How do I verify all services are running on the LXD nodes?

root@nebula1:/# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
loop0 7:0 0 89.1M 1 loop /snap/core/7917
loop1 7:1 0 54.7M 1 loop /snap/lxd/12211
loop2 7:2 0 93.9M 1 loop /snap/core/9066
loop3 7:3 0 55M 1 loop /snap/core18/1754
loop4 7:4 0 69.1M 1 loop /snap/lxd/14890
sda 8:0 0 465.8G 0 disk
├─sda1 8:1 0 512M 0 part /boot/efi
└─sda2 8:2 0 465.3G 0 part /
sr0 11:0 1 1024M 0 rom
root@nebula1:/#

On the second lxd node:
root@nebula2:/home/knut# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
loop0 7:0 0 54.7M 1 loop /snap/lxd/12211
loop1 7:1 0 89.1M 1 loop /snap/core/7917
loop2 7:2 0 93.9M 1 loop /snap/core/9066
loop3 7:3 0 55M 1 loop /snap/core18/1754
loop4 7:4 0 69.1M 1 loop /snap/lxd/14890
sda 8:0 0 298.1G 0 disk
├─sda1 8:1 0 1M 0 part
└─sda2 8:2 0 298.1G 0 part /
sr0 11:0 1 1024M 0 rom
root@nebula2:/home/knut#

How do I verify all services are running on the LXD nodes?

The opennebula-node-lxd package doesn’t add any running service to the virtualization nodes; it only configures some kernel modules and the LXD server. You can take a look at the default LXD profile, which should have been modified.
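For example (assuming the snap’s lxc client is in the PATH), this should show the modified profile on each node:

lxc profile show default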

Where does this path come from? From the marketplace app definition?

That function generates the path where the disk should be mounted. If you changed the datastore location in oned.conf, you need to update the lxdrc datastore location as well, in order for this function to generate the proper mountpoint path. Is this the case?
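For example, these two values need to agree (file paths as in a stock 5.10 install, if I’m not mistaken; adjust to your setup):

grep DATASTORE_LOCATION /etc/one/oned.conf
grep datastore_location /var/lib/one/remotes/etc/vmm/lxd/lxdrc

Your first error built the mountpoint under /mnt/onimages, which suggests the two had diverged.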

Quite possible that this is the root of it. I will try to undo all my fancy NFS setups and go back to bone-stock local drives.
As an aside, I’m not impressed with the way datastores are handled…

OK, after a complete reinstall with Ubuntu 19.10 and OpenNebula 5.10 (front-end virtual as before, 1x LXD host on a physical machine as before, NO messing with datastores/NFS or anything), this is what the VM create log says:

Mon May 4 17:28:35 2020 [Z0][VM][I]: New state is ACTIVE
Mon May 4 17:28:35 2020 [Z0][VM][I]: New LCM state is PROLOG
Mon May 4 17:30:12 2020 [Z0][VM][I]: New LCM state is BOOT
Mon May 4 17:30:12 2020 [Z0][VMM][I]: Generating deployment file: /var/lib/one/vms/1/deployment.0
Mon May 4 17:30:15 2020 [Z0][VMM][I]: Successfully execute transfer manager driver operation: tm_context.
Mon May 4 17:30:15 2020 [Z0][VMM][I]: Successfully execute network driver operation: pre.
Mon May 4 17:30:18 2020 [Z0][VMM][I]: Command execution fail: cat << EOT | /var/tmp/one/vmm/lxd/deploy '/var/lib/one//datastores/0/1/deployment.0' 'nebula1' 1 nebula1
Mon May 4 17:30:18 2020 [Z0][VMM][E]: deploy: To start your first container, try: lxc launch ubuntu:18.04
Mon May 4 17:30:18 2020 [Z0][VMM][I]:
Mon May 4 17:30:18 2020 [Z0][VMM][I]: Error: not found
Mon May 4 17:30:18 2020 [Z0][VMM][I]: /var/tmp/one/vmm/lxd/client.rb:102:in `wait': {"type"=>"sync", "status"=>"Success", "status_code"=>200, "operation"=>"", "error_code"=>0, "error"=>"", "metadata"=>{"id"=>"78b7418f-1ed4-4633-8d8b-787f71fee9ee", "class"=>"task", "description"=>"Creating container", "created_at"=>"2020-05-04T17:30:17.256300482Z", "updated_at"=>"2020-05-04T17:30:17.256300482Z", "status"=>"Failure", "status_code"=>400, "resources"=>{"containers"=>["/1.0/containers/one-1"]}, "metadata"=>nil, "may_cancel"=>false, "err"=>"Invalid devices: Missing source '/var/lib/one/datastores/0/1/mapper/disk.1' for disk 'context'", "location"=>"none"}} (LXDError)
Mon May 4 17:30:18 2020 [Z0][VMM][I]: from /var/tmp/one/vmm/lxd/container.rb:496:in `wait?'
Mon May 4 17:30:18 2020 [Z0][VMM][I]: from /var/tmp/one/vmm/lxd/container.rb:134:in `create'
Mon May 4 17:30:18 2020 [Z0][VMM][I]: from /var/tmp/one/vmm/lxd/deploy:52:in `<main>'
Mon May 4 17:30:18 2020 [Z0][VMM][I]: ExitCode: 1
Mon May 4 17:30:18 2020 [Z0][VMM][I]: Failed to execute virtualization driver operation: deploy.
Mon May 4 17:30:18 2020 [Z0][VMM][E]: Error deploying virtual machine
Mon May 4 17:30:18 2020 [Z0][VM][I]: New LCM state is BOOT_FAILURE

Can you please show the output of grep DATASTORE_LOCATION /etc/one/oned.conf, and of onedatastore show -x <system_datastore_id> for the system datastore being used for the containers?

Also what LXD version are you using ?

lxd --version

Bad timing, I’ve just torn down the entire environment to rebuild, possibly with Ubuntu 18.04 instead of 19.10…

Is /etc/one/oned.conf found only on the front-end node, or on all physical worker nodes as well?

It was whichever one is the default for Ubuntu 19.10…
I tore them all down to rebuild; I can see if I can still get it from one of them. Stand by.

lxd --version
3.18

/etc/one/oned.conf

Only on the front-end; this is the config file that the opennebula service loads before starting.

lxd --version
3.18

It’s very likely this is the issue; we only support the LTS version, LXD 3.0.4. Can you use that version instead?

sudo snap install lxd --channel=3.0/stable
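If LXD is already installed from another channel, I believe the equivalent switch is:

sudo snap refresh lxd --channel=3.0/stable

and lxd --version should then report 3.0.x.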

There is also planned support for the new LTS 4.0.x.

OK… feels like this should be included in the install guides for 19.10…

Brand new front-end install, Ubuntu 18.04

oneadmin@nebula0:~$ onedatastore show 0
DATASTORE 0 INFORMATION
ID : 0
NAME : system
USER : oneadmin
GROUP : oneadmin
CLUSTERS : 0
TYPE : SYSTEM
DS_MAD : -
TM_MAD : ssh
BASE PATH : /var/lib/one//datastores/0
DISK_TYPE : FILE
STATE : READY

DATASTORE CAPACITY
TOTAL: : -
FREE: : -
USED: : -
LIMIT: : -

PERMISSIONS
OWNER : um-
GROUP : u--
OTHER : ---

DATASTORE TEMPLATE
ALLOW_ORPHANS="NO"
DISK_TYPE="FILE"
DS_MIGRATE="YES"
RESTRICTED_DIRS="/"
SAFE_DIRS="/var/tmp"
SHARED="NO"
TM_MAD="ssh"
TYPE="SYSTEM_DS"

IMAGES
oneadmin@nebula0:~$ onedatastore show 1
DATASTORE 1 INFORMATION
ID : 1
NAME : default
USER : oneadmin
GROUP : oneadmin
CLUSTERS : 0
TYPE : IMAGE
DS_MAD : fs
TM_MAD : ssh
BASE PATH : /var/lib/one//datastores/1
DISK_TYPE : FILE
STATE : READY

DATASTORE CAPACITY
TOTAL: : 97.9G
FREE: : 88.6G
USED: : 4.3G
LIMIT: : -

PERMISSIONS
OWNER : um-
GROUP : u--
OTHER : ---

DATASTORE TEMPLATE
ALLOW_ORPHANS="NO"
CLONE_TARGET="SYSTEM"
DISK_TYPE="FILE"
DS_MAD="fs"
LN_TARGET="SYSTEM"
RESTRICTED_DIRS="/"
SAFE_DIRS="/var/tmp"
TM_MAD="ssh"
TYPE="IMAGE_DS"

IMAGES
oneadmin@nebula0:~$ onedatastore show 2
DATASTORE 2 INFORMATION
ID : 2
NAME : files
USER : oneadmin
GROUP : oneadmin
CLUSTERS : 0
TYPE : FILE
DS_MAD : fs
TM_MAD : ssh
BASE PATH : /var/lib/one//datastores/2
DISK_TYPE : FILE
STATE : READY

DATASTORE CAPACITY
TOTAL: : 97.9G
FREE: : 88.6G
USED: : 4.3G
LIMIT: : -

PERMISSIONS
OWNER : um-
GROUP : u--
OTHER : ---

DATASTORE TEMPLATE
ALLOW_ORPHANS="NO"
CLONE_TARGET="SYSTEM"
DS_MAD="fs"
LN_TARGET="SYSTEM"
RESTRICTED_DIRS="/"
SAFE_DIRS="/var/tmp"
TM_MAD="ssh"
TYPE="FILE_DS"

IMAGES
oneadmin@nebula0:~$

Also, no LXD worker nodes are installed yet.