New to OpenNebula, first post…
I have a brand new LXD-only cluster: the Sunstone front-end is an Ubuntu 19.10 VM, and the two LXD nodes are physical machines (laptops) running Ubuntu 19.10. No KVM nodes.
I have a QNAP NAS mounted on all 3 nodes, with 1 image datastore and 2 system datastores in addition to the default local datastores.
I’ve downloaded images to the datastore and tried many ‘Apps’ with various Linux flavors.
I cannot instantiate any VMs at all. They all fail with errors similar to:
Fri May 1 13:15:53 2020 [Z0][VM][I]: New state is ACTIVE
Fri May 1 13:15:53 2020 [Z0][VM][I]: New LCM state is PROLOG
Fri May 1 13:18:46 2020 [Z0][VM][I]: New LCM state is BOOT
Fri May 1 13:18:46 2020 [Z0][VMM][I]: Generating deployment file: /var/lib/one/vms/18/deployment.0
Fri May 1 13:18:48 2020 [Z0][VMM][I]: Successfully execute transfer manager driver operation: tm_context.
Fri May 1 13:18:48 2020 [Z0][VMM][I]: Successfully execute network driver operation: pre.
Fri May 1 13:18:49 2020 [Z0][VMM][I]: Command execution fail: cat << EOT | /var/tmp/one/vmm/lxd/deploy '/var/lib/one//datastores/105/18/deployment.0' 'nebula2' 18 nebula2
Fri May 1 13:18:49 2020 [Z0][VMM][E]: deploy: Error: not found
Fri May 1 13:18:49 2020 [Z0][VMM][I]: /var/tmp/one/vmm/lxd/client.rb:102:in `wait': {"type"=>"sync", "status"=>"Success", "status_code"=>200, "operation"=>"", "error_code"=>0, "error"=>"", "metadata"=>{"id"=>"61354f88-f231-4bb2-af42-1b17c49908c0", "class"=>"task", "description"=>"Creating container", "created_at"=>"2020-05-01T13:18:49.272181732Z", "updated_at"=>"2020-05-01T13:18:49.272181732Z", "status"=>"Failure", "status_code"=>400, "resources"=>{"containers"=>["/1.0/containers/one-18"], "instances"=>["/1.0/instances/one-18"]}, "metadata"=>nil, "may_cancel"=>false, "err"=>"Invalid devices: Device validation failed \"context\": Missing source \"/mnt/onimages/105/18/mapper/disk.1\" for disk \"context\"", "location"=>"none"}} (LXDError)
Fri May 1 13:18:49 2020 [Z0][VMM][I]: from /var/tmp/one/vmm/lxd/container.rb:496:in `wait?'
Fri May 1 13:18:49 2020 [Z0][VMM][I]: from /var/tmp/one/vmm/lxd/container.rb:134:in `create'
Fri May 1 13:18:49 2020 [Z0][VMM][I]: from /var/tmp/one/vmm/lxd/deploy:52:in `<main>'
Fri May 1 13:18:49 2020 [Z0][VMM][I]: ExitCode: 1
Fri May 1 13:18:49 2020 [Z0][VMM][I]: Failed to execute virtualization driver operation: deploy.
Fri May 1 13:18:49 2020 [Z0][VMM][E]: Error deploying virtual machine
Fri May 1 13:18:49 2020 [Z0][VM][I]: New LCM state is BOOT_FAILURE
I can’t really make sense of the error, can anyone help?
front-end hostname: nebula0
lxd nodes: nebula1, nebula2
The system selected to use nebula2 for this VM instantiation.
The front-end node has /var/lib/one/vms/* subdirs.
Front-end node has deployment file /var/lib/one/vms/18/deployment.0
Nebula2 does not have /var/lib/one/vms
The LXD driver works like this: it maps the disk files on the datastores (such as the context ISO) to block devices and mounts them into directories (the path in your error is the mountpoint for the context device). That directory is then added as the source key of a disk entry in the container configuration.
Can you show the output of lsblk on the node where the container is being deployed?
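To check this by hand on the node, the mountpoint path can be rebuilt from the pattern visible in the error message. This is a minimal sketch, assuming the layout <datastore_location>/<system_ds_id>/<vm_id>/mapper/disk.<disk_id> as inferred from the log above. The function names mapper_path and check_mapper_path are hypothetical and not part of the driver:

```shell
#!/bin/sh
# Hypothetical helper: rebuild the mountpoint path following the pattern
# seen in the error message (an inference from the log, not driver code):
#   <datastore_location>/<system_ds_id>/<vm_id>/mapper/disk.<disk_id>
mapper_path() {
    # $1 = datastore location, $2 = system datastore ID,
    # $3 = VM ID,              $4 = disk ID
    # ${1%/} drops one trailing slash so the result has no "//".
    printf '%s/%s/%s/mapper/disk.%s\n' "${1%/}" "$2" "$3" "$4"
}

# Report whether that directory exists on the node this runs on.
check_mapper_path() {
    path=$(mapper_path "$@")
    if [ -d "$path" ]; then
        echo "present: $path"
    else
        echo "missing: $path"
    fi
}
```

For the failing VM above, running check_mapper_path /mnt/onimages 105 18 1 on nebula2 would show whether the directory LXD complains about exists there.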
How do I verify all services are running on the LXD nodes?
The opennebula-node-lxd package doesn’t add any running service to the virtualization nodes; it only configures some kernel modules and the LXD server. You can take a look at the default profile, which should have been modified.
Where does this path come from? From the marketplace app definition?
That function generates the path where the disk should be mounted. If you changed the datastore location in oned, you need to update the datastore location in lxdrc so that this function generates the proper mountpoint path. Is this the case?
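A rough way to compare the two settings, as a sketch only. It assumes lxdrc is a YAML file with a :datastore_location: key (on the front-end probably under /var/lib/one/remotes/etc/vmm/lxd/lxdrc, but check your install) and that oned falls back to /var/lib/one/datastores when DATASTORE_LOCATION is commented out. The function names are made up for illustration:

```shell
#!/bin/sh
# Print the active (uncommented) DATASTORE_LOCATION from an oned.conf-style file.
oned_ds_location() {
    awk -F'[ \t]*=[ \t]*' '
        $1 ~ /^[ \t]*DATASTORE_LOCATION$/ {
            gsub(/[ \t"]/, "", $2)   # drop blanks and optional quotes
            print $2
            exit
        }' "$1"
}

# Print the value of the (assumed) :datastore_location: key from lxdrc.
lxdrc_ds_location() {
    awk '$1 == ":datastore_location:" { print $2; exit }' "$1"
}

# Compare both values; $1 = oned.conf path, $2 = lxdrc path.
compare_ds_locations() {
    oned=$(oned_ds_location "$1")
    lxd=$(lxdrc_ds_location "$2")
    oned=${oned:-/var/lib/one/datastores}   # assumed oned default when unset
    oned=${oned%/}; lxd=${lxd%/}            # ignore a trailing slash
    if [ "$oned" = "$lxd" ]; then
        echo "match: $oned"
    else
        echo "mismatch: oned=$oned lxdrc=${lxd:-unset}"
    fi
}
```

On the front-end this could be called as compare_ds_locations /etc/one/oned.conf /var/lib/one/remotes/etc/vmm/lxd/lxdrc. A mismatch would be consistent with the driver computing a mountpoint under a different base directory than the one oned uses.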
Quite possibly this is the root of it. I will try to undo all my fancy NFS setups and go back to bone-stock local drives.
As an aside, I’m not impressed with the way datastores are handled…
OK, after a complete reinstall with Ubuntu 19.10 and OpenNebula 5.10: the front-end is a VM as before, there is 1x LXD host on a physical machine as before, and NO messing with datastores/NFS or anything.
This is what the VM create log says:
Mon May 4 17:28:35 2020 [Z0][VM][I]: New state is ACTIVE
Mon May 4 17:28:35 2020 [Z0][VM][I]: New LCM state is PROLOG
Mon May 4 17:30:12 2020 [Z0][VM][I]: New LCM state is BOOT
Mon May 4 17:30:12 2020 [Z0][VMM][I]: Generating deployment file: /var/lib/one/vms/1/deployment.0
Mon May 4 17:30:15 2020 [Z0][VMM][I]: Successfully execute transfer manager driver operation: tm_context.
Mon May 4 17:30:15 2020 [Z0][VMM][I]: Successfully execute network driver operation: pre.
Mon May 4 17:30:18 2020 [Z0][VMM][I]: Command execution fail: cat << EOT | /var/tmp/one/vmm/lxd/deploy '/var/lib/one//datastores/0/1/deployment.0' 'nebula1' 1 nebula1
Mon May 4 17:30:18 2020 [Z0][VMM][E]: deploy: To start your first container, try: lxc launch ubuntu:18.04
Mon May 4 17:30:18 2020 [Z0][VMM][I]:
Mon May 4 17:30:18 2020 [Z0][VMM][I]: Error: not found
Mon May 4 17:30:18 2020 [Z0][VMM][I]: /var/tmp/one/vmm/lxd/client.rb:102:in `wait': {"type"=>"sync", "status"=>"Success", "status_code"=>200, "operation"=>"", "error_code"=>0, "error"=>"", "metadata"=>{"id"=>"78b7418f-1ed4-4633-8d8b-787f71fee9ee", "class"=>"task", "description"=>"Creating container", "created_at"=>"2020-05-04T17:30:17.256300482Z", "updated_at"=>"2020-05-04T17:30:17.256300482Z", "status"=>"Failure", "status_code"=>400, "resources"=>{"containers"=>["/1.0/containers/one-1"]}, "metadata"=>nil, "may_cancel"=>false, "err"=>"Invalid devices: Missing source '/var/lib/one/datastores/0/1/mapper/disk.1' for disk 'context'", "location"=>"none"}} (LXDError)
Mon May 4 17:30:18 2020 [Z0][VMM][I]: from /var/tmp/one/vmm/lxd/container.rb:496:in `wait?'
Mon May 4 17:30:18 2020 [Z0][VMM][I]: from /var/tmp/one/vmm/lxd/container.rb:134:in `create'
Mon May 4 17:30:18 2020 [Z0][VMM][I]: from /var/tmp/one/vmm/lxd/deploy:52:in `<main>'
Mon May 4 17:30:18 2020 [Z0][VMM][I]: ExitCode: 1
Mon May 4 17:30:18 2020 [Z0][VMM][I]: Failed to execute virtualization driver operation: deploy.
Mon May 4 17:30:18 2020 [Z0][VMM][E]: Error deploying virtual machine
Mon May 4 17:30:18 2020 [Z0][VM][I]: New LCM state is BOOT_FAILURE
Can you please show the output of grep DATASTORE_LOCATION /etc/one/oned.conf, and of onedatastore show -x <system_datastore_id> for the system datastore being used for the containers?
oneadmin@nebula0:~$ onedatastore show 0
DATASTORE 0 INFORMATION
ID : 0
NAME : system
USER : oneadmin
GROUP : oneadmin
CLUSTERS : 0
TYPE : SYSTEM
DS_MAD : -
TM_MAD : ssh
BASE PATH : /var/lib/one//datastores/0
DISK_TYPE : FILE
STATE : READY
IMAGES
oneadmin@nebula0:~$ onedatastore show 1
DATASTORE 1 INFORMATION
ID : 1
NAME : default
USER : oneadmin
GROUP : oneadmin
CLUSTERS : 0
TYPE : IMAGE
DS_MAD : fs
TM_MAD : ssh
BASE PATH : /var/lib/one//datastores/1
DISK_TYPE : FILE
STATE : READY
IMAGES
oneadmin@nebula0:~$ onedatastore show 2
DATASTORE 2 INFORMATION
ID : 2
NAME : files
USER : oneadmin
GROUP : oneadmin
CLUSTERS : 0
TYPE : FILE
DS_MAD : fs
TM_MAD : ssh
BASE PATH : /var/lib/one//datastores/2
DISK_TYPE : FILE
STATE : READY