Cannot instantiate any KVM VM

Hello,

I've installed OpenNebula 5.12.0.3 together with a Ceph cluster for personal use, and I'm facing a confusing issue.

  • Created Ceph datastores (image: 103 and system: 102)
  • Downloaded a KVM image from the Marketplace into the image datastore
  • Tried to instantiate the associated template into the ceph_system datastore - no success
  • To narrow down the issue, tried to instantiate into the system datastore - no success

VM log:
Sun Oct 25 12:13:55 2020 [Z0][VM][I]: New state is ACTIVE
Sun Oct 25 12:13:55 2020 [Z0][VM][I]: New LCM state is PROLOG
Sun Oct 25 12:14:13 2020 [Z0][VM][I]: New LCM state is BOOT
Sun Oct 25 12:14:13 2020 [Z0][VMM][I]: Generating deployment file: /var/lib/one/vms/16/deployment.0
Sun Oct 25 12:14:17 2020 [Z0][VMM][I]: Successfully execute transfer manager driver operation: tm_context.
Sun Oct 25 12:14:19 2020 [Z0][VMM][I]: ExitCode: 0
Sun Oct 25 12:14:19 2020 [Z0][VMM][I]: Successfully execute network driver operation: pre.
Sun Oct 25 12:14:20 2020 [Z0][VMM][I]: Command execution fail: cat << EOT | /var/tmp/one/vmm/kvm/deploy '/var/lib/one//datastores/0/16/deployment.0' 'node8-kvm' 16 node8-kvm
Sun Oct 25 12:14:20 2020 [Z0][VMM][I]: error: Failed to create domain from /var/lib/one//datastores/0/16/deployment.0
Sun Oct 25 12:14:20 2020 [Z0][VMM][I]: error: internal error: process exited while connecting to monitor: 2020-10-25T11:14:20.216694Z qemu-system-x86_64: -drive file=rbd:one/one-3-16-0:id=libvirt:auth_supported=cephx;none:mon_host=10.20.0.11:6789;10.20.0.12:6789;10.20.0.13:6789,file.password-secret=virtio-disk0-secret0,format=raw,if=none,id=drive-virtio-disk0: error connecting: Permission denied
Sun Oct 25 12:14:20 2020 [Z0][VMM][E]: Could not create domain from /var/lib/one//datastores/0/16/deployment.0
Sun Oct 25 12:14:20 2020 [Z0][VMM][I]: ExitCode: 255
Sun Oct 25 12:14:20 2020 [Z0][VMM][I]: Failed to execute virtualization driver operation: deploy.
Sun Oct 25 12:14:20 2020 [Z0][VMM][E]: Error deploying virtual machine: Could not create domain from /var/lib/one//datastores/0/16/deployment.0
Sun Oct 25 12:14:20 2020 [Z0][VM][I]: New LCM state is BOOT_FAILURE

I noticed that OpenNebula created some files on the compute node:
oneadmin@node8:~/datastores/0/16$ ls -l
total 1163636
-rw-rw-r-- 1 oneadmin oneadmin 1776 Oct 25 12:14 deployment.0
-rw-r--r-- 1 oneadmin oneadmin 2361393152 Oct 25 12:14 disk.0
-rw-r--r-- 1 oneadmin oneadmin 372736 Oct 25 12:14 disk.1
oneadmin@node8:~/datastores/0/16$ qemu-img info disk.0
image: disk.0
file format: raw
virtual size: 2.2G (2361393152 bytes)
disk size: 1.1G
oneadmin@node8:~/datastores/0/16$ qemu-img info disk.1
image: disk.1
file format: raw
virtual size: 364K (372736 bytes)
disk size: 364K
oneadmin@node8:~/datastores/0/16$ file -L -s disk.0
disk.0: DOS/MBR boot sector, extended partition table (last)
oneadmin@node8:~/datastores/0/16$ file -L -s disk.1
disk.1: ISO 9660 CD-ROM filesystem data 'CONTEXT'

Tried to create the VM by hand:
oneadmin@node8:~/datastores/0/16$ virsh -c qemu:///system create deployment.0
error: Failed to create domain from deployment.0
error: internal error: process exited while connecting to monitor: 2020-10-25T11:18:19.784321Z qemu-system-x86_64: -drive file=rbd:one/one-3-16-0:id=libvirt:auth_supported=cephx;none:mon_host=10.20.0.11:6789;10.20.0.12:6789;10.20.0.13:6789,file.password-secret=virtio-disk0-secret0,format=raw,if=none,id=drive-virtio-disk0: error connecting: Permission denied
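
(For reference, the deployment references a libvirt ceph secret; whether that secret is actually defined and holds a value on this node can be checked roughly like this - the UUID is the one from my deployment.0, shown further below:)

oneadmin@node8:~$ virsh -c qemu:///system secret-list
oneadmin@node8:~$ sudo virsh secret-get-value deca460f-986c-490e-b30c-4333e33986ea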

Noticing that only the owner was allowed to write disk.0, I modified the file permissions by hand:
oneadmin@node8:~/datastores/0/16$ ls -l
total 1163636
-rw-rw-r-- 1 oneadmin oneadmin 1776 Oct 25 12:14 deployment.0
-rw-rw-rw- 1 oneadmin oneadmin 2361393152 Oct 25 12:14 disk.0
-rw-r--r-- 1 oneadmin oneadmin 372736 Oct 25 12:14 disk.1

Retrying the VM creation, either by hand or via the OpenNebula GUI (Recover->Retry), produced the same results as above.
oneadmin@node8:~/datastores/0/16$ virsh -c qemu:///system create deployment.0
error: Failed to create domain from deployment.0
error: internal error: process exited while connecting to monitor: 2020-10-25T11:20:17.967780Z qemu-system-x86_64: -drive file=rbd:one/one-3-16-0:id=libvirt:auth_supported=cephx;none:mon_host=10.20.0.11:6789;10.20.0.12:6789;10.20.0.13:6789,file.password-secret=virtio-disk0-secret0,format=raw,if=none,id=drive-virtio-disk0: error connecting: Permission denied

oneadmin@node8:~/datastores/0/16$ virsh list
Id Name State

Retry (Recover->Retry) in the OpenNebula GUI:
Sun Oct 25 12:14:20 2020 [Z0][VM][I]: New LCM state is BOOT_FAILURE
Sun Oct 25 12:22:11 2020 [Z0][VM][I]: New LCM state is BOOT
Sun Oct 25 12:22:11 2020 [Z0][VMM][I]: Generating deployment file: /var/lib/one/vms/16/deployment.0
Sun Oct 25 12:22:15 2020 [Z0][VMM][I]: Successfully execute transfer manager driver operation: tm_context.
Sun Oct 25 12:22:18 2020 [Z0][VMM][I]: ExitCode: 0
Sun Oct 25 12:22:18 2020 [Z0][VMM][I]: Successfully execute network driver operation: pre.
Sun Oct 25 12:22:18 2020 [Z0][VMM][I]: Command execution fail: cat << EOT | /var/tmp/one/vmm/kvm/deploy '/var/lib/one//datastores/0/16/deployment.0' 'node8-kvm' 16 node8-kvm
Sun Oct 25 12:22:18 2020 [Z0][VMM][I]: error: Failed to create domain from /var/lib/one//datastores/0/16/deployment.0
Sun Oct 25 12:22:18 2020 [Z0][VMM][I]: error: internal error: process exited while connecting to monitor: 2020-10-25T11:22:18.687631Z qemu-system-x86_64: -drive file=rbd:one/one-3-16-0:id=libvirt:auth_supported=cephx;none:mon_host=10.20.0.11:6789;10.20.0.12:6789;10.20.0.13:6789,file.password-secret=virtio-disk0-secret0,format=raw,if=none,id=drive-virtio-disk0: error connecting: Permission denied
Sun Oct 25 12:22:18 2020 [Z0][VMM][E]: Could not create domain from /var/lib/one//datastores/0/16/deployment.0
Sun Oct 25 12:22:18 2020 [Z0][VMM][I]: ExitCode: 255
Sun Oct 25 12:22:18 2020 [Z0][VMM][I]: Failed to execute virtualization driver operation: deploy.
Sun Oct 25 12:22:18 2020 [Z0][VMM][E]: Error deploying virtual machine: Could not create domain from /var/lib/one//datastores/0/16/deployment.0
Sun Oct 25 12:22:18 2020 [Z0][VM][I]: New LCM state is BOOT_FAILURE

At this point I decided to create a VM on node8 with virt-manager, using the same disk files OpenNebula had created. To my great surprise, that VM creation was successful: the VM booted and its console is accessible via the virt-manager GUI.
oneadmin@node8:~/datastores/0/16$ virsh list
Id Name State

16 one-16-copy running

My current conclusion is that every component that takes part in VM creation works properly on its own, yet the end result is a failure. Moreover, I can create LXC containers on any compute node through OpenNebula using images from the Marketplace.

oneadmin@node8:~$ lxc list
+-------+---------+----------------------+------+------------+-----------+
| NAME  | STATE   | IPV4                 | IPV6 | TYPE       | SNAPSHOTS |
+-------+---------+----------------------+------+------------+-----------+
| one-6 | RUNNING | 192.168.1.101 (eth0) |      | PERSISTENT | 0         |
+-------+---------+----------------------+------+------------+-----------+

Any advice to solve my issue would be welcome.

Some detailed information follows.

The XML created by virt-manager and the OpenNebula-generated deployment.0 were pasted here as well, but the forum mangled them beyond readability; clean copies of both are repeated at the end of this post.

oneadmin@node8:~/datastores/0/16$ rbd list -l one
NAME        SIZE     PARENT          FMT  PROT  LOCK
one-3       2.20GiB                  2
one-3@snap  2.20GiB                  2    yes
one-4       1GiB                     2
one-4@snap  1GiB                     2    yes
one-4-6-0   1GiB     one/one-4@snap  2          excl
one-4-7-0   20GiB    one/one-4@snap  2          excl

oneadmin@node8:~$ cat /etc/hosts
127.0.0.1 localhost
10.20.0.11 node1
10.20.0.12 node2
10.20.0.13 node3
10.20.0.15 node5
10.20.0.15 node5-lxd
10.20.0.15 node5-kvm
10.20.0.16 node6
10.20.0.16 node6-lxd
10.20.0.16 node6-kvm
10.20.0.17 node7
10.20.0.17 node7-lxd
10.20.0.17 node7-kvm
10.20.0.18 node8
10.20.0.18 node8-lxd
10.20.0.18 node8-kvm
10.20.0.30 one-fe

root@node8:/var/log/libvirt# cat libvirtd.log

2020-10-25 12:06:08.520+0000: 25845: debug : qemuMonitorJSONIOProcessLine:193 : Line [{“return”: [{“iops_rd”: 0, “detect_zeroes”: “off”, “image”: {“virtual-size”: 372736, “filename”: “/var/lib/one/datastores/0/16/disk.1”, “format”: “raw”, “actual-size”: 372736, “dirty-flag”: false}, “iops_wr”: 0, “ro”: true, “node-name”: “#block375”, “backing_file_depth”: 0, “drv”: “raw”, “iops”: 0, “bps_wr”: 0, “write_threshold”: 0, “encrypted”: false, “bps”: 0, “bps_rd”: 0, “cache”: {“no-flush”: false, “direct”: false, “writeback”: true}, “file”: “/var/lib/one/datastores/0/16/disk.1”, “encryption_key_missing”: false}, {“iops_rd”: 0, “detect_zeroes”: “off”, “image”: {“virtual-size”: 372736, “filename”: “/var/lib/one/datastores/0/16/disk.1”, “format”: “file”, “actual-size”: 372736, “dirty-flag”: false}, “iops_wr”: 0, “ro”: true, “node-name”: “#block205”, “backing_file_depth”: 0, “drv”: “file”, “iops”: 0, “bps_wr”: 0, “write_threshold”: 0, “encrypted”: false, “bps”: 0, “bps_rd”: 0, “cache”: {“no-flush”: false, “direct”: false, “writeback”: true}, “file”: “/var/lib/one/datastores/0/16/disk.1”, “encryption_key_missing”: false}, {“iops_rd”: 0, “detect_zeroes”: “off”, “image”: {“virtual-size”: 2361393152, “filename”: “/var/lib/one/datastores/0/16/disk.0”, “format”: “raw”, “actual-size”: 1197514752, “dirty-flag”: false}, “iops_wr”: 0, “ro”: false, “node-name”: “#block149”, “backing_file_depth”: 0, “drv”: “raw”, “iops”: 0, “bps_wr”: 0, “write_threshold”: 0, “encrypted”: false, “bps”: 0, “bps_rd”: 0, “cache”: {“no-flush”: false, “direct”: false, “writeback”: true}, “file”: “/var/lib/one/datastores/0/16/disk.0”, “encryption_key_missing”: false}, {“iops_rd”: 0, “detect_zeroes”: “off”, “image”: {“virtual-size”: 2361393152, “filename”: “/var/lib/one/datastores/0/16/disk.0”, “format”: “file”, “actual-size”: 1197514752, “dirty-flag”: false}, “iops_wr”: 0, “ro”: false, “node-name”: “#block070”, “backing_file_depth”: 0, “drv”: “file”, “iops”: 0, “bps_wr”: 0, “write_threshold”: 0, “encrypted”: false, “bps”: 0, “bps_rd”: 0, “cache”: {“no-flush”: false, “direct”: false, “writeback”: true}, “file”: “/var/lib/one/datastores/0/16/disk.0”, “encryption_key_missing”: false}], “id”: “libvirt-212”}]
2020-10-25 12:06:08.520+0000: 25845: info : qemuMonitorJSONIOProcessLine:213 : QEMU_MONITOR_RECV_REPLY: mon=0x7f83f40540c0 reply={“return”: [{“iops_rd”: 0, “detect_zeroes”: “off”, “image”: {“virtual-size”: 372736, “filename”: “/var/lib/one/datastores/0/16/disk.1”, “format”: “raw”, “actual-size”: 372736, “dirty-flag”: false}, “iops_wr”: 0, “ro”: true, “node-name”: “#block375”, “backing_file_depth”: 0, “drv”: “raw”, “iops”: 0, “bps_wr”: 0, “write_threshold”: 0, “encrypted”: false, “bps”: 0, “bps_rd”: 0, “cache”: {“no-flush”: false, “direct”: false, “writeback”: true}, “file”: “/var/lib/one/datastores/0/16/disk.1”, “encryption_key_missing”: false}, {“iops_rd”: 0, “detect_zeroes”: “off”, “image”: {“virtual-size”: 372736, “filename”: “/var/lib/one/datastores/0/16/disk.1”, “format”: “file”, “actual-size”: 372736, “dirty-flag”: false}, “iops_wr”: 0, “ro”: true, “node-name”: “#block205”, “backing_file_depth”: 0, “drv”: “file”, “iops”: 0, “bps_wr”: 0, “write_threshold”: 0, “encrypted”: false, “bps”: 0, “bps_rd”: 0, “cache”: {“no-flush”: false, “direct”: false, “writeback”: true}, “file”: “/var/lib/one/datastores/0/16/disk.1”, “encryption_key_missing”: false}, {“iops_rd”: 0, “detect_zeroes”: “off”, “image”: {“virtual-size”: 2361393152, “filename”: “/var/lib/one/datastores/0/16/disk.0”, “format”: “raw”, “actual-size”: 1197514752, “dirty-flag”: false}, “iops_wr”: 0, “ro”: false, “node-name”: “#block149”, “backing_file_depth”: 0, “drv”: “raw”, “iops”: 0, “bps_wr”: 0, “write_threshold”: 0, “encrypted”: false, “bps”: 0, “bps_rd”: 0, “cache”: {“no-flush”: false, “direct”: false, “writeback”: true}, “file”: “/var/lib/one/datastores/0/16/disk.0”, “encryption_key_missing”: false}, {“iops_rd”: 0, “detect_zeroes”: “off”, “image”: {“virtual-size”: 2361393152, “filename”: “/var/lib/one/datastores/0/16/disk.0”, “format”: “file”, “actual-size”: 1197514752, “dirty-flag”: false}, “iops_wr”: 0, “ro”: false, “node-name”: “#block070”, “backing_file_depth”: 0, “drv”: “file”, “iops”: 0, “bps_wr”: 0, “write_threshold”: 0, “encrypted”: false, “bps”: 0, “bps_rd”: 0, “cache”: {“no-flush”: false, “direct”: false, “writeback”: true}, “file”: “/var/lib/one/datastores/0/16/disk.0”, “encryption_key_missing”: false}], “id”: “libvirt-212”}
2020-10-25 12:06:08.520+0000: 25850: debug : qemuMonitorJSONCommandWithFd:306 : Receive command reply ret=0 rxObject=0x55781f6c30c0
2020-10-25 12:06:08.520+0000: 25850: debug : qemuDomainObjExitMonitorInternal:5142 : Exited monitor (mon=0x7f83f40540c0 vm=0x7f83f4046eb0 name=one-16-copy)
2020-10-25 12:06:08.520+0000: 25850: debug : qemuDomainObjEndJob:5050 : Stopping job: query (async=none vm=0x7f83f4046eb0 name=one-16-copy)

oneadmin@one-fe:~$ onevm show -x 16
not attached because of the post character limit…

oneadmin@one-fe:~$ onedatastore list -x
(The forum stripped most of the XML tag contents from this output; the values that survived are summarised below.)

 ID   NAME         TYPE       DISK_TYPE  PERMS        OWNER:GROUP        TOTAL_MB  FREE_MB  USED_MB  IMAGES
 103  cephds       IMAGE_DS   3 (RBD)    um- u-- ---  oneadmin:oneadmin   5436909  5431676     5232  3, 4
 102  ceph_system  SYSTEM_DS  3 (RBD)    uma uma ---  oneadmin:oneadmin   5436909  5431676     5232  -
   2  files        FILE_DS    0 (FILE)   um- u-- ---  oneadmin:oneadmin     30107    20701     7855  -
   1  default      IMAGE_DS   0 (FILE)   um- u-- ---  oneadmin:oneadmin     30107    20701     7855  -
   0  system       SYSTEM_DS  0 (FILE)   um- u-- ---  oneadmin:oneadmin         0        0        0  -

All five datastores are in cluster 0 and in READY state. The template attributes of each datastore (DS_MAD, TM_MAD, BASE_PATH, CEPH_HOST, CEPH_SECRET, CEPH_USER, POOL_NAME, BRIDGE_LIST, SAFE_DIRS, RESTRICTED_DIRS, ...) were present in the output, but their values were lost in the forum rendering.

This was my first post on this mailing list and the published version does not look the way I expected.
For better readability, I repeat the two VM XMLs below.
I hope someone can help solve the issue.

OpenNebula-generated XML:
oneadmin@node8:~/datastores/0/16$ cat ./deployment.0

<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
        <name>one-16</name>
        <title>Ubuntu 18.04 KVM local 2nd</title>
        <cputune>
                <shares>2048</shares>
        </cputune>
        <memory>2097152</memory>
        <os>
                <type arch='x86_64'>hvm</type>
        </os>
        <devices>
                <emulator><![CDATA[/usr/bin/qemu-system-x86_64]]></emulator>
                <disk type='network' device='disk'>
                        <source protocol='rbd' name='one/one-3-16-0'>
                                <host name='10.20.0.11' port='6789'/>
                                <host name='10.20.0.12' port='6789'/>
                                <host name='10.20.0.13' port='6789'/>
                        </source>
                        <auth username='libvirt'>
                                <secret type='ceph' uuid='deca460f-986c-490e-b30c-4333e33986ea'/>
                        </auth>
                        <target dev='vda' bus='virtio'/>
                        <boot order='1'/>
                        <driver name='qemu' type='raw' cache='default'/>
                </disk>
                <disk type='file' device='cdrom'>
                        <source file='/var/lib/one//datastores/0/16/disk.1'/>
                        <target dev='hda' bus='ide'/>
                        <readonly/>
                        <driver name='qemu' type='raw'/>
                </disk>
                <interface type='bridge'>
                        <source bridge='br-public'/>
                        <mac address='02:00:c0:a8:01:68'/>
                        <target dev='one-16-0'/>
                </interface>
                <graphics type='vnc' listen='0.0.0.0' port='5916'/>
        </devices>
        <features>
                <acpi/>
        </features>
        <metadata>
                <one:vm xmlns:one="http://opennebula.org/xmlns/libvirt/1.0">
                        <one:system_datastore><![CDATA[/var/lib/one//datastores/0/16]]></one:system_datastore>
                        <one:name><![CDATA[Ubuntu 18.04 KVM local 2nd]]></one:name>
                        <one:uname><![CDATA[oneadmin]]></one:uname>
                        <one:uid>0</one:uid>
                        <one:gname><![CDATA[oneadmin]]></one:gname>
                        <one:gid>0</one:gid>
                        <one:opennebula_version>5.10.1</one:opennebula_version>
                        <one:stime>1603624413</one:stime>
                        <one:deployment_time>1603624931</one:deployment_time>
                </one:vm>
        </metadata>
</domain>

The XML created by virt-manager:
oneadmin@node8:~/datastores/0/16$ sudo cat /etc/libvirt/qemu/one-16-copy.xml

<!--
WARNING: THIS IS AN AUTO-GENERATED FILE. CHANGES TO IT ARE LIKELY TO BE OVERWRITTEN AND LOST. Changes to this xml configuration should be made using:
      virsh edit one-16-copy
    or other application using the libvirt API. 
-->

<domain type='kvm'>
  <name>one-16-copy</name>
  <uuid>c5aaeb57-cc5d-461f-bc1c-282b0725a9c1</uuid>
  <memory unit='KiB'>2097152</memory>
  <currentMemory unit='KiB'>2097152</currentMemory>
  <vcpu placement='static'>2</vcpu>
  <os>
    <type arch='x86_64' machine='pc-i440fx-bionic'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <vmport state='off'/>
  </features>
  <cpu mode='custom' match='exact' check='partial'>
    <model fallback='allow'>Nehalem-IBRS</model>
  </cpu>
  <clock offset='utc'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <pm>
    <suspend-to-mem enabled='no'/>
    <suspend-to-disk enabled='no'/>
  </pm>
  <devices>
    <emulator>/usr/bin/kvm-spice</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw'/>
      <source file='/var/lib/one/datastores/0/16/disk.0'/>
      <target dev='hda' bus='ide'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/var/lib/one/datastores/0/16/disk.1'/>
      <target dev='hdb' bus='ide'/>
      <readonly/>
      <address type='drive' controller='0' bus='0' target='0' unit='1'/>
    </disk>
    <controller type='usb' index='0' model='ich9-ehci1'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x7'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci1'>
      <master startport='0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0' multifunction='on'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci2'>
      <master startport='2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x1'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci3'>
      <master startport='4'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'/>
    <controller type='ide' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:17:a4:f6'/>
      <source bridge='br-public'/>
      <model type='rtl8139'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target type='isa-serial' port='0'>
        <model name='isa-serial'/>
      </target>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <channel type='spicevmc'>
      <target type='virtio' name='com.redhat.spice.0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <graphics type='spice' autoport='yes'>
      <listen type='address'/>
      <image compression='off'/>
    </graphics>
    <sound model='ich6'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </sound>
    <video>
      <model type='qxl' ram='65536' vram='65536' vgamem='16384' heads='1' primary='yes'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <redirdev bus='usb' type='spicevmc'>
      <address type='usb' bus='0' port='1'/>
    </redirdev>
    <redirdev bus='usb' type='spicevmc'>
      <address type='usb' bus='0' port='2'/>
    </redirdev>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </memballoon>
  </devices>
</domain>

hi,

if you are running on ubuntu this has something to do with apparmor

would you be able to confirm by setting security_driver=none in /etc/libvirt/qemu.conf and restarting libvirtd?
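
a quick way to check whether apparmor is the one blocking it (assuming standard ubuntu logging; adjust as needed) is to watch for denials while you retry the deploy:

sudo aa-status | grep -i libvirt
sudo dmesg | grep -i 'apparmor="DENIED"'
sudo journalctl -k | grep -i apparmor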

cheers,
x

Hi xander,

Thank you very much for your reply.
Yes, I'm on Ubuntu 18.04.
When googling for hints (qemu permission denied), I found many articles stating that the problem was solved by adjusting qemu.conf and/or the permissions of /dev/kvm.
I played with those settings without success.
My current settings are:
qemu.conf
user = "root"
group = "root"
dynamic_ownership = 0
security_driver = "none"

ls -l /dev/kvm
crw-rw-rw- 1 root kvm 10, 232 Oct 26 11:52 /dev/kvm

I’ve tried in qemu.conf:
user = "oneadmin"
group = "kvm"
with the same lack of success.
Groups the "oneadmin" user belongs to:
oneadmin@node8:~$ groups
oneadmin disk sudo lxd kvm libvirt

Did I miss the right combination?

BR,
/M

hi

that's a bit strange, as when you install opennebula-node on ubuntu (assuming a kvm host node) it already puts these settings at the bottom of qemu.conf, as below:

user = "oneadmin"
group = "oneadmin"
dynamic_ownership = 0

all you would need to do is add security_driver="none", restart the libvirtd service, and it should work

can you adjust qemu.conf with this:

user = "oneadmin"
group = "oneadmin"
dynamic_ownership = 0
security_driver="none"

and systemctl restart libvirtd ?

cheers
x

Hi,

I’ve made the settings you suggested.
In effect, that means the default settings were restored, except for security_driver:

oneadmin@node8:~/datastores/0/16$ sudo cat /etc/libvirt/qemu.conf | grep security_driver
#       security_driver = [ "selinux", "apparmor" ]
# value of security_driver cannot contain "dac".  The value "none" is
# a special value; security_driver can be set to that value in
#security_driver = "selinux"
security_driver = "none"

Sadly, the result is a failure again:

oneadmin@node8:~/datastores/0/16$ virsh -c qemu:///system create deployment.0
error: Failed to create domain from deployment.0
error: internal error: process exited while connecting to monitor: 2020-10-31T15:37:25.539872Z qemu-system-x86_64: -drive file=rbd:one/one-3-16-0:id=libvirt:auth_supported=cephx\;none:mon_host=10.20.0.11\:6789\;10.20.0.12\:6789\;10.20.0.13\:6789,file.password-secret=virtio-disk0-secret0,format=raw,if=none,id=drive-virtio-disk0: error connecting: Permission denied

and the VM log in OpenNebula:

Sat Oct 31 16:49:58 2020 [Z0][VM][I]: New LCM state is BOOT
Sat Oct 31 16:49:58 2020 [Z0][VMM][I]: Generating deployment file: /var/lib/one/vms/16/deployment.0
Sat Oct 31 16:50:02 2020 [Z0][VMM][I]: Successfully execute transfer manager driver operation: tm_context.
Sat Oct 31 16:50:04 2020 [Z0][VMM][I]: ExitCode: 0
Sat Oct 31 16:50:04 2020 [Z0][VMM][I]: Successfully execute network driver operation: pre.
Sat Oct 31 16:50:05 2020 [Z0][VMM][I]: Command execution fail: cat << EOT | /var/tmp/one/vmm/kvm/deploy '/var/lib/one//datastores/0/16/deployment.0' 'node8-kvm' 16 node8-kvm
Sat Oct 31 16:50:05 2020 [Z0][VMM][I]: error: Failed to create domain from /var/lib/one//datastores/0/16/deployment.0
Sat Oct 31 16:50:05 2020 [Z0][VMM][I]: error: internal error: process exited while connecting to monitor: 2020-10-31T15:50:05.430023Z qemu-system-x86_64: -drive file=rbd:one/one-3-16-0:id=libvirt:auth_supported=cephx\;none:mon_host=10.20.0.11\:6789\;10.20.0.12\:6789\;10.20.0.13\:6789,file.password-secret=virtio-disk0-secret0,format=raw,if=none,id=drive-virtio-disk0: error connecting: Permission denied
Sat Oct 31 16:50:05 2020 [Z0][VMM][E]: Could not create domain from /var/lib/one//datastores/0/16/deployment.0
Sat Oct 31 16:50:05 2020 [Z0][VMM][I]: ExitCode: 255
Sat Oct 31 16:50:05 2020 [Z0][VMM][I]: Failed to execute virtualization driver operation: deploy.
Sat Oct 31 16:50:05 2020 [Z0][VMM][E]: Error deploying virtual machine: Could not create domain from /var/lib/one//datastores/0/16/deployment.0
Sat Oct 31 16:50:05 2020 [Z0][VM][I]: New LCM state is BOOT_FAILURE

I also tried to deploy the VM on an untouched (freshly installed) node (except for security_driver="none", of course); same failure. :frowning_face:

I would love more ideas…

BR
/M

hi

i’m sure you’ve done it, but just to confirm - have you restarted libvirtd after making changes to qemu.conf? if yes, please see below

i see you use ceph.

can you test with an nfs share? you can just create one on a separate node or use an existing one if you have it.
with changes made to qemu.conf + libvirtd restarted + nfs share it should work
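
something like this is what i mean (names, paths and the datastore id are just examples):

# on the front-end: register an nfs-backed system datastore that uses the shared TM driver
cat > nfs_system.ds <<EOF
NAME   = nfs_system
TYPE   = SYSTEM_DS
TM_MAD = shared
EOF
onedatastore create nfs_system.ds     # note the ID it gets, e.g. 104

# on the kvm node (and the front-end): mount the export under that datastore directory
sudo mount -t nfs nfs-server:/export/one /var/lib/one/datastores/104

then deploy a test vm into that datastore and see if it boots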

if the above test is successful then most likely your issue is with ceph - permissions, secret, maybe even the same apparmor issue on your ceph nodes?!
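
and if it does turn out to be the secret, re-feeding the cephx key to libvirt on the kvm node usually looks roughly like this (the uuid below is the one your deployment.0 references):

# on a ceph admin node: export the key for the libvirt user
sudo ceph auth get-key client.libvirt > client.libvirt.key

# on the kvm node: push it into the already-defined libvirt secret
virsh -c qemu:///system secret-set-value --secret deca460f-986c-490e-b30c-4333e33986ea --base64 $(cat client.libvirt.key)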

cheers,
x

Hi,

Many thanks for the reply.
Yes, I'm sure I restarted libvirtd. (I even rebooted the machine to be sure.)
To understand the issue better, I repeated the two experiments:

1. System datastore
Result: Failure
In this case, I asked for the disk(s) to be created in the system datastore, i.e. locally on the compute node.

The generated XML asks for a ceph image:

oneadmin@node8:~/datastores/0/19$ cat ./deployment.0
...
                <disk type='network' device='disk'>
                        <source protocol='rbd' name='one/one-3-19-0'>
...

It does not exist on ceph:

oneadmin@node8:~$ rbd ls -l --id libvirt one
NAME        SIZE     PARENT          FMT  PROT  LOCK
one-3       2.20GiB                  2
one-3@snap  2.20GiB                  2    yes
one-3-20-0  20GiB    one/one-3@snap  2
one-4       1GiB                     2
one-4@snap  1GiB                     2    yes
one-4-6-0   1GiB     one/one-4@snap  2
one-4-7-0   20GiB    one/one-4@snap  2

However, the files were created on local disk, as expected:

oneadmin@node8:~/datastores/0/19$ ls -l
total 1752456
-rw-rw-r-- 1 oneadmin oneadmin 1776 Nov 1 15:20 deployment.0
-rw-r--r-- 1 oneadmin oneadmin 21474836480 Nov 1 15:25 disk.0
-rw-r--r-- 1 oneadmin oneadmin 372736 Nov 1 15:20 disk.1

So 'one/one-3-19-0' may have existed and then been deleted after being copied to node8.
Indeed, disk.0 must be a copy of one-3, because when I attach it to a VM via the virt-manager GUI, that VM boots as expected.

2. Ceph datastore
Result: Failure

Now I asked for the disk to be created in ceph.
Indeed, the generated XML references it:

            <disk type='network' device='disk'>
                    <source protocol='rbd' name='one/one-3-20-0'>

and the local data store doesn’t contain disk.0:

root@node8:~/datastores/102/20# ls -l
total 368
-rw-rw-r-- 1 oneadmin oneadmin   1778 Nov  1 15:20 deployment.0
-rw-r--r-- 1 oneadmin oneadmin 372736 Nov  1 15:20 disk.1

'one/one-3-20-0' exists in ceph (see above).
Let’s try to map it:

oneadmin@node8:~$ sudo ls -l /dev/rbd*
ls: cannot access '/dev/rbd*': No such file or directory
oneadmin@node8:~$ sudo rbd map one/one-3-20-0 --id libvirt
/dev/rbd0
oneadmin@node8:~$ sudo ls -l /dev/rbd*
brw-rw---- 1 root disk 252, 0 Nov 1 15:06 /dev/rbd0
brw-rw---- 1 root disk 252, 1 Nov 1 15:06 /dev/rbd0p1
brw-rw---- 1 root disk 252, 14 Nov 1 15:06 /dev/rbd0p14
brw-rw---- 1 root disk 252, 15 Nov 1 15:06 /dev/rbd0p15

Succeeded. I don't see any problem with ceph.
(I don't really understand your suggestion about the NFS share.
Anyway, I made an NFS share earlier from another ceph pool on a different machine, and it works.)
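
(For completeness: rbd map goes through the kernel rbd client, while qemu uses librbd, so the exact path qemu takes can be exercised separately. A rough way to do that - the ceph.conf path is an assumption, and it relies on a keyring for client.libvirt being readable - would be:)

oneadmin@node8:~$ qemu-img info "rbd:one/one-3-20-0:id=libvirt:conf=/etc/ceph/ceph.conf"

If this also ends in "Permission denied", the problem is in librbd/cephx for the libvirt user rather than in OpenNebula itself.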

Please note that the image 'one-4-6-0' listed in ceph above belongs to a running LXC container.

What else can I check or adjust?

BR,
/M

Hi Mihaly,

My suggestion was that OpenNebula does not have the right permissions to access your ceph pool.

By the NFS test I meant creating a standard NFS share on a node, mounting it, and adding it as a system datastore, to see if a VM is created successfully there.

From what you've posted, I'm pretty sure the issue is with OpenNebula accessing the ceph pool.

cheers,
x

Hi xander,

I can confirm that something is wrong with ceph access.
I tried all combinations of image and VM datastores; here is the result:

               +--------------------+
               |    VM datastore    |
  +------------+---------+----------+
  | Image      | system  | ceph     |
  | datastore  |         |          |
  +------------+---------+----------+
  | system     | OK      | Hangs in |
  |            |         | Prolog   |
  +------------+---------+----------+
  | ceph       | FAIL    | FAIL     |
  +------------+---------+----------+

It is possible to fire up a VM only if both the image and the VM disk are in the system datastore.
So the title is no longer accurate; it should be something like:
"Cannot instantiate VM when using ceph"

However, from any compute node, the ceph cluster can be accessed by the "oneadmin" user from the command line (with --id libvirt, as the install tutorial suggests).
Moreover, the admin node is able to create an image in ceph and download that image to the compute node.
(Am I right? Who actually downloads the image, the admin node or the compute node?)
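
(For reference, one more thing worth comparing is the caps of the client.libvirt user against what librbd needs for the "one" pool; on a ceph node, something like:)

sudo ceph auth get client.libvirt

The usual expectation, per the generic ceph/libvirt documentation rather than anything I have verified here, is mon "profile rbd" and osd "profile rbd pool=one", or an equivalent older-style allow rule.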

Could you please suggest what to check/modify to solve this issue?
(Forgetting about ceph is not an option for me :slight_smile: )

Thank you very much for your help.

BR,
/M
