Ceph DS: XPath set is empty

Hi everyone!

I have a fresh install of OpenNebula 6.8 and a Ceph Quincy cluster.
For testing purposes I run the Ceph services on the same hosts as the OpenNebula nodes.

I configured the datastore as described in the docs, following every step.
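
For reference, the image datastore template ended up roughly like the sketch below (the datastore name, file name and BRIDGE_LIST are illustrative; the pool, user, monitor hosts and secret UUID are the ones that appear in the logs further down):

NAME        = ceph_img
DS_MAD      = ceph
TM_MAD      = ceph
DISK_TYPE   = RBD
POOL_NAME   = one
CEPH_USER   = libvirt
CEPH_SECRET = "a61356c2-6312-42e2-b11b-bafd73343dd3"
CEPH_HOST   = "onenode-prg-01 onenode-prg-02 onenode-prg-03"
BRIDGE_LIST = "onenode-prg-01 onenode-prg-02 onenode-prg-03"

onedatastore create ceph_img.conf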

In the end I was able to create the datastore (it is in Monitored status and shows free space) and to download an image from the marketplace.
But then I ran into problems with the VM. When I start a VM from this template, it runs for about 5 minutes (status RUNNING) and then drops into the POWEROFF state with these logs:

Mon Apr 29 20:51:44 2024 [Z0][VM][I]: New state is ACTIVE
Mon Apr 29 20:51:44 2024 [Z0][VM][I]: New LCM state is PROLOG
Mon Apr 29 20:51:46 2024 [Z0][VM][I]: New LCM state is BOOT
Mon Apr 29 20:51:46 2024 [Z0][VMM][I]: Generating deployment file: /var/lib/one/vms/38/deployment.0
Mon Apr 29 20:51:47 2024 [Z0][VMM][I]: Successfully execute transfer manager driver operation: tm_context.
Mon Apr 29 20:51:47 2024 [Z0][VMM][I]: ExitCode: 0
Mon Apr 29 20:51:47 2024 [Z0][VMM][I]: Successfully execute network driver operation: pre.
Mon Apr 29 20:51:47 2024 [Z0][VMM][I]: ExitCode: 0
Mon Apr 29 20:51:47 2024 [Z0][VMM][I]: Successfully execute virtualization driver operation: /bin/mkdir -p.
Mon Apr 29 20:51:47 2024 [Z0][VMM][I]: ExitCode: 0
Mon Apr 29 20:51:47 2024 [Z0][VMM][I]: Successfully execute virtualization driver operation: /bin/cat - >/var/lib/one//datastores/121/38/vm.xml.
Mon Apr 29 20:51:47 2024 [Z0][VMM][I]: ExitCode: 0
Mon Apr 29 20:51:47 2024 [Z0][VMM][I]: Successfully execute virtualization driver operation: /bin/cat - >/var/lib/one//datastores/121/38/ds.xml.
Mon Apr 29 20:51:50 2024 [Z0][LCM][I]: VM reported RUNNING by the drivers
Mon Apr 29 20:51:50 2024 [Z0][VM][I]: New LCM state is RUNNING
Mon Apr 29 20:56:48 2024 [Z0][VMM][I]: Command execution fail (exit code: 255): cat << 'EOT' | /var/tmp/one/vmm/kvm/deploy '/var/lib/one//datastores/121/38/deployment.0' 'onenode-prg-03' 38 onenode-prg-03
Mon Apr 29 20:56:48 2024 [Z0][VMM][I]: XPath set is empty
Mon Apr 29 20:56:48 2024 [Z0][VMM][I]: error: Failed to create domain from /var/lib/one//datastores/121/38/deployment.0
Mon Apr 29 20:56:48 2024 [Z0][VMM][I]: error: internal error: process exited while connecting to monitor: 2024-04-29T20:56:48.469573Z qemu-kvm-one: -blockdev {"driver":"rbd","pool":"one","image":"one-21-38-0","server":[{"host":"onenode-prg-01","port":"6789"},{"host":"onenode-prg-02","port":"6789"},{"host":"onenode-prg-03","port":"6789"}],"user":"libvirt","auth-client-required":["cephx","none"],"key-secret":"libvirt-2-storage-auth-secret0","node-name":"libvirt-2-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}: error connecting: Connection timed out
Mon Apr 29 20:56:48 2024 [Z0][VMM][I]: Could not create domain from /var/lib/one//datastores/121/38/deployment.0
Mon Apr 29 20:56:48 2024 [Z0][VMM][I]: ExitCode: 255
Mon Apr 29 20:56:48 2024 [Z0][VMM][I]: ExitCode: 0
Mon Apr 29 20:56:48 2024 [Z0][VMM][I]: Successfully execute network driver operation: clean.
Mon Apr 29 20:56:48 2024 [Z0][VMM][I]: Failed to execute virtualization driver operation: deploy.
Mon Apr 29 20:56:48 2024 [Z0][VMM][E]: DEPLOY: XPath set is empty error: Failed to create domain from /var/lib/one//datastores/121/38/deployment.0 error: internal error: process exited while connecting to monitor: 2024-04-29T20:56:48.469573Z qemu-kvm-one: -blockdev {"driver":"rbd","pool":"one","image":"one-21-38-0","server":[{"host":"onenode-prg-01","port":"6789"},{"host":"onenode-prg-02","port":"6789"},{"host":"onenode-prg-03","port":"6789"}],"user":"libvirt","auth-client-required":["cephx","none"],"key-secret":"libvirt-2-storage-auth-secret0","node-name":"libvirt-2-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}: error connecting: Connection timed out Could not create domain from /var/lib/one//datastores/121/38/deployment.0 ExitCode: 255
Mon Apr 29 20:56:48 2024 [Z0][LCM][E]: deploy_failure_action, VM in a wrong state
Mon Apr 29 20:57:17 2024 [Z0][LCM][I]: VM running but monitor state is POWEROFF

OpenNebula itself is configured properly; I can connect from the frontend to the node and check libvirt:

oneadmin@hosting-cntl:/root$ ssh onenode-prg-03 virsh -c qemu:///system list --all
 Id   Name     State
-----------------------
 3    one-38   paused

On the node, in the libvirt logs, I see:

2024-04-29 21:11:18.086+0000: starting up libvirt version: 7.0.0, package: 3+deb11u2 (Guido Günther <agx@sigxcpu.org> Mon, 06 Feb 2023 17:50:14 +0100), qemu version: 5.2.0Debian 1:5.2+dfsg-11+deb11u3, kernel: 5.10.0-28-amd64, hostname: onenode-prg-03
LC_ALL=C \
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin \
HOME=/var/lib/libvirt/qemu/domain-2-one-38 \
XDG_DATA_HOME=/var/lib/libvirt/qemu/domain-2-one-38/.local/share \
XDG_CACHE_HOME=/var/lib/libvirt/qemu/domain-2-one-38/.cache \
XDG_CONFIG_HOME=/var/lib/libvirt/qemu/domain-2-one-38/.config \
QEMU_AUDIO_DRV=none \
/usr/bin/qemu-kvm-one \
-name guest=one-38,debug-threads=on \
-S \
-object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-2-one-38/master-key.aes \
-machine pc-i440fx-5.2,accel=kvm,usb=off,dump-guest-core=off,memory-backend=pc.ram \
-cpu qemu64 \
-m 768 \
-object memory-backend-ram,id=pc.ram,size=805306368 \
-overcommit mem-lock=off \
-smp 1,sockets=1,cores=1,threads=1 \
-uuid 37946566-66f0-47da-a606-478b60d31d17 \
-no-user-config \
-nodefaults \
-chardev socket,id=charmonitor,fd=34,server,nowait \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=utc \
-no-shutdown \
-boot strict=on \
-device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
-device virtio-scsi-pci,id=scsi0,num_queues=1,bus=pci.0,addr=0x4 \
-device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 \
-object secret,id=libvirt-2-storage-auth-secret0,data=[*MASKED*],keyid=masterKey0,iv=[*MASKED*],format=base64 \
-blockdev '{"driver":"rbd","pool":"one","image":"one-21-38-0","server":[{"host":"onenode-prg-01","port":"6789"},{"host":"onenode-prg-02","port":"6789"},{"host":"onenode-prg-03","port":"6789"}],"user":"libvirt","auth-client-required":["cephx","none"],"key-secret":"libvirt-2-storage-auth-secret0","node-name":"libvirt-2-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-2-format","read-only":false,"discard":"unmap","cache":{"direct":true,"no-flush":false},"driver":"raw","file":"libvirt-2-storage"}' \
-device virtio-blk-pci,bus=pci.0,addr=0x6,drive=libvirt-2-format,id=virtio-disk0,bootindex=1,write-cache=on \
-blockdev '{"driver":"file","filename":"/var/lib/one//datastores/121/38/disk.1","node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-1-format","read-only":true,"driver":"raw","file":"libvirt-1-storage"}' \
-device ide-cd,bus=ide.0,unit=0,drive=libvirt-1-format,id=ide0-0-0 \
-netdev tap,fd=36,id=hostnet0,vhost=on,vhostfd=37 \
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=02:00:0a:24:fa:01,bus=pci.0,addr=0x3 \
-chardev socket,id=charchannel0,fd=38,server,nowait \
-device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 \
-vnc 0.0.0.0:38 \
-device cirrus-vga,id=video0,bus=pci.0,addr=0x2 \
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 \
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on
2024-04-29T21:16:18.214503Z qemu-kvm-one: -blockdev {"driver":"rbd","pool":"one","image":"one-21-38-0","server":[{"host":"onenode-prg-01","port":"6789"},{"host":"onenode-prg-02","port":"6789"},{"host":"onenode-prg-03","port":"6789"}],"user":"libvirt","auth-client-required":["cephx","none"],"key-secret":"libvirt-2-storage-auth-secret0","node-name":"libvirt-2-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}: error connecting: Connection timed out
2024-04-29 21:16:18.311+0000: shutting down, reason=failed

At the same time I can connect to Ceph using my libvirt user:

oneadmin@onenode-prg-03:~$ rbd ls -p one --id libvirt
one-21
one-21-38-0
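
A check that is closer to what QEMU actually does (librbd with the same cephx user) would be something like this; just a sanity test, assuming the client.libvirt keyring is reachable through /etc/ceph/ceph.conf:

oneadmin@onenode-prg-03:~$ qemu-img info rbd:one/one-21-38-0:id=libvirt:conf=/etc/ceph/ceph.conf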

Versions of the related components and OS (frontend, hypervisors, VMs): Debian 11, OpenNebula v6.8.0

Steps to reproduce:

  • Install a fresh Debian 11
  • Install OpenNebula using the official documentation
  • Deploy Ceph Quincy using cephadm and the official docs
  • Configure OpenNebula’s datastore using the official docs

Current results: the VM cannot start and ends up in POWEROFF

Expected results: the VM runs without failures

Hi @krakazyabra :wave:

Welcome to the OpenNebula Forum! :rocket: Nice to see that you’re testing with OpenNebula.

Thank you for the full, detailed description of the problem, it helps a lot :slight_smile:. Could you check the Ceph logs for errors or warnings when OpenNebula tries to connect to or access the image?
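
For a cephadm deployment, something along these lines is usually enough to spot connection or auth problems (the mon daemon name is only an example, adjust it to one of your hosts):

ceph health detail
ceph log last 50
cephadm logs --name mon.onenode-prg-01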

Best,
Victor.

Hi @vpalma,
Thanks for the reply.

I’ve tried to catch Ceph’s logs, but either log level 3 is not verbose enough or there really are no warnings/errors.

Maybe you can point me to the right place to find more information about it?

In the meantime I reinstalled the OS on the node (it now runs Ubuntu 22.04) and disabled AppArmor. The problem is still there.

Some logs from journalctl -u libvirtd:

Apr 30 12:27:19 onenode-prg-03 libvirtd[111415]: libvirt version: 8.0.0, package: 1ubuntu7.10 (Marc Deslauriers <marc.deslauriers@ubuntu.com> Fri, 12 Apr 2024 13:48:21 -0400)
Apr 30 12:27:19 onenode-prg-03 libvirtd[111415]: hostname: onenode-prg-03
Apr 30 12:27:19 onenode-prg-03 libvirtd[111415]: Unable to read from monitor: Connection reset by peer
Apr 30 12:27:19 onenode-prg-03 libvirtd[111415]: internal error: qemu unexpectedly closed the monitor: 2024-04-30T12:27:19.010731Z qemu-kvm-one: -blockdev {"driver":"rbd","pool":"one","image":"one-21-38-0","server":[{"host":"onenode-prg-01","port":"6789"},{"host":"onenode-prg-02","port":"6789"},{"host":"onenode-prg-03","port":"6789"}],"user":"libvirt","auth-client-required":["cephx","none"],"key-secret":"libvirt-2-storage-auth-secret0","node-name":"libvirt-2-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}: error connecting: Connection timed out
Apr 30 12:27:19 onenode-prg-03 libvirtd[111415]: internal error: process exited while connecting to monitor: 2024-04-30T12:27:19.010731Z qemu-kvm-one: -blockdev {"driver":"rbd","pool":"one","image":"one-21-38-0","server":[{"host":"onenode-prg-01","port":"6789"},{"host":"onenode-prg-02","port":"6789"},{"host":"onenode-prg-03","port":"6789"}],"user":"libvirt","auth-client-required":["cephx","none"],"key-secret":"libvirt-2-storage-auth-secret0","node-name":"libvirt-2-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}: error connecting: Connection timed out

I checked the deployment file at /var/lib/one/datastores/121/38/deployment.x:

                        <auth username='libvirt'>
                                <secret type='ceph' uuid='a61356c2-6312-42e2-b11b-bafd73343dd3'/>
                        </auth>

It is the same as:

root@onenode-prg-03:~# virsh secret-list
 UUID                                   Usage
--------------------------------------------------------------------
 a61356c2-6312-42e2-b11b-bafd73343dd3   ceph client.libvirt secret
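
The UUIDs match. To also rule out a stale key, the value stored in libvirt can be compared with the actual cephx key, for example (assuming the secret was not defined as private; both commands should print the same base64 string):

root@onenode-prg-03:~# virsh -c qemu:///system secret-get-value a61356c2-6312-42e2-b11b-bafd73343dd3
root@onenode-prg-03:~# ceph auth get-key client.libvirt; echo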

Digging deeper:

root@onenode-prg-03:~# telnet onenode-prg-01 6789
Trying 10.250.252.10...
telnet: Unable to connect to remote host: Connection refused
root@onenode-prg-03:~# telnet onenode-prg-01 3300
Trying 10.250.252.10...
telnet: Unable to connect to remote host: Connection refused

Here was the problem: the networks. Ceph listens on the 10.249.252.0 network, while these hostnames resolve to 10.250.252.x. This presumably also explains why the rbd CLI worked: it reads the monitor addresses from /etc/ceph/ceph.conf, whereas QEMU gets them from the deployment file. OK, I changed the CEPH_HOST variable in the datastore to the IP addresses.
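
The fix itself is just editing the Ceph datastore template(s), roughly like this (datastore ID left out on purpose; only the first monitor IP is known from the output above, the other two are placeholders):

onedatastore update <ds_id>
# in the editor, point CEPH_HOST at the addresses the monitors actually listen on:
#   CEPH_HOST = "10.249.252.10 <mon2-ip> <mon3-ip>"

After that, the monitor port is reachable on the Ceph network: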

root@onenode-prg-03:~# telnet 10.249.252.10 6789
Trying 10.249.252.10...
Connected to 10.249.252.10.
Escape character is '^]'.
ceph v027�

I redeployed the VM and finally:

root@onenode-prg-03:~# virsh -c qemu:///system list --all
 Id   Name     State
------------------------
 4    one-39   running

This problem is also related to: VM creation failed. XPath empty - #3 by o.mbarek

Hi @krakazyabra, glad to see that the problem is solved.

Indeed, it looks like there was a connectivity problem with the Ceph cluster. Sorry I couldn’t be of more help, but you were able to fix it on your own in the end, and I’m glad you finally got your VM up and running :smiley:

Best,
Victor.