Hello,
I wanted to test and play around with OpenNebula, so I deployed a miniONE instance and used the new HCI Provisioner to create a 3-node HCI cluster on three other VMs (nested virtualization is enabled).
First I ran into an issue where the /var/lib/one/.ssh folder was owned by user 9869 (which doesn't exist). After I chowned it to oneadmin, I was able to deploy VMs.
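In case other paths have the same ownership problem, here is a minimal Python sketch that lists entries whose owner UID has no passwd entry. The function name `find_orphaned_paths` and the `lookup` parameter are my own and not part of OpenNebula; I am only assuming that a standard passwd lookup is the right check.

```python
import os
import pwd


def find_orphaned_paths(root, lookup=pwd.getpwuid):
    """Return (path, uid) pairs under root whose owner UID is unknown.

    `lookup` must raise KeyError for an unknown UID, as pwd.getpwuid does.
    It is a parameter only so the check can be exercised without root.
    """
    # Collect the root itself plus every directory and file below it.
    candidates = [root]
    for dirpath, dirnames, filenames in os.walk(root):
        candidates.extend(os.path.join(dirpath, n) for n in dirnames + filenames)

    known = {}  # uid -> True if the UID resolves to a user
    orphaned = []
    for path in candidates:
        uid = os.lstat(path).st_uid  # lstat: do not follow symlinks
        if uid not in known:
            try:
                lookup(uid)
                known[uid] = True
            except KeyError:
                known[uid] = False
        if not known[uid]:
            orphaned.append((path, uid))
    return orphaned
```

Calling `find_orphaned_paths("/var/lib/one")` as root on the frontend or a hypervisor node should then list anything still owned by a UID such as 9869.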
I then instantiated an Alpine Linux VM via Sunstone. Right after booting, its LCM state changed to ERROR. The VM log shows the following:
Mon May 23 14:18:54 2022 [Z0][VM][I]: New state is ACTIVE
Mon May 23 14:18:54 2022 [Z0][VM][I]: New LCM state is PROLOG
Mon May 23 14:18:58 2022 [Z0][VM][I]: New LCM state is BOOT
Mon May 23 14:18:58 2022 [Z0][VMM][I]: Generating deployment file: /var/lib/one/vms/2/deployment.0
Mon May 23 14:18:59 2022 [Z0][VMM][I]: Successfully execute transfer manager driver operation: tm_context.
Mon May 23 14:18:59 2022 [Z0][VMM][I]: Successfully execute network driver operation: pre.
Mon May 23 14:19:00 2022 [Z0][VMM][I]: ExitCode: 0
Mon May 23 14:19:00 2022 [Z0][VMM][I]: Successfully execute virtualization driver operation: /bin/mkdir -p.
Mon May 23 14:19:00 2022 [Z0][VMM][I]: ExitCode: 0
Mon May 23 14:19:00 2022 [Z0][VMM][I]: Successfully execute virtualization driver operation: /bin/cat - >/var/lib/one//datastores/101/2/vm.xml.
Mon May 23 14:19:00 2022 [Z0][VMM][I]: ExitCode: 0
Mon May 23 14:19:00 2022 [Z0][VMM][I]: Successfully execute virtualization driver operation: /bin/cat - >/var/lib/one//datastores/101/2/ds.xml.
Mon May 23 14:19:02 2022 [Z0][VMM][I]: ExitCode: 0
Mon May 23 14:19:02 2022 [Z0][VMM][I]: Successfully execute virtualization driver operation: deploy.
Mon May 23 14:19:02 2022 [Z0][VMM][I]: Successfully execute network driver operation: post.
Mon May 23 14:19:02 2022 [Z0][VM][I]: New LCM state is RUNNING
Mon May 23 14:19:03 2022 [Z0][VMM][I]: VM running but monitor state is ERROR.
Mon May 23 14:19:03 2022 [Z0][VM][I]: New LCM state is UNKNOWN
Mon May 23 14:19:18 2022 [Z0][VMM][I]: VM running but monitor state is ERROR.
Mon May 23 14:22:22 2022 [Z0][VMM][I]: VM running but monitor state is ERROR.
On the hypervisor host the syslog shows this:
May 23 14:43:17 lha-edge03 kernel: [ 4331.193669] audit: type=1400 audit(1653316997.636:68): apparmor="STATUS" operation="profile_remove" profile="unconfined" name="libvirt-635a298a-1c9f-4f43-a61f-1fda339876ff" pid=37596 comm="apparmor_parser"
May 23 14:43:38 lha-edge03 kernel: [ 4352.095442] audit: type=1400 audit(1653317018.536:69): apparmor="STATUS" operation="profile_load" profile="unconfined" name="libvirt-747934b6-0dee-4b80-b5ce-77921f20611b" pid=37975 comm="apparmor_parser"
May 23 14:43:38 lha-edge03 kernel: [ 4352.351566] audit: type=1400 audit(1653317018.792:70): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libvirt-747934b6-0dee-4b80-b5ce-77921f20611b" pid=37978 comm="apparmor_parser"
May 23 14:43:38 lha-edge03 kernel: [ 4352.545797] audit: type=1400 audit(1653317018.988:71): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libvirt-747934b6-0dee-4b80-b5ce-77921f20611b" pid=37982 comm="apparmor_parser"
May 23 14:43:39 lha-edge03 kernel: [ 4352.738264] audit: type=1400 audit(1653317019.180:72): apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="libvirt-747934b6-0dee-4b80-b5ce-77921f20611b" pid=37986 comm="apparmor_parser"
May 23 14:43:39 lha-edge03 kernel: [ 4353.531455] audit: type=1400 audit(1653317019.972:73): apparmor="DENIED" operation="open" profile="libvirt-747934b6-0dee-4b80-b5ce-77921f20611b" name="/etc/ceph/ceph.client.oneadmin.keyring" pid=37989 comm="qemu-kvm-one" requested_mask="r" denied_mask="r" fsuid=1001 ouid=0
May 23 14:43:39 lha-edge03 kernel: [ 4353.531835] audit: type=1400 audit(1653317019.972:74): apparmor="DENIED" operation="open" profile="libvirt-747934b6-0dee-4b80-b5ce-77921f20611b" name="/etc/ceph/ceph.client.oneadmin.keyring" pid=37989 comm="qemu-kvm-one" requested_mask="r" denied_mask="r" fsuid=1001 ouid=0
May 23 14:43:39 lha-edge03 kernel: [ 4353.534304] audit: type=1400 audit(1653317019.976:75): apparmor="DENIED" operation="open" profile="libvirt-747934b6-0dee-4b80-b5ce-77921f20611b" name="/etc/ceph/ceph.client.oneadmin.keyring" pid=37989 comm="qemu-kvm-one" requested_mask="r" denied_mask="r" fsuid=1001 ouid=0
May 23 14:43:39 lha-edge03 kernel: [ 4353.534308] audit: type=1400 audit(1653317019.976:76): apparmor="DENIED" operation="open" profile="libvirt-747934b6-0dee-4b80-b5ce-77921f20611b" name="/etc/ceph/ceph.client.oneadmin.keyring" pid=37989 comm="qemu-kvm-one" requested_mask="r" denied_mask="r" fsuid=1001 ouid=0
May 23 14:43:39 lha-edge03 kernel: [ 4353.539438] audit: type=1400 audit(1653317019.980:77): apparmor="DENIED" operation="open" profile="libvirt-747934b6-0dee-4b80-b5ce-77921f20611b" name="/etc/ceph/ceph.client.oneadmin.keyring" pid=37989 comm="qemu-kvm-one" requested_mask="r" denied_mask="r" fsuid=1001 ouid=0
May 23 14:43:39 lha-edge03 kernel: [ 4353.539565] audit: type=1400 audit(1653317019.980:78): apparmor="DENIED" operation="open" profile="libvirt-747934b6-0dee-4b80-b5ce-77921f20611b" name="/etc/ceph/ceph.client.oneadmin.keyring" pid=37989 comm="qemu-kvm-one" requested_mask="r" denied_mask="r" fsuid=1001 ouid=0
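To make the denials above easier to read, here is a small Python sketch that counts AppArmor DENIED entries per process, path and access mask. The function names are my own, and the parser assumes the `key="value"` / `key=value` layout seen in the lines above; it is not a general audit log parser.

```python
import re
from collections import Counter

# Matches key="quoted value" or key=bare_value pairs in a kernel audit line.
_PAIR = re.compile(r'(\w+)=(?:"([^"]*)"|(\S+))')


def parse_audit_fields(line):
    """Return the key/value pairs found in one audit line as a dict."""
    return {key: (quoted if quoted else bare)
            for key, quoted, bare in _PAIR.findall(line)}


def summarize_denials(lines):
    """Count DENIED entries, keyed by (comm, name, denied_mask)."""
    counts = Counter()
    for line in lines:
        fields = parse_audit_fields(line)
        if fields.get("apparmor") == "DENIED":
            key = (fields.get("comm"),
                   fields.get("name"),
                   fields.get("denied_mask"))
            counts[key] += 1
    return counts
```

Feeding it the syslog lines, for example `summarize_denials(open(path))` with `path` pointing at the host's syslog file, collapses the repeated entries into one row per denied file, which here is only the read access of qemu-kvm-one to /etc/ceph/ceph.client.oneadmin.keyring.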
The cluster was deployed on three freshly installed, up-to-date Ubuntu Focal VMs using /usr/share/one/oneprovision/edge-clusters/metal/provisions/onprem-hci.yml.
Disabling AppArmor for libvirt fixed the issue for me; the VM now starts correctly.
EDIT: It did not solve the issue after all; the VM is still stuck in ERROR.
Where can I look next?