Ceph HCI Ansible version issue

Hi all,

I’ve set up a new cluster with OpenNebula 6.8.0 on Ubuntu 22.04 and all went smoothly.

The next step was to set up Ceph HCI to use the SSDs present on each of the 3 nodes as a datastore for the cluster.

I’ve followed this document to implement Ceph HCI:
https://docs.opennebula.io/6.8/provision_clusters/hci_clusters/onprem_cluster_ceph.html

I did notice that the document specifically mentions Ubuntu 20.04 rather than 22.04, which the front-end supports, but I thought it might be compatible anyway and that maybe the docs needed updating.

The end result is that the script created 3 new hosts instead of using the hosts already configured, and then it stopped with the following error:

ERROR: Unsupported Ansible ver. 2.9.27, must be >= 2.12 and < 2.13

Ansible-core 2.9.27 is the version available with Ubuntu 22.04, while the PPA provides version 2.16.7.

Are there any specific showstoppers to using ansible-core 2.16.7?

I could reinstall everything using Ubuntu 20.04, but it would be great to be able to use 22.04, as it’s the latest version supported by the front-end. I’d be happy to use my cluster as a testbed for a new version of the HCI install process.

Thanks

Paolo

Internally we use the ceph-ansible project, which is quite strict about the supported Ansible version. That’s the reason we require that specific version.

I suggest you install the required version, for example:

pip3 install ansible==5.10

Then it’s recommended to remove the previously failed provision and try to create it again.

Also, let me remark that at the moment the oneprovision component is not receiving many updates, as we are preparing to refactor it in a future OpenNebula release.
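
To confirm that the install picked up a version in the range the error asks for, a quick check like the one below may help. It is only a sketch: the helper name `ansible_core_supported` is made up for illustration, and it assumes the first line of `ansible --version` contains the core version number (as in `ansible [core 2.12.10]`). The Ansible 5.x packages on PyPI are built on ansible-core 2.12, which is why `ansible==5.10` should land inside the range.

```shell
# Hypothetical helper: succeeds only for ansible-core >= 2.12 and < 2.13,
# the range quoted in the oneprovision error message.
ansible_core_supported() {
    case "$1" in
        2.12|2.12.*) return 0 ;;
        *) return 1 ;;
    esac
}

# Only run the check when ansible is actually on the PATH.
if command -v ansible >/dev/null 2>&1; then
    # Assumption: the first dotted number on the first line is the core version,
    # e.g. "ansible [core 2.12.10]" or, on older releases, "ansible 2.9.27".
    ver=$(ansible --version | head -n 1 | grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1)
    if ansible_core_supported "$ver"; then
        echo "ansible-core $ver is in the supported range"
    else
        echo "ansible-core $ver is outside >= 2.12 and < 2.13"
    fi
fi
```

If the reported version is still 2.9.27 after the pip install, the distribution package is probably earlier on the PATH than the pip-installed one.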

Lastly, Ubuntu 22.04 should work as well.

Thanks @jorel,

That sorted the issue: the script runs, but another problem cropped up.

While I can successfully ssh as root to the hosts without a password, the script apparently can’t do the same:

“Failed to connect to the host via ssh: Warning: Permanently added ‘onhost1’ (ED25519) to the list of known hosts.\r\nroot@onhost1: Permission denied (publickey,password).”

Is there a missing step that I should perform before running it?

Thanks

Paolo

oneadmin uses this ssh key /var/lib/one/.ssh-oneprovision/id_rsa to reach the provisioning host. Can you check if password-less ssh is working with this key?
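
One way to run that check is sketched below. The key path is the one given above and the host names are the ones from this thread; `check_provision_key` is a made-up helper name, not an OpenNebula command. It is assumed to be run as oneadmin on the front-end, since that is the user that owns the key.

```shell
# Key path quoted in the reply above.
KEY=/var/lib/one/.ssh-oneprovision/id_rsa

# check_provision_key is a hypothetical helper, not part of OpenNebula.
# It tries a non-interactive root login to each host using only that key.
check_provision_key() {
    rc=0
    for host in "$@"; do
        # BatchMode=yes makes ssh fail instead of falling back to a password prompt.
        if ssh -i "$KEY" -o BatchMode=yes -o ConnectTimeout=5 "root@$host" true; then
            echo "$host: ok"
        else
            echo "$host: key login failed"
            rc=1
        fi
    done
    return "$rc"
}
```

For the three nodes here that would be `check_provision_key onhost1 onhost2 onhost3`. Any host reported as failed is one where the provision key is not yet authorized for root.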

Apologies for the delay in checking back.
Adding that key to the hosts allowed me to continue the testing.
It might be good to add that to the Ceph HCI documentation.
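
For anyone hitting the same thing, adding the key can be sketched roughly as follows. This is an assumption-laden example rather than the documented procedure: it assumes the public half of the key sits next to the private one as `id_rsa.pub`, that the account running it can read that file and can already log in as root on the hosts, and `install_provision_key` is just an illustrative name.

```shell
# Public half of the key mentioned earlier in the thread; the .pub file
# name is an assumption (the usual ssh-keygen naming).
PUBKEY=/var/lib/one/.ssh-oneprovision/id_rsa.pub

# install_provision_key is a hypothetical helper, not part of OpenNebula.
# It appends the public key to root's authorized_keys on each host and
# stops at the first host where that fails.
install_provision_key() {
    for host in "$@"; do
        ssh-copy-id -i "$PUBKEY" "root@$host" || return 1
    done
}
```

With the host names used in this thread the call would be `install_provision_key onhost1 onhost2 onhost3`.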

I made some progress, but unfortunately when the script starts dealing with the OSD configuration we are back to the issue with Ansible versions.

At “TASK [ceph-facts : set default osd_pool_default_crush_rule fact]” the script starts complaining about “Collection ansible.utils does not support Ansible version 2.12.10” but it carries on.

Then it all grinds to a halt, as it needs version 2.15:
TASK [ceph-validate : fail on unsupported ansible version] *********************
fatal: [onhost1]: FAILED! => changed=false
msg: Ansible version must be 2.15!
fatal: [onhost2]: FAILED! => changed=false
msg: Ansible version must be 2.15!
fatal: [onhost3]: FAILED! => changed=false
msg: Ansible version must be 2.15!

The message seems to indicate that it wants version 2.15 on the nodes, so I installed it on onhost1 to see if it would make a difference, but it didn’t.

Any suggestions?

Thanks

Paolo

As the scripts and requirements have been updated, I’ve done a clean install of OpenNebula 6.10.1.

With ansible-core 2.6.13 on OpenNebula 6.10.1 the ansible warnings and errors are gone.