Hi there,
my team and I are currently evaluating OpenNebula Community Stable / 6.6 for a variety of deployments.
At the moment, the feature we are looking into most is the provisioning of on-premises edge and HCI clusters.
Unfortunately, we have run into quite a few pitfalls and avoidable problems, but we intend to continue our testing for some time.
That said, our team has come up with some questions about handling an on-premises HCI cluster.
A) Ceph and its dashboard
One component that eases Ceph administration, especially for staff who are still familiarising themselves with our projects, is the dashboard module. We have had good experiences with it in past Ceph deployments without ONE, and we would like to use it here as well. Is there anything that speaks against this? How is routine Ceph administration done with ONE? I find that OneProvision lacks some vital functionality here, and while we can and do use the more traditional methods all the time, it would be nice to have the dashboard available.
B) Which Ceph Orchestrator module should we use? Should we use one?
We have been looking into the classic cephadm for this, but almost every OneProvision run leaves us having to adopt the cluster again. We would like to avoid that, if possible. We would also like to operate several pools for special storage tiers, with custom hardware, tags and CRUSH rules. The “one” pool that OneProvision creates does not seem sufficient for our needs. And coming back to the dashboard topic: without an orchestrator the dashboard would be of little use, since it could only monitor, which can be done in other ways. Is there a recommendation on this?
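To make the tiering requirement more concrete, here is a minimal sketch of the kind of layout we have in mind: one replicated pool per device class, each with its own CRUSH rule. The pool names, the device classes and the `tier_commands` helper are hypothetical and for illustration only. The script only builds the standard `ceph` CLI calls and prints them; it does not run anything and does not reflect what OneProvision does.

```python
from typing import Dict, List


def tier_commands(
    tiers: Dict[str, str],
    root: str = "default",
    failure_domain: str = "host",
) -> List[List[str]]:
    """Build (but do not run) the ceph CLI calls for one pool per device class.

    `tiers` maps a hypothetical pool name to a CRUSH device class.
    """
    commands: List[List[str]] = []
    for pool, device_class in tiers.items():
        rule = f"{pool}-rule"
        # CRUSH rule restricted to one device class under the given root.
        commands.append(
            ["ceph", "osd", "crush", "rule", "create-replicated",
             rule, root, failure_domain, device_class]
        )
        # Create the pool, then bind it to the rule created above.
        commands.append(["ceph", "osd", "pool", "create", pool])
        commands.append(["ceph", "osd", "pool", "set", pool, "crush_rule", rule])
        # Tag the pool for RBD use (assumption: the pools hold VM disk images).
        commands.append(["ceph", "osd", "pool", "application", "enable", pool, "rbd"])
    return commands


if __name__ == "__main__":
    # Hypothetical tiers: a fast NVMe pool and a bulk HDD pool.
    for argv in tier_commands({"one-fast": "nvme", "one-bulk": "hdd"}):
        print(" ".join(argv))
```

Pools created this way would presumably still have to be registered in ONE as additional datastores, which is part of what we are asking about.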
C) How would a team of administrators handle changes with OneProvision?
For example, assuming we want to expand from 3 to 5 HCI nodes, how would that be done? How are regular on-premises maintenance and scaling operations carried out with OneProvision, via either the web GUI or the CLI? Or is there another way? What should we do, and what should we avoid? Maybe we overlooked it, but I find the documentation a bit lacking on this.
If anyone could help with these questions, it would give us some much-needed clarity. Thank you for your answers.