Hyper-converged OpenNebula solution

Hello.

We are looking at a new OpenNebula cluster and want to avoid the SAN problem we described at OpenNebulaConf 2016.

We are thinking about LizardFS (thanks NodeWeaver :wink:) but wonder a little about what kind of setup to use:

  • Today we have 4TB of qcow2 storage (25TB uncompressed, according to OpenNebula) on the SAN, and we would like to set up some kind of hot/warm/cold storage tiering to limit the price. Is that reasonable to do?
  • It looks like it's better to have more chunkservers than to have huge capacity per server?
  • Is it OK to put the master (and shadow master) on chunkservers/hypervisors, or is it required to have dedicated servers?

Does anyone have some hints to provide?

Regards.

Daniel Dehennin via OpenNebula Community writes:

Hello.


  • Today we have 4TB of qcow2 storage (25TB uncompressed, according to OpenNebula) on the SAN, and we would like to set up some kind of hot/warm/cold storage tiering to limit the price. Is that reasonable to do?

It looks like I need dedicated chunkservers with a proper label (like "cold" or "slow") and to "attach" some datastores to that label using goals[1].

I made some tests running more than one chunkserver on a physical machine[2] with success, so I could have one chunkserver for SSDs and one chunkserver for HDDs per physical machine, to avoid having dedicated hardware for spinning disks.
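
For example, this is roughly what I have in mind (untested; the label names, the second config file, the port and the paths below are just placeholders I picked from my reading of [1] and [2]):

    # /etc/mfs/mfschunkserver.cfg (SSD instance)
    LABEL = ssd

    # /etc/mfs/mfschunkserver-hdd.cfg (second instance on the same host, see [2])
    LABEL = hdd
    DATA_PATH = /var/lib/mfs-hdd
    CSSERV_LISTEN_PORT = 9522
    HDD_CONF_FILENAME = /etc/mfs/mfshdd-hdd.cfg

    # /etc/mfs/mfsgoals.cfg on the master: custom goals bound to the labels
    10 fast : ssd ssd
    11 slow : hdd hdd

    # attach a datastore directory to a goal
    lizardfs setgoal -r slow /path/to/the/cold/datastore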

  • It looks like it's better to have more chunkservers than to have huge capacity per server?

As far as I understand, it's better to avoid any kind of RAID (even JBOD) since a chunkserver using multiple disks will stripe chunks across them like RAID 0.

So I need several disks in a single machine to make it handle more I/O, and I need several physical machines to handle redundancy.
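
If I understand correctly, each raw disk then simply gets its own line in the chunkserver's disk list, something like this (mount points are just examples):

    # /etc/mfs/mfshdd.cfg -- one line per disk, no RAID/JBOD underneath
    /srv/lizardfs/disk1
    /srv/lizardfs/disk2
    /srv/lizardfs/disk3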

  • Is it OK to put the master (and shadow master) on chunkservers/hypervisors, or is it required to have dedicated servers?

As the physical machines will be used as hypervisors, they will have plenty of CPU and RAM. I saw a recommendation of 64GB of RAM for the metadata server, but that seems to be for a million files which sum up to petabytes of data.
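
For the shadow master, my understanding is that it only takes something like this in the master configuration of a second machine (untested; the address is a placeholder):

    # /etc/mfs/mfsmaster.cfg on the hypervisor acting as shadow master
    PERSONALITY = shadow
    MASTER_HOST = <address of the active master>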

Does anyone have some hints to provide?

I'm planning to write some documentation on setting up LizardFS with OpenNebula.

I'm mostly interested in the "all in one" use case: a single hypervisor with 2 disks or more to start, then extending the cluster by adding more physical servers.

Regards.

Footnotes:
[1] https://docs.lizardfs.com/adminguide/replication.html

[2] Running multiple chunkservers on the same machine - LizardFS

Hello.

I made some experiments: I put /var/lib/one on LizardFS, and /var/lib/one/datastores is exported only to the hypervisors.

I have a problem mounting the LizardFS filesystem at boot. I tried several things:

  • declare an entry in /etc/fstab using the mfsdelayinit option

  • declare a systemd.mount with custom dependencies to start after
    network and lizardfs-master but before opennebula

Nothing works.

If someone is using LizardFS, could you tell me how you configured it?

Regards.

I am using this in fstab:
mfsmount /lizardfs fuse big_writes,nosuid,nodev,noatime,mfsdelayedinit,_netdev 0 0

_netdev is crucial there to mount AFTER network is available.
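
If you prefer an explicit systemd unit over fstab, something along these lines should be roughly equivalent (untested sketch using your /var/lib/one path; the lizardfs-master and opennebula service names are assumptions, adjust them to your packages):

    # /etc/systemd/system/var-lib-one.mount (the unit name must match the mount path)
    [Unit]
    Wants=network-online.target
    After=network-online.target lizardfs-master.service
    Before=opennebula.service

    [Mount]
    What=mfsmount
    Where=/var/lib/one
    Type=fuse
    Options=big_writes,nosuid,nodev,noatime,mfsdelayedinit

    [Install]
    WantedBy=remote-fs.target

Then enable it with systemctl daemon-reload followed by systemctl enable --now var-lib-one.mount.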

Hi Daniel!

  1. You can certainly use LizardFS for this; use labels. In NodeWeaver we use HDD and SSD labels to mark two different chunkservers on the same physical node; create two custom goals with
    HDD_2 = HDD HDD
    SSD_2 = SSD SSD
    (or use "fast" and "slow" - it's the same!) so you can have two copies on the same media. You can use mixed goals as well, but if you have few nodes it makes really little difference.
  2. More chunkservers vs. bigger nodes: it really depends on the workload. More chunkservers means more IOPS if you have lots of parallel operations; bigger servers means bigger caches. If your workload can fit in a memory cache, use larger servers. For most workloads that we observe in our customers' systems, more chunkservers is better, but your mileage may vary. If you post a bit about the kind of work you do, I will be happy to guide you.
  3. Yes, in most conditions. There are a few possible issues related to locking/contention when you put everything together; we solve it with a custom architecture and a few tricks with priority (changing the nice values of the servers depending on load; a crude sketch of the idea is below). Unless you overload everything, you can do it with no issues.
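
Just to illustrate the priority idea (this is not our actual implementation, only a crude sketch you could run periodically, e.g. from cron; the load threshold is arbitrary):

    # deprioritize the chunkserver when the host load gets high
    load=$(cut -d. -f1 /proc/loadavg)
    if [ "$load" -ge 32 ]; then
        renice -n 10 -p $(pidof mfschunkserver)
    else
        renice -n 0 -p $(pidof mfschunkserver)
    fi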

Thanks a lot.

We now have our hardware: 5 servers, each composed of:

  • 72 cores, Xeon® Gold 5220 CPU @ 2.20GHz
  • 396GB of RAM
  • 10 * 800GB SAS SSDs (12Gb/s)

I made some preliminary tests and reported an issue on cgiserv while investigating why LizardFS is so slow :frowning:

I installed tuned and activated the network-throughput profile, and things are getting a little bit better. So it looks like we need to tweak our setup a little; we are not used to such performant hardware :wink:
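
For the record, that part was nothing more than (the tuned package name is from the Debian/Ubuntu repositories, it may differ on your distribution):

    apt-get install tuned
    systemctl enable --now tuned
    tuned-adm profile network-throughput
    tuned-adm active    # check which profile is active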

I'll report any progress in this thread (and maybe write a blog post) for other people.