Explain monitoring a shared datastore

Hi,

I have a question, and I did not find anything about it in the docs:

When I use shared storage (NFS or Ceph), how does capacity monitoring work? As far as I know, the probes are executed on every host. That means, for example, that 50 hosts will each monitor 30 datastores hosted on a single NFS server or Ceph cluster, executing roughly 1500 du or rbd ls calls against that storage (more or less at the same time)?
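To make the numbers concrete, here is a rough sketch of what I understand each host's datastore probe to be doing. The path and commands are my assumption from the docs, not the actual probe source:

```ruby
#!/usr/bin/env ruby
# Rough sketch of the per-host fan-out I am worried about (not the actual
# OpenNebula probe code): every host walks every shared datastore and runs
# a du / rbd ls against the same backend.

DATASTORES = Dir.glob('/var/lib/one/datastores/*')  # e.g. 30 shared datastores

DATASTORES.each do |ds|
  # Each of the ~50 hosts issues this against the single NFS server / Ceph
  # cluster, so the backend sees hosts * datastores queries per cycle.
  used_mb = `du -sm #{ds} 2>/dev/null`.split.first
  puts "DS=#{File.basename(ds)} USED_MB=#{used_mb}"
end
```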

I know there is some need for redundancy, but this can have quite a performance impact.

Just a thought, opinions welcome …

Thx, Armin

This has been fixed for the next version, see here:

http://dev.opennebula.org/issues/4138

Cheers

Hi,
I can’t find the feature branch to check the sources, but IMO, besides the SYSTEM_DS monitoring, the disk polling has the same if not a worse issue, as it is called for each attached disk in a VM… (well, it is cached per pool, but again there is a call on each node).

See vmm_mad/remotes/poll_xen_kvm.rb:
rbd_pool() is defined at line 553
and used in get_disk_usage() at line 346
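To illustrate what I mean, here is a simplified sketch of that caching pattern. It is my own reconstruction, not the actual poll_xen_kvm.rb code:

```ruby
require 'json'

# Simplified sketch: the pool listing is cached, but only per node, so with
# N nodes the Ceph cluster still answers N identical `rbd ls` queries per
# monitoring cycle.
RBD_POOL_CACHE = {}

def rbd_pool(pool)
  # Cached for this node's poll run only; every other node repeats the call.
  RBD_POOL_CACHE[pool] ||= JSON.parse(`rbd ls -l -p #{pool} --format json`)
end

def get_disk_usage(pool, image)
  entry = rbd_pool(pool).find { |i| i['image'] == image }
  entry ? entry['size'] : 0
end
```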

In our addon we solve the issue with a cron job on the front-end, which queries the storage API once and pushes a ‘cache’ file to the nodes, from which the various stats monitoring/reporting scripts are fed. Yes, this way the reported stats lag a little, but it does not stress the stats collector, and it lets us keep the storage management API off the hypervisor nodes (if a rogue process manages to escape the VM cage, it will compromise only the current node, not the entire storage).
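For reference, a minimal sketch of that approach, assuming illustrative paths, node names and a Ceph-style storage query (our addon’s actual code differs):

```ruby
#!/usr/bin/env ruby
# Sketch of the cron-driven cache described above. Paths, the node list and
# the storage query are illustrative, not our addon's actual code.
require 'json'

CACHE_FILE = '/var/tmp/storage_stats.json'  # hypothetical cache location
NODES      = %w[node1 node2 node3]          # hypothetical hypervisor list

# 1. Query the storage management API once, from the front-end only.
stats = JSON.parse(`rbd ls -l -p one --format json`)

# 2. Write the cache locally, then push it to every node; the per-node
#    monitoring scripts read this file instead of querying the storage.
File.write(CACHE_FILE, JSON.generate(stats))
NODES.each { |node| system('scp', '-q', CACHE_FILE, "#{node}:#{CACHE_FILE}") }
```

The trade-off is the staleness of one cron interval, which for capacity reporting is usually acceptable.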

Kind Regards
Anton Todorov

Thx Ruben!