Upgrade from ONE 4.14.2 to 5.0.2 strongly increases the number of MySQL operations

After the upgrade to ONE 5.0.2 we see an enormous increase in MySQL InnoDB activity. Disk utilisation on the front-end (which runs the MySQL primary master) increased by at least 30%, and the number of “questions / replace” operations increased tremendously. The same goes for InnoDB buffer pool activity (pages written). For some reason (there were no relevant changes to oned.conf) there are a lot more reads/inserts than before. What could be the reason for this? MySQL was upgraded from 5.5 to 5.7 (sync_binlog=1 was already configured in MySQL 5.5). See also the following graphs:
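To quantify such an increase, one approach (a sketch, not part of the original report) is to sample MySQL's global status counters twice a fixed interval apart; the deltas divided by the interval give the per-second rates behind the "questions / replace" graphs. The counter names below are standard MySQL status variables:

```sql
-- Run this twice, e.g. 60 seconds apart, and diff the values.
-- Questions and Com_replace correspond to the "questions / replace"
-- graphs; the Innodb_* counters track buffer pool flush activity.
SHOW GLOBAL STATUS WHERE Variable_name IN
    ('Questions', 'Com_replace', 'Com_insert', 'Com_select',
     'Innodb_buffer_pool_pages_flushed', 'Innodb_data_writes');
```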

Hi Stephan,

The only new thing that comes to mind that may impact replace operations is the MarketPlace. However, this does not match your numbers (apps are only updated if the version/md5 changes), and unless you have huge markets it does not explain your monitoring numbers.

Additionally, the monitoring of datastores (shared system datastores) has been moved out of the host monitor, so there should be fewer operations.

So the only other thing that comes to mind is a change in the workload of your cluster (number of VMs), or maybe there are “old” monitor daemons still running in the cluster, so you are getting the information from each host multiple times. Could you check the latter?

We are interested in looking into this, so any other debugging information you have will be more than welcome (maybe we can get a snapshot of the DB operations over a couple of minutes to check whether there is something obvious).
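One quick way to capture such a snapshot (a sketch; this is the stock MySQL general query log, not an OpenNebula-specific tool) is to log every statement for a short window and then summarise it by verb:

```sql
-- Log every statement to a table for a short window.
-- The general log is expensive; keep the window short on a loaded server.
SET GLOBAL log_output = 'TABLE';
SET GLOBAL general_log = 'ON';
-- ... wait a couple of minutes ...
SET GLOBAL general_log = 'OFF';

-- Summarise the captured statements by their first keyword
-- (SELECT / REPLACE / INSERT / ...), most frequent first.
SELECT SUBSTRING_INDEX(argument, ' ', 1) AS verb, COUNT(*) AS cnt
FROM mysql.general_log
WHERE command_type = 'Query'
GROUP BY verb
ORDER BY cnt DESC;
```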

Thanks for sharing :slight_smile:

So the only other thing that comes to mind is a change in the workload of your cluster (number of VMs), or maybe there are “old” monitor daemons still running in the cluster, so you are getting the information from each host multiple times. Could you check the latter?

That’s not the case. Every 20 seconds the info is pushed to the front-end. No extra processes on the hosts or anything whatsoever.

Do you want a list of the queries executed against the database over a certain time period?

I’ve compared the frequency of monitoring information (vm_monitoring table) for a VM being polled in ONE 4.14.2 to the same VM in ONE 5.0.2.

ONE 4.14.2:
~ 320 measurements in 4 hours (14400 seconds)

or roughly a measurement every 45 seconds.

ONE 5.0.2:
~ 1030 measurements in 4 hours (14400 seconds)

or roughly a measurement every 14 seconds.
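These per-VM sample counts can be reproduced with a query along these lines (a sketch: the VM id is hypothetical, and while `vmid` and `last_poll` match the `vm_monitoring` schema in ONE 4.x/5.x, verify the column names against your installation):

```sql
-- Count monitoring records for one VM (id 42, hypothetical) over the
-- last 4 hours and derive the average sampling interval in seconds.
SELECT COUNT(*)         AS samples,
       14400 / COUNT(*) AS avg_interval_s
FROM vm_monitoring
WHERE vmid = 42
  AND last_poll > UNIX_TIMESTAMP() - 14400;
```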

The polling frequency (IM_MAD / MONITORING_INTERVAL) hasn’t changed during the upgrade. I see that the DISK_SIZE polling information is recorded approximately 5 seconds after the STATE info, which explains the extra entries I see in ONE 5.0.2: on average, a STATE and a DISK_SIZE monitoring event are recorded every 28 seconds. A tcpdump shows the OpenNebula front-end receives collectd info roughly every 20 seconds, and IM_MAD has a 5-second flush interval, so this all makes sense. It seems that ONE 5.0.2 is actually closer to the 20-second monitoring interval than ONE 4.14.2 was.

Has the monitoring approach for STATE / DISK_SIZE changed with ONE 5.0.2?

Hi

Yes, shared DS (e.g. Ceph/NFS) are now monitored once, from the front-end (or from the storage bridges configured for the datastore). So yes, there are two monitoring paths: the host for the VM, and the datastore monitor for DISK.

We introduced this change because of the performance penalty of multiple monitoring requests from the cluster hosts for each VM. Note also that the information retrieved from the storage system is the same when the storage is shared.

DISK_SIZE is used for informational purposes only. I am thinking of disabling the DB syncs (and waiting for the host monitor instead). In case of a failure between the two monitor messages we would lose DISK_SIZE updates, which are pretty static in a 30-second timeframe anyway.

What do you think?

Yeah, we use DISK_SIZE to measure the size of the snapshots (SNAPSHOT_SIZE), but it would be perfectly fine if we lost 30 seconds of data. That’s really not an issue.

Hi,

Could you try the patch below? It implements the proposed idea; let’s see if it reduces the DB I/O.

0001-Do-not-update-disk-monitor-infor-in-the-DB.-Disk-usa.patch (4.7 KB)

THANKS!!!

I’ve applied the patches and it’s live in two of our clouds for testing. I’ll let you know ASAP. Thanks!

Cool, thanks!

They are definitely helping; we’re keeping those patches ;). I guess you can spot the moment when they became active.

Graph of DISK IO:

InnoDB Row operations:

InnoDB IO:

Awesome! The patch is in master and one-5.0 branches.