Hi,
I have configured OpenNebula Sunstone with a MySQL backend, running HA with 3 servers. Sometimes the Sunstone server log shows the error “Slow query (3.53s) detected: DELETE FROM logdb WHERE timestamp > 0 AND log_index >= 0 AND fed_index = -1 AND log_index < ( SELECT MIN(i.log_index) FROM ( SELECT log_index FROM logdb WHERE fed_index = -1 AND timestamp > 0 AND log_index >= 0 ORDER BY log_index DESC LIMIT 500000 ) AS i) LIMIT 100000”. In addition, the HA service keeps migrating the master Virtual IP frequently.
As the log shows, you are dealing with a slow query. This means that your database is not performing as expected, and it might be due to a slow storage backend or lack of resources.
Is there any recommendation for running MySQL with OpenNebula? Currently our storage/DB cluster works fine for other services. My understanding is that everything normally runs properly, but as the database size increases the problems start.
If you have any recommendations, please share them so we can apply and verify them.
Recommendations for OpenNebula’s database are not much different from those for other databases. You can tweak a lot of parameters, but you will see the biggest improvements by increasing the access speed to your data. By default MariaDB only uses 128MB for innodb_buffer_pool_size. OpenNebula’s database is not big, so you can easily fit it entirely in RAM by tuning this value. If you are sharing the server with other databases, you will also have to take them into account.
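As a rough sketch only (the 2G figure and the config file path are examples, not recommendations for every setup), the current value can be checked from the mysql client and then raised in the MariaDB server configuration:

# current buffer pool size, reported in bytes
mysql -e "SHOW VARIABLES LIKE 'innodb_buffer_pool_size';"

# raise it in the [mysqld] section of the MariaDB config (e.g. /etc/my.cnf.d/server.cnf) and restart:
#   innodb_buffer_pool_size = 2G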
Slow query (1.65s) detected: DELETE FROM logdb WHERE timestamp > 0 AND log_index >= 0 AND fed_index = -1 AND log_index < ( SELECT MIN(i.log_index) FROM ( SELECT log_index FROM logdb WHERE fed_index = -1 AND timestamp > 0 AND log_index >= 0 ORDER BY log_index DESC LIMIT 500000 ) AS i) LIMIT 100000
But I have innodb_buffer_pool_size set to 2G. When the query runs it mostly uses CPU, not disk I/O.
An EXPLAIN of the query shows the following:
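For anyone who wants to reproduce that plan themselves, the statement can be prefixed with EXPLAIN in the mysql client; the database name opennebula below is an assumption and may differ in your installation:

# show the execution plan of the purge DELETE without running it
mysql opennebula -e "EXPLAIN DELETE FROM logdb WHERE timestamp > 0 AND log_index >= 0 AND fed_index = -1 AND log_index < ( SELECT MIN(i.log_index) FROM ( SELECT log_index FROM logdb WHERE fed_index = -1 AND timestamp > 0 AND log_index >= 0 ORDER BY log_index DESC LIMIT 500000 ) AS i) LIMIT 100000;"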
We are facing a similar issue with the same symptoms.
However, in our case it seems obvious that it is a swapping issue.
oned is using almost 20GiB of RAM, which is a bit surprising.
RAM usage spiked in the last week of 2018 and has been very high since (all 8GiB of RAM plus 12GiB of swap).
There were no changes or updates (other than security kernel update) on the frontend.
(By the way, for the sake of completeness, we did test with the previous kernel and it is the same.)
Our deployment didn’t increase as much in size as RAM usage would suggest.
As soon as the frontend boots, we see a couple of hundred megabytes being consumed per second.
When RAM is fully used, we start seeing slow queries (which makes sense).
However, it is unclear why the frontend would use that much RAM.
We have 570 VMs and 33 hosts.
We already tried increasing innodb_buffer_pool_size, but, as expected, that does not change anything.
Thanks guys, we are already improving the performance of the query, as it seems that the nested select in the delete command is not using the existing indexes.
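As a purely hypothetical illustration of what better index usage could look like (this is not the upstream fix, the index name is invented, and it should only be tried on a test copy of the database), one experiment would be a composite index covering the WHERE clause of the nested SELECT:

# hypothetical index for experimentation on a test copy only; not the official fix
mysql opennebula -e "CREATE INDEX logdb_fed_ts_idx ON logdb (fed_index, timestamp, log_index);"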
As for the memory usage, I guess your 20G could be the virtual size; you can get the resident size with, for example:
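# resident (RSS) vs. virtual (VSZ) size of the oned process, in kilobytes
ps -C oned -o pid,rss,vsz,comm

# overall RAM and swap usage, to compare before and after stopping the service
free -h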
Please take into account the amount of RAM and swap before and after stopping the OpenNebula service.
The difference is roughly 15GiB for that specific output.
I agree that oned is using 6GiB of RAM, but IMHO that’s just because the server has no more. If it had 32GiB, I’m pretty sure it would shoot up higher.
Any troubleshooting help would be highly appreciated as we’re starting to face issues with automated tasks since ONE can become rather slow (due to paging).
We are having the same issue at the moment with our OpenNebula cluster (not the memory issue, but the original post).
We get a Slow Query error every 2 minutes:
Wed Mar 27 12:21:13 2019 [Z0][SQL][W]: Slow query (15.05s) detected: DELETE FROM logdb WHERE timestamp > 0 AND log_index >= 0 AND fed_index = -1 AND log_index < ( SELECT MIN(i.log_index) FROM ( SELECT log_index FROM logdb WHERE fed_index = -1 AND timestamp > 0 AND log_index >= 0 ORDER BY log_index DESC LIMIT 100000 ) AS i) LIMIT 100000
and multiple times a day our Raft cluster fails over to one of the followers. We tried raising the timeout, after receiving that recommendation from support, but to no avail.
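For anyone trying the same tweak, the relevant knobs live in the RAFT section of /etc/one/oned.conf on every HA server; the values below are only indicative of the stock 5.x defaults, shown to point at the attributes rather than as tuned recommendations (check your own file):

# RAFT section of /etc/one/oned.conf; raising ELECTION_TIMEOUT_MS / BROADCAST_TIMEOUT_MS
# gives slower storage more headroom before a new leader election is triggered
RAFT = [
    LIMIT_PURGE          = 100000,
    LOG_RETENTION        = 500000,
    LOG_PURGE_TIMEOUT    = 600,
    ELECTION_TIMEOUT_MS  = 2500,
    BROADCAST_TIMEOUT_MS = 500,
    XMLRPC_TIMEOUT_MS    = 450
]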
Mosharaf, Jean, are you still having this issue? Have you found the root cause or at least a fix?
We tweaked the database settings so the database is now entirely in RAM, and the query is already much faster. We no longer get constant leadership elections, so the worst is averted. We’ll upgrade to 5.8 after some migrations we are busy with.
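In case it helps anyone sizing the buffer pool the same way, one way to see how big the OpenNebula database actually is (and therefore how much RAM it needs to fit entirely) is a quick query against information_schema; the schema name opennebula is an assumption:

# total data + index size of the OpenNebula schema, in MB
mysql -e "SELECT table_schema AS db, ROUND(SUM(data_length + index_length)/1024/1024) AS size_mb FROM information_schema.tables WHERE table_schema = 'opennebula' GROUP BY table_schema;"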