High CPU usage after a while since 5.8.0

madko · March 26, 2019, 8:54am

Since we upgraded from 5.6.2 to 5.8.0, we have been experiencing high CPU usage on oned threads. After about 24 hours, 2 threads are stuck at 100% CPU, and after 48 hours even more threads are stuck at 100%.
When I strace those threads, I can see that they connect to the XML-RPC port (I think), I can see some HTTP headers related to that, and then a lot of connection timeouts.
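
To find which oned threads are spinning (and get their thread IDs for strace), a minimal Python sketch can sample per-thread CPU time from /proc twice and report the hot threads. It assumes a Linux /proc filesystem; thread_cpu_ticks and hot_threads are hypothetical helper names, not part of OpenNebula, and the 90% default threshold is an arbitrary choice.

```python
import os
import time


def thread_cpu_ticks(pid):
    """Return {tid: utime + stime in clock ticks} for every thread of `pid`."""
    ticks = {}
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/stat") as fh:
                stat = fh.read()
        except FileNotFoundError:
            continue  # the thread exited between listdir() and open()
        # Field 2 (comm) is wrapped in parentheses and may contain spaces,
        # so only split what follows the last ')'. fields[0] is field 3 (state).
        fields = stat[stat.rindex(")") + 2:].split()
        # utime and stime are fields 14 and 15 of the stat line.
        ticks[int(tid)] = int(fields[11]) + int(fields[12])
    return ticks


def hot_threads(pid, interval=2.0, threshold=90.0):
    """Sample twice; return [(tid, cpu_percent)] at or above `threshold`, hottest first."""
    hz = os.sysconf("SC_CLK_TCK")
    before = thread_cpu_ticks(pid)
    time.sleep(interval)
    after = thread_cpu_ticks(pid)
    hot = []
    for tid, ticks in after.items():
        # A thread that started during the interval has no baseline, so use 0.
        pct = 100.0 * (ticks - before.get(tid, 0)) / hz / interval
        if pct >= threshold:
            hot.append((tid, round(pct, 1)))
    return sorted(hot, key=lambda item: item[1], reverse=True)
```

For example, calling hot_threads(pid) with the PID reported by pgrep -x oned returns (tid, percent) pairs, and each tid can then be passed to strace -tt -p <tid>. Running top -H -p <pid> gives the same per-thread view interactively.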

If we restart OpenNebula, everything is fine for about 24 hours.

I’ve changed these two timeout-related keys in oned.conf, KEEPALIVE_TIMEOUT and TIMEOUT. This is what we have now:

MAX_CONN           = 240
MAX_CONN_BACKLOG   = 480
#KEEPALIVE_TIMEOUT  = 15
KEEPALIVE_TIMEOUT  = 30
#KEEPALIVE_MAX_CONN = 30
#TIMEOUT            = 15
TIMEOUT            = 30

I don’t know yet whether this helps. Is it a good idea?
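
One way to judge whether the longer TIMEOUT and KEEPALIVE_TIMEOUT values help is to time a cheap XML-RPC call against oned before and after the change, and again once threads start spinning. A minimal Python sketch, assuming the default endpoint http://localhost:2633/RPC2 and a "user:password" session string; probe and TimeoutTransport are hypothetical names. It calls one.system.version, which returns [success, version or error message, error code].

```python
import time
import xmlrpc.client

# Assumption: default oned XML-RPC endpoint; change it to match your ONE_XMLRPC.
DEFAULT_ENDPOINT = "http://localhost:2633/RPC2"


class TimeoutTransport(xmlrpc.client.Transport):
    """HTTP transport that gives up after `timeout` seconds (connect and read)."""

    def __init__(self, timeout):
        super().__init__()
        self._timeout = timeout

    def make_connection(self, host):
        conn = super().make_connection(host)
        conn.timeout = self._timeout  # used when the socket is actually opened
        return conn


def probe(session, endpoint=DEFAULT_ENDPOINT, timeout=5.0):
    """Call one.system.version once; return (ok, seconds_taken, detail)."""
    start = time.monotonic()
    try:
        with xmlrpc.client.ServerProxy(
            endpoint, transport=TimeoutTransport(timeout)
        ) as proxy:
            # oned replies with [success, version_or_error, error_code].
            reply = proxy.one.system.version(session)
        return bool(reply[0]), time.monotonic() - start, str(reply[1])
    except (OSError, xmlrpc.client.Error) as exc:
        # Covers refused connections, socket timeouts and HTTP/XML-RPC errors.
        return False, time.monotonic() - start, repr(exc)
```

Running probe("oneadmin:<password>") periodically (for example once a minute) and logging the elapsed time would show whether response times degrade in step with the stuck threads.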

Next, I will try onedb purge to remove old DONE VMs and to trim long VM history.

The only thing I can see in oned.log is some slow-query warnings, mostly for statements replacing values in vm_pool. I don’t know whether that’s related. There is nothing about connection timeouts, though.
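
To check whether the slow queries really concentrate on vm_pool (and whether they grow over the same 24 hours), a rough Python sketch can count slow-query lines per SQL verb and table. The log format is an assumption: it only expects the words "slow query" followed by the SQL text somewhere on the line, so SLOW_RE and SQL_PATTERNS (hypothetical names) may need adjusting to your oned.log.

```python
import re
from collections import Counter

# Assumption: oned marks these lines with the words "slow query" and includes
# the SQL text after them; adjust SLOW_RE if your oned.log looks different.
SLOW_RE = re.compile(r"slow query", re.IGNORECASE)

# Rough (verb, table) extraction, tried in order; the first match wins.
SQL_PATTERNS = [
    ("REPLACE", re.compile(r"\bREPLACE\s+INTO\s+`?(\w+)", re.IGNORECASE)),
    ("INSERT", re.compile(r"\bINSERT\s+INTO\s+`?(\w+)", re.IGNORECASE)),
    ("UPDATE", re.compile(r"\bUPDATE\s+`?(\w+)", re.IGNORECASE)),
    ("DELETE", re.compile(r"\bDELETE\s+FROM\s+`?(\w+)", re.IGNORECASE)),
    ("SELECT", re.compile(r"\bSELECT\b.*?\bFROM\s+`?(\w+)", re.IGNORECASE)),
]


def summarize_slow_queries(lines):
    """Count slow-query lines per (SQL verb, table); unparsed ones as ('?', '?')."""
    counts = Counter()
    for line in lines:
        hit = SLOW_RE.search(line)
        if not hit:
            continue
        sql = line[hit.end():]  # only look at the text after the marker
        key = ("?", "?")
        for verb, pattern in SQL_PATTERNS:
            match = pattern.search(sql)
            if match:
                key = (verb, match.group(1))
                break
        counts[key] += 1
    return counts
```

For example, opening the log (by default /var/log/one/oned.log on most installs) and printing summarize_slow_queries(fh).most_common(10) lists the busiest (verb, table) pairs.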

Are there any other leads I could follow?

Best regards,
Edouard

Versions of the related components and OS (frontend, hypervisors, VMs):

OpenNebula 5.8.0 on CentOS 7
1681 VMs

Steps to reproduce:

It was fine with 5.6.2. The problem started after upgrading to 5.8.0.

Current results:

oned threads stuck at 100% CPU

Expected results:

No threads stuck at 100%

ruben · March 26, 2019, 9:23am

Can you send the output of the following command when one of the oned threads is at 100%:

sudo gdb -q -ex 'thread apply all bt' -ex 'detach' -ex 'quit' `which oned` `pgrep oned` > oned.trace

You can PM the file.

alvaro_simongarcia · June 26, 2020, 8:58am

Hi @ruben,

We are now also seeing this issue on our production machine after upgrading to CentOS 7.8.
We plan to upgrade OpenNebula in production to 5.12 as soon as possible, but we don’t know whether there is any workaround we can apply in the meantime.

For now, only restarting the OpenNebula service fixes the issue, and only for a while.

Cheers
Álvaro
