Opennebula scheduler keeps dying, one4.8

timm · May 17, 2016, 1:41pm

I have an OpenNebula 4.8 installation that has been stable since June of 2015.
Over the last few months I have seen /usr/bin/mm_sched tend to hang. The symptoms
are that after a certain time no new entries appear in sched.log and newly submitted VMs
are just stuck in the pending state. There are no core dumps and the mm_sched daemon keeps running; it
just does not do anything until I restart the opennebula service, which restarts both oned and mm_sched.

Has anyone seen this behavior before? I thought this could be due to large numbers of VMs
being launched through the econe-server interface, but I have also seen it happen when there was no
big load on the system. Also, it tends to happen overnight.

ruben · May 17, 2016, 2:13pm

Probably you are being hit by this one:

http://dev.opennebula.org/issues/3390

which also has its “overnight” version: http://dev.opennebula.org/issues/4284

It seems related to an issue with the xmlrpc client. We have rewritten the
client logic to use the advanced functionality to prevent this bug. This is
not an easy backport, so I am not sure if it will be in the 4.x branch in the
short term.

Meanwhile, note that the scheduler is totally stateless and you can restart
the process in a cron-like job.
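
A minimal sketch of such a cron-driven watchdog, under stated assumptions: it uses the symptom described above (sched.log stops receiving entries) as the hang signal. The log path, the 10-minute threshold, and the restart command are all assumptions for illustration, not values confirmed in this thread; adjust them to your installation. It only reports by default and acts only when given the hypothetical `--restart` flag.

```python
import os
import subprocess
import sys
import time

# All three values below are assumptions; adapt them to your installation.
SCHED_LOG = "/var/log/one/sched.log"                # assumed log location
MAX_SILENCE_SECONDS = 600                           # assumed "hung" threshold
RESTART_CMD = ["service", "opennebula", "restart"]  # assumed restart command


def log_is_stale(path, max_silence, now=None):
    """Return True if the file at `path` was not modified within `max_silence` seconds."""
    now = time.time() if now is None else now
    try:
        mtime = os.path.getmtime(path)
    except OSError:
        # A missing or unreadable log is treated as stale in this sketch.
        return True
    return (now - mtime) > max_silence


def main(argv):
    """Report a stale scheduler log; restart only if --restart is given."""
    if not log_is_stale(SCHED_LOG, MAX_SILENCE_SECONDS):
        return 0
    if "--restart" not in argv:
        print("sched.log looks stale; rerun with --restart to restart the service")
        return 0
    return subprocess.call(RESTART_CMD)


if __name__ == "__main__":
    main(sys.argv[1:])
```

Run from cron every few minutes, this restarts the service only when the log has gone quiet. Because the scheduler is stateless, an occasional unnecessary restart should be harmless, but the threshold needs to be longer than the scheduler's normal logging interval.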

timm · May 17, 2016, 2:37pm

Hi Ruben,
Yes, I checked my oned logs and I see the same key message

Tue May 17 03:30:55 2016 [Z0][InM][E]: Information driver crashed, recovering…

at about the time mm_sched stopped functioning, so I think it is fair to say that we are dealing with exactly the same issue here. Thanks for the explanation. When do we expect the 5.0 beta to be ready?

Steve Timm

ruben · May 17, 2016, 2:38pm

hopefully tomorrow

timm · May 18, 2016, 3:15pm

One other thing on this: I back-checked my logs for "Information driver crashed, recovering"
and found that in almost all cases it was happening at the bottom of the hour, at the time when
my hourly mysqldb backup is running, and it is not uncommon for various operations to take a little while during that time.
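
For anyone who wants to repeat that check, here is a rough sketch of the kind of back-check described above: it counts "Information driver crashed" lines by minute of the hour. It assumes each log line carries an HH:MM:SS timestamp as in the oned.log line quoted earlier in this thread; the function name and the log path in the usage comment are my own assumptions.

```python
import re
from collections import Counter

MARKER = "Information driver crashed"
# Matches the HH:MM:SS part of a timestamp such as "Tue May 17 03:30:55 2016".
TIME_RE = re.compile(r"\b(\d{2}):(\d{2}):(\d{2})\b")


def crash_minutes(lines):
    """Count marker lines by the minute-of-hour at which they were logged."""
    counts = Counter()
    for line in lines:
        if MARKER not in line:
            continue
        match = TIME_RE.search(line)
        if match:
            counts[int(match.group(2))] += 1
    return counts


# Example usage (the path is an assumption; point it at your oned log):
# with open("/var/log/one/oned.log") as log:
#     for minute, count in sorted(crash_minutes(log).items()):
#         print("minute %02d: %d crash(es)" % (minute, count))
```

If the counts cluster around one minute, that points at something scheduled, such as an hourly backup, rather than at load from VM submissions.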
