Blocker bug in scheduler since 5.10

tosaraja · May 28, 2025, 6:26am

We stumbled upon a blocking scheduler bug in 6.8 and 6.10 and found that it is actually quite old: it has probably been around since 5.10.

How to reproduce the bug:

Create two clusters, A and B.
Assign one host to each cluster, for example a host with 16 cores.
Create a VM, let’s call it VM(a), with 20 cores and assign it to cluster A.

This VM will naturally never fit, but that is only to mimic a situation where cluster A is already filled with other VMs, so the next one (VM(a)) does not fit.

Create another VM, let’s call it VM(b), with 4 cores and assign it to cluster B.

The scheduler should pick it up and place it in cluster B, where it would fit just fine.

The bug: it doesn’t.
The debug log shows: Host 2 discarded for VM 9054. Cannot allocate NUMA topology

Reason: the loop in Scheduler.cc does not start from a clean HostShareCapacity struct each time it calls vm->get_capacity(sr). As a result, when it evaluates VM(b), it still carries the NUMA information left over from VM(a).

FrancJP · May 30, 2025, 1:24pm

Hello @tosaraja,

I think the best thing to do here is to report it in the repository’s issue tracker (OpenNebula/one issues).

Once the issue is open, let me know its number so I can check with the team.

Cheers,

tosaraja · June 2, 2025, 7:02am

Created: Blocker bug in scheduler since 5.10 (Issue #7071, OpenNebula/one on GitHub).
I opened a PR as well.
