When I trigger a reschedule for the VMs running on Cluster_ID=100, some of them are rescheduled to Cluster_ID=0.
Both clusters in question share the FILES and IMAGE DS, but their SYSTEM_DS differ. The second host, which is a member of Cluster_ID=100, has enough free resources (CPU and RAM).
My assumption is that in such a situation the scheduler must reschedule the VMs to hosts in the same cluster and system datastore, picking the hosts that best satisfy the criteria. Is that correct?
This seems to be a bug. In fact, for rescheds the scheduler enforces the use of the same system DS (i.e. it is a host resched, not a system DS resched). However, you are somehow bypassing that, and the algorithm sorts based on the provided SCHED_DS_RANK and SCHED_RANK.
Also, the core enforces the use of the same cluster in a migration, so I do not really know how you can get into that situation. You are issuing a onevm resched, right?
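To make the expected behavior concrete, here is a minimal Python sketch of the constraint described above: on a resched, only hosts in the VM's current cluster and on its current system DS are candidates, and ranking happens only among those. This is not the actual scheduler code; `Host`, `VM`, `resched_candidates`, and the default rank (most free CPU first) are hypothetical names and choices made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    # Hypothetical, simplified view of a host for illustration only.
    host_id: int
    cluster_id: int
    system_ds_id: int
    free_cpu: float
    free_mem_mb: int


@dataclass(frozen=True)
class VM:
    # Hypothetical, simplified view of a running VM.
    vm_id: int
    host_id: int
    cluster_id: int
    system_ds_id: int
    cpu: float
    mem_mb: int


def resched_candidates(vm, hosts, rank=lambda h: h.free_cpu):
    """Return hosts eligible for a resched of `vm`, best-ranked first.

    Assumption: a resched moves the VM to another host but keeps the
    cluster and the system DS, so both act as hard filters before ranking.
    """
    eligible = [
        h for h in hosts
        if h.host_id != vm.host_id                # must actually move
        and h.cluster_id == vm.cluster_id         # same cluster
        and h.system_ds_id == vm.system_ds_id     # same system DS
        and h.free_cpu >= vm.cpu                  # enough CPU
        and h.free_mem_mb >= vm.mem_mb            # enough RAM
    ]
    return sorted(eligible, key=rank, reverse=True)


# Example data mirroring the report: two hosts in cluster 100, one in cluster 0.
hosts = [
    Host(host_id=1, cluster_id=100, system_ds_id=101, free_cpu=2.0, free_mem_mb=4096),
    Host(host_id=2, cluster_id=100, system_ds_id=101, free_cpu=4.0, free_mem_mb=8192),
    Host(host_id=3, cluster_id=0, system_ds_id=102, free_cpu=16.0, free_mem_mb=65536),
]
vm = VM(vm_id=10, host_id=1, cluster_id=100, system_ds_id=101, cpu=1.0, mem_mb=1024)

print([h.host_id for h in resched_candidates(vm, hosts)])  # prints [2]
```

In this sketch host 3 is never considered, even though it has the most free resources, because the cluster and system DS filters are applied before any ranking. A VM landing on Cluster_ID=0 would mean those filters were skipped somewhere.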
I’ve just finished dissecting another issue (and pushed a solution), so later tonight or tomorrow morning I will update you with the exact test cases that lead to this behavior.
I can’t reproduce the issue. It is possible that I hit some random glitch in the scheduler during the restarts/reinstalls of the software… I’ll enable debugging of the scheduler, so if it happens again I’ll have more info.