Scheduler rescheduling VMs on another cluster

Hi,

I am not sure whether this is a bug or expected behavior, so please excuse my ignorance if it is expected.
I have the following setup:

Cluster_ID=0:
HOST_ID=0
FILES_DS_ID=3
SYSTEM_DS_ID=0
IMAGE_DS_ID=0,100

Cluster_ID=100:
HOST_ID=1,2
FILES_DS_ID=3
SYSTEM_DS_ID=101
IMAGE_DS_ID=0,100

Cluster_ID=101:
HOST_ID=3
FILES_DS_ID=3
SYSTEM_DS_ID=102
IMAGE_DS_ID=103

When I trigger a reschedule for the VMs running on Cluster_ID=100, some of them are rescheduled to Cluster_ID=0.
Both clusters share the same FILES and IMAGE datastores, but their SYSTEM datastores differ. The second host in Cluster_ID=100 has enough free resources (CPU and RAM).
My assumption is that in this situation the scheduler should reschedule the VMs to hosts in the same cluster and on the same system datastore, i.e. to the hosts that satisfy the most criteria. Is that correct?
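
To make the expectation concrete, here is a rough Python sketch of the setup above. It is not OpenNebula code: the `CLUSTERS` dict and the helpers `cluster_of`, `expected_targets` and `image_ds_reachable` are hypothetical names used only for illustration. It compares the hosts I would expect as resched targets with the hosts that are reachable if only the shared IMAGE datastores are considered.

```python
# Topology copied from the listing above; the data layout itself is illustrative.
CLUSTERS = {
    0:   {"hosts": [0],    "system_ds": 0,   "image_ds": {0, 100}},
    100: {"hosts": [1, 2], "system_ds": 101, "image_ds": {0, 100}},
    101: {"hosts": [3],    "system_ds": 102, "image_ds": {103}},
}


def cluster_of(host_id):
    """Return the ID of the cluster that contains host_id."""
    for cluster_id, cluster in CLUSTERS.items():
        if host_id in cluster["hosts"]:
            return cluster_id
    raise KeyError(f"unknown host {host_id}")


def expected_targets(current_host):
    """Hosts in the same cluster (and so on the same system DS) as current_host."""
    cluster = CLUSTERS[cluster_of(current_host)]
    return [h for h in cluster["hosts"] if h != current_host]


def image_ds_reachable(current_host):
    """Hosts in any cluster that shares at least one image DS with current_host's cluster."""
    own_images = CLUSTERS[cluster_of(current_host)]["image_ds"]
    return [
        h
        for cluster in CLUSTERS.values()
        if cluster["image_ds"] & own_images
        for h in cluster["hosts"]
        if h != current_host
    ]


if __name__ == "__main__":
    print("expected targets for host 1:", expected_targets(1))
    print("reachable via image DS only:", image_ds_reachable(1))
```

For a VM on host 1 the first list contains only host 2, while the second also contains host 0, which is where some of my VMs ended up.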

Kind Regards,
Anton Todorov

This seems to be a bug. In fact, for rescheds the scheduler enforces the use of the same system DS (i.e. it is a host resched, not a system DS resched). However, you are somehow bypassing that, and the algorithm sorts based on the provided SCHED_DS_RANK and SCHED_RANK.

Also, the core enforces the use of the same cluster in a migration, so I do not really know how you can get into that situation. You are issuing a onevm resched, right?
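
The intended behavior can be sketched roughly as below. This is a simplified illustration, not the actual scheduler source: the `Host` class, the `pick_target` function, the `enforce_same_ds` flag and the rank numbers are all made up for the example, and host 3 is left out because its image DS differs anyway. The point is only to show how dropping the system DS filter lets a higher-ranked host in another cluster win.

```python
from dataclasses import dataclass


@dataclass
class Host:
    host_id: int
    cluster_id: int
    system_ds: int
    rank: float  # hypothetical result of evaluating SCHED_RANK; higher is better


# Hosts 0-2 from the setup in the question; the rank values are invented.
HOSTS = [
    Host(host_id=0, cluster_id=0, system_ds=0, rank=90.0),
    Host(host_id=1, cluster_id=100, system_ds=101, rank=10.0),
    Host(host_id=2, cluster_id=100, system_ds=101, rank=50.0),
]


def pick_target(hosts, current, enforce_same_ds=True):
    """Return the ID of the best-ranked candidate host, or None if there is none."""
    candidates = [h for h in hosts if h.host_id != current.host_id]
    if enforce_same_ds:
        # A resched moves the VM between hosts, not between system datastores.
        candidates = [h for h in candidates if h.system_ds == current.system_ds]
    candidates.sort(key=lambda h: h.rank, reverse=True)
    return candidates[0].host_id if candidates else None


if __name__ == "__main__":
    current = HOSTS[1]
    print("with system DS filter:", pick_target(HOSTS, current))
    print("filter bypassed:", pick_target(HOSTS, current, enforce_same_ds=False))
```

With the filter in place the VM on host 1 can only go to host 2. If the filter is skipped, host 0 wins purely on rank, which matches what you are describing.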

Hi Ruben,

I’ve just finished dissecting another issue (and pushed a solution), so later tonight or tomorrow morning I will update you with the exact test cases that lead to this behavior.

Kind Regards,
Anton Todorov

Thanks for the patch :) I am now checking whether more states are affected by this (i.e. a previous record that was not closed).

Hi Ruben,

I can’t reproduce the issue. It is possible that I hit some random glitch in the scheduler during the restarts/reinstalls of the software… I’ll enable scheduler debugging, so if it happens again I’ll have more info.

Kind Regards,
Anton Todorov