After some significant debugging and a review of the Scheduler.cc code, I found the issue, which looks like a WebUI bug.
This works: (HYPERVISOR=kvm) & (ID = 20)
The WebUI gives this by default: (HYPERVISOR=undefined) & (HOST_ID = 20)
It was somewhat obvious to change “undefined” to “kvm”, but the HOST_ID… well, that took a few days to figure out.
=== Debugging Details
One workaround was to put HOST_ID as an attribute on each host, but that wasn’t the best fix.
Each host has an “ID” value but (I thought) needed a “HOST_ID” value, so I had to add the attribute to each host manually. I assumed this was a bug. Maybe a previous release had HOST_ID in the XML where it is only ID now?
While reviewing the scheduler code, I found scheduling was failing at this point:
logs:
Wed Oct 30 18:51:47 2024 [Z0][SCHED][DD]: Setting VM groups placement constraints. Total time: 0.00s
Wed Oct 30 18:51:47 2024 [Z0][SCHED][DD]: Host 19 discarded for VM 109. It does not fulfill SCHED_REQUIREMENTS: (CLUSTER_ID = 117) & !(PUBLIC_CLOUD = YES) & !(PIN_POLICY = PINNED) & ( (HYPERVISOR=kvm) & (HOST_ID = 20) )
Wed Oct 30 18:51:47 2024 [Z0][SCHED][DD]: Host 20 discarded for VM 109. It does not fulfill SCHED_REQUIREMENTS: (CLUSTER_ID = 117) & !(PUBLIC_CLOUD = YES) & !(PIN_POLICY = PINNED) & ( (HYPERVISOR=kvm) & (HOST_ID = 20) )
Wed Oct 30 18:51:47 2024 [Z0][SCHED][D]: Match-making results for VM 109:
Cannot schedule VM, there is no suitable host.
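When there are many hosts, it can help to pull the discard messages out of the scheduler log and see exactly which expression each host failed. The following is a minimal sketch, assuming the log lines look like the two quoted above; the regular expression and the function name `discarded_hosts` are my own and not part of OpenNebula.

```python
import re

# Pattern modelled on the two "discarded" lines quoted above; other
# scheduler messages may be laid out differently.
DISCARD_RE = re.compile(
    r"Host (?P<host>\d+) discarded for VM (?P<vm>\d+)\. "
    r"It does not fulfill SCHED_REQUIREMENTS: (?P<expr>.+)$"
)

def discarded_hosts(log_lines):
    """Return (host_id, vm_id, expression) for every discard message."""
    hits = []
    for line in log_lines:
        m = DISCARD_RE.search(line)
        if m:
            hits.append((int(m["host"]), int(m["vm"]), m["expr"].strip()))
    return hits

EXPR = ("(CLUSTER_ID = 117) & !(PUBLIC_CLOUD = YES) & !(PIN_POLICY = PINNED) "
        "& ( (HYPERVISOR=kvm) & (HOST_ID = 20) )")
SAMPLE = [
    "Wed Oct 30 18:51:47 2024 [Z0][SCHED][DD]: Host 19 discarded for VM 109. "
    "It does not fulfill SCHED_REQUIREMENTS: " + EXPR,
    "Wed Oct 30 18:51:47 2024 [Z0][SCHED][DD]: Host 20 discarded for VM 109. "
    "It does not fulfill SCHED_REQUIREMENTS: " + EXPR,
    "Wed Oct 30 18:51:47 2024 [Z0][SCHED][D]: Match-making results for VM 109:",
]

for host, vm, expr in discarded_hosts(SAMPLE):
    print(f"host {host} rejected for VM {vm}: {expr}")
```

The telling detail in this output is that host 20 is rejected by an expression that asks for host 20, which points at the expression rather than at the host.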
Tracing the code, I ended up at this point, where the simple boolean check fails:
static bool match_host
    if ( host->eval_bool(vm->get_requirements(), matched, &estr) != 0 )
    {
        ostringstream oss;
        n_error++;
        oss << "Error in SCHED_REQUIREMENTS: '" << vm->get_requirements()
            << "', error: " << estr;
        vm->log(oss.str());
        error = oss.str();
        free(estr);
        return false;
    }
    if (matched == false)
    {
        error = "It does not fulfill SCHED_REQUIREMENTS: " +
                vm->get_requirements();
        return false;
    }
}
get_requirements() is this: (CLUSTER_ID = 117) & !(PUBLIC_CLOUD = YES) & !(PIN_POLICY = PINNED) & ( (HYPERVISOR=kvm) & (HOST_ID = 20) )
I then started manually adding attributes to each host until I found that the only one that mattered was HOST_ID. Whether PUBLIC_CLOUD and PIN_POLICY were present or not made no difference.
CLUSTER_ID is already in the host XML, but HOST_ID is NOT. It’s there as “ID”.
This explains why all the other use cases work: they trigger off the cluster ID or the default cluster. This one use case needs HOST_ID to match.
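The behaviour I saw can be reproduced with a toy model. This is only a sketch of my understanding, not the scheduler's real expression parser: it assumes an equality term is false whenever the attribute is missing from the host, and it assumes the host reports HYPERVISOR as kvm. The function names and the tuple layout are made up for illustration.

```python
def term_holds(attrs, name, expected):
    """An equality term is true only if the attribute exists and matches."""
    return name in attrs and str(attrs[name]) == str(expected)

def matches(attrs, terms):
    """terms is a list of (negated, name, expected); all are ANDed together."""
    return all(term_holds(attrs, name, value) != negated
               for negated, name, value in terms)

# Host 20 as the scheduler sees it (HYPERVISOR value is an assumption).
host_20 = {"ID": "20", "CLUSTER_ID": "117", "HYPERVISOR": "kvm"}

common = [
    (False, "CLUSTER_ID", "117"),
    (True, "PUBLIC_CLOUD", "YES"),   # !(PUBLIC_CLOUD = YES)
    (True, "PIN_POLICY", "PINNED"),  # !(PIN_POLICY = PINNED)
    (False, "HYPERVISOR", "kvm"),
]
webui_terms = common + [(False, "HOST_ID", "20")]
fixed_terms = common + [(False, "ID", "20")]

print(matches(host_20, webui_terms))                       # False
print(matches(host_20, fixed_terms))                       # True
print(matches({**host_20, "HOST_ID": "20"}, webui_terms))  # True
```

Under this model the negated terms pass when PUBLIC_CLOUD and PIN_POLICY are absent, which fits what I saw, while the plain (HOST_ID = 20) term can never pass on a host that only has ID. The last line corresponds to the workaround of adding HOST_ID to the host by hand.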
oneadmin@opennebular-1:/var/log/one$ onehost show -x s002-m5-40g-kvm27.mitg-bxb300.cisco.com | grep HOST_ID
oneadmin@opennebular-1:/var/log/one$ onehost show -x s002-m5-40g-kvm27.mitg-bxb300.cisco.com | grep ID
<ID>20</ID>
<CLUSTER_ID>117</CLUSTER_ID>
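The same check as the two grep commands can be done for a whole list of attribute names at once. This is a minimal sketch, assuming the output of onehost show -x is a single XML document; the sample below only contains the two elements shown above plus an assumed HYPERVISOR entry under TEMPLATE, and the helper name `missing_attributes` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Cut-down stand-in for `onehost show -x <host>` output. Only ID and
# CLUSTER_ID come from the grep above; the rest is assumed for illustration.
SAMPLE_XML = """<HOST>
  <ID>20</ID>
  <CLUSTER_ID>117</CLUSTER_ID>
  <TEMPLATE>
    <HYPERVISOR>kvm</HYPERVISOR>
  </TEMPLATE>
</HOST>"""

def missing_attributes(host_xml, names):
    """Return the names that do not appear as an element anywhere in the XML."""
    root = ET.fromstring(host_xml)
    present = {element.tag for element in root.iter()}
    return [name for name in names if name not in present]

wanted = ["CLUSTER_ID", "HYPERVISOR", "HOST_ID", "ID"]
print(missing_attributes(SAMPLE_XML, wanted))  # ['HOST_ID']
```

Any name this reports as missing is a candidate for a requirement term that cannot match on that host.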
Then, while writing this, I realized… what if I just change the value Sunstone auto-fills from HOST_ID to ID? That worked!!! So I guess the bug is that the WebUI is giving a bad example?
This works: (HYPERVISOR=kvm) & (ID = 20)
The WebUI gives this by default: (HYPERVISOR=undefined) & (HOST_ID = 20)
It was somewhat obvious to change “undefined” to “kvm”, but the HOST_ID… well, that took a few days to figure out.