On-prem: Cannot schedule VM, there is no suitable host

After some significant debugging and reviewing the Scheduler.cc code, I found the issue: a WebUI bug.

This works: (HYPERVISOR=kvm) & (ID = 20)
The WebUI gives this by default: (HYPERVISOR=undefined) & (HOST_ID = 20)
It was somewhat obvious to change “undefined” to “kvm”, but the HOST_ID… well, that took a few days to figure out.

=== Debugging Details

Workaround #1 was to put HOST_ID as an attribute on each host, but that wasn’t the best fix.

The hosts have an “ID” value, but (I thought) they needed a “HOST_ID” value, so I added the attribute to each host manually. I assumed this was a bug. Maybe a previous release had HOST_ID in the XML where it is only ID now?

Reviewing the scheduler, this is the point where placement was failing.

Logs:

Wed Oct 30 18:51:47 2024 [Z0][SCHED][DD]: Setting VM groups placement constraints. Total time: 0.00s
Wed Oct 30 18:51:47 2024 [Z0][SCHED][DD]: Host 19 discarded for VM 109. It does not fulfill SCHED_REQUIREMENTS: (CLUSTER_ID = 117) & !(PUBLIC_CLOUD = YES) & !(PIN_POLICY = PINNED) & ( (HYPERVISOR=kvm) & (HOST_ID = 20) )
Wed Oct 30 18:51:47 2024 [Z0][SCHED][DD]: Host 20 discarded for VM 109. It does not fulfill SCHED_REQUIREMENTS: (CLUSTER_ID = 117) & !(PUBLIC_CLOUD = YES) & !(PIN_POLICY = PINNED) & ( (HYPERVISOR=kvm) & (HOST_ID = 20) )
Wed Oct 30 18:51:47 2024 [Z0][SCHED][D]: Match-making results for VM 109:
        Cannot schedule VM, there is no suitable host.


Tracing the code, I ended up at this point in match_host, where the simple boolean check fails:

static bool match_host

        if ( host->eval_bool(vm->get_requirements(), matched, &estr) != 0 )
        {
            ostringstream oss;

            n_error++;

            oss << "Error in SCHED_REQUIREMENTS: '" << vm->get_requirements()
                << "', error: " << estr;

            vm->log(oss.str());

            error = oss.str();

            free(estr);

            return false;
        }

        if (matched == false)
        {
            error = "It does not fulfill SCHED_REQUIREMENTS: " +
                    vm->get_requirements();
            return false;
        }
    }

get_requirements() returns this: (CLUSTER_ID = 117) & !(PUBLIC_CLOUD = YES) & !(PIN_POLICY = PINNED) & ( (HYPERVISOR=kvm) & (HOST_ID = 20) )

I then started manually adding attributes to each host until I found that the one that mattered was HOST_ID. Whether PUBLIC_CLOUD and PIN_POLICY were there or not did not matter.

CLUSTER_ID is in the host XML already, but HOST_ID is NOT. It’s there as “ID”.
This explains why all the other use cases work: they trigger off the cluster ID or the default cluster. This one use case needs the HOST_ID to match.

oneadmin@opennebular-1:/var/log/one$ onehost show -x s002-m5-40g-kvm27.mitg-bxb300.cisco.com | grep HOST_ID
oneadmin@opennebular-1:/var/log/one$ onehost show -x s002-m5-40g-kvm27.mitg-bxb300.cisco.com | grep ID
  <ID>20</ID>
  <CLUSTER_ID>117</CLUSTER_ID>
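
To make the mismatch concrete, here is a minimal sketch in C++. It is NOT OpenNebula's code: HostAttrs, Clause, matches() and the three helper functions are hypothetical names made up for illustration, and it assumes that a clause naming an attribute the host does not have simply evaluates to false, which is consistent with the log lines above but is not taken from the scheduler source.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a host's attributes (not OpenNebula's Host class).
using HostAttrs = std::map<std::string, std::string>;

// One "NAME = value" clause of a requirements expression.
struct Clause
{
    std::string name;
    std::string value;
};

// True only if every clause names an attribute the host has, with that value.
// Assumption: a missing attribute makes the clause false.
bool matches(const HostAttrs& host, const std::vector<Clause>& reqs)
{
    for (const auto& c : reqs)
    {
        auto it = host.find(c.name);

        if (it == host.end() || it->second != c.value)
        {
            return false;
        }
    }

    return true;
}

// Attributes as seen in the onehost output above, plus HYPERVISOR=kvm.
HostAttrs host20()
{
    return {{"ID", "20"}, {"CLUSTER_ID", "117"}, {"HYPERVISOR", "kvm"}};
}

// What the WebUI pre-filled, after changing "undefined" to "kvm".
std::vector<Clause> webui_default()
{
    return {{"HYPERVISOR", "kvm"}, {"HOST_ID", "20"}};
}

// The expression that actually works.
std::vector<Clause> corrected()
{
    return {{"HYPERVISOR", "kvm"}, {"ID", "20"}};
}
```

With these definitions, matches(host20(), webui_default()) is false because the host has no HOST_ID attribute at all, while matches(host20(), corrected()) is true. That is the same outcome as in the logs: host 20 is discarded even though it is the host being asked for.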

Then, while writing this, I realized… what if I just change the Sunstone auto-filled value to ID instead of HOST_ID? That worked! So I guess the bug is that the WebUI is giving a bad example?

This works: (HYPERVISOR=kvm) & (ID = 20)
The WebUI gives this by default: (HYPERVISOR=undefined) & (HOST_ID = 20)
It was somewhat obvious to change “undefined” to “kvm”, but the HOST_ID… well, that took a few days to figure out.
