Scheduler doesn't go through all hosts

We have 52 hosts running virtual machines, but we are currently seeing lots of VMs in PENDING even though plenty of capacity is left. While debugging the scheduler I noticed that, for the VM I was tracing, it went through only 28 hosts before deciding that the VM could not be dispatched anywhere. It slowly works through a queue of hundreds of PENDING VMs, and each round only ~4-5 get assigned to a server.

It looks like the scheduler doesn’t go through the whole array of servers when looking for capacity.

A follow-up question: should the scheduler sort hosts by load percentage and then try to assign a VM to the least-loaded host first? Taking clusters into account, obviously: hosts in the wrong cluster shouldn't be on that list.

Version: 6.0.0.2

Answering my own follow-up question: DEFAULT_SCHED can be configured in /etc/one/sched.conf, which probably does what I want.
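For reference, the behaviour asked about in the follow-up (drop hosts that fail the requirements, then try the least-loaded one first) can be sketched in a few lines of Python. This is a simplified model, not OpenNebula's scheduler code: it assumes a load-aware rank equal to the host's FREE_CPU, with the highest rank tried first, and the Host class and candidate_order function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Host:
    # Hypothetical, simplified view of a host as the scheduler sees it.
    host_id: int
    cluster_id: int
    hypervisor: str
    free_cpu: int  # monitored FREE_CPU, 100 = one core


def candidate_order(hosts, cluster_id, hypervisor):
    """Filter by the requirements, then put the least-loaded host first."""
    eligible = [
        h for h in hosts
        if h.cluster_id == cluster_id and h.hypervisor == hypervisor
    ]
    # Higher FREE_CPU means a higher rank; sorted() keeps ties in input order.
    return sorted(eligible, key=lambda h: h.free_cpu, reverse=True)


# Sample values: IDs and ranks of hosts 42, 46 and 47 come from the log below;
# host 99 is made up to show a host in another cluster being filtered out.
hosts = [
    Host(42, 0, "kvm", 7680),
    Host(46, 0, "kvm", 5520),
    Host(47, 0, "kvm", 7920),
    Host(99, 1, "kvm", 8000),
]
print([h.host_id for h in candidate_order(hosts, 0, "kvm")])  # → [47, 42, 46]
```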

About half of our hosts are discarded for good reasons: either they have no capacity or they don't fulfill the requirements. The rest of the hosts, however, aren't even considered; ONE just prints their Rank:

Thu Aug  5 06:04:34 2021 [Z0][SCHED][D]: Host 1 discarded for VM 2246965. Not enough CPU capacity: 800/600
Thu Aug  5 06:04:34 2021 [Z0][SCHED][D]: Host 20 discarded for VM 2246965. Not enough CPU capacity: 800/0
Thu Aug  5 06:04:34 2021 [Z0][SCHED][D]: Host 25 discarded for VM 2246965. It does not fulfill SCHED_REQUIREMENTS: (CLUSTER_ID = 0) & !(PUBLIC_CLOUD = YES) & !(PIN_POLICY = PINNED) & ( (CLUSTER_ID = 0) & (HYPERVISOR = kvm) )
Thu Aug  5 06:04:34 2021 [Z0][SCHED][D]: Host 32 discarded for VM 2246965. It does not fulfill SCHED_REQUIREMENTS: (CLUSTER_ID = 0) & !(PUBLIC_CLOUD = YES) & !(PIN_POLICY = PINNED) & ( (CLUSTER_ID = 0) & (HYPERVISOR = kvm) )
Thu Aug  5 06:04:34 2021 [Z0][SCHED][D]: Host 37 discarded for VM 2246965. It does not fulfill SCHED_REQUIREMENTS: (CLUSTER_ID = 0) & !(PUBLIC_CLOUD = YES) & !(PIN_POLICY = PINNED) & ( (CLUSTER_ID = 0) & (HYPERVISOR = kvm) )
Thu Aug  5 06:04:34 2021 [Z0][SCHED][D]: Host 41 discarded for VM 2246965. Not enough memory: 28311552/3502297
Thu Aug  5 06:04:34 2021 [Z0][SCHED][D]: Host 44 discarded for VM 2246965. Not enough CPU capacity: 800/400
Thu Aug  5 06:04:34 2021 [Z0][SCHED][D]: Host 45 discarded for VM 2246965. Not enough memory: 28311552/9793721
Thu Aug  5 06:04:34 2021 [Z0][SCHED][D]: Host 57 discarded for VM 2246965. Not enough memory: 28311552/9898668
Thu Aug  5 06:04:34 2021 [Z0][SCHED][D]: Host 63 discarded for VM 2246965. Not enough CPU capacity: 800/600
Thu Aug  5 06:04:34 2021 [Z0][SCHED][D]: Host 65 discarded for VM 2246965. Not enough memory: 28311552/6752976
Thu Aug  5 06:04:34 2021 [Z0][SCHED][D]: Host 71 discarded for VM 2246965. Not enough CPU capacity: 800/400
Thu Aug  5 06:04:34 2021 [Z0][SCHED][D]: Host 72 discarded for VM 2246965. Not enough memory: 28311552/356534
Thu Aug  5 06:04:34 2021 [Z0][SCHED][D]: Host 73 discarded for VM 2246965. Not enough memory: 28311552/9898668
Thu Aug  5 06:04:34 2021 [Z0][SCHED][D]: Host 75 discarded for VM 2246965. It does not fulfill SCHED_REQUIREMENTS: (CLUSTER_ID = 0) & !(PUBLIC_CLOUD = YES) & !(PIN_POLICY = PINNED) & ( (CLUSTER_ID = 0) & (HYPERVISOR = kvm) )
Thu Aug  5 06:04:34 2021 [Z0][SCHED][D]: Host 76 discarded for VM 2246965. It does not fulfill SCHED_REQUIREMENTS: (CLUSTER_ID = 0) & !(PUBLIC_CLOUD = YES) & !(PIN_POLICY = PINNED) & ( (CLUSTER_ID = 0) & (HYPERVISOR = kvm) )
Thu Aug  5 06:04:34 2021 [Z0][SCHED][D]: Host 77 discarded for VM 2246965. Not enough CPU capacity: 800/600
Thu Aug  5 06:04:34 2021 [Z0][SCHED][D]: Host 78 discarded for VM 2246965. It does not fulfill SCHED_REQUIREMENTS: (CLUSTER_ID = 0) & !(PUBLIC_CLOUD = YES) & !(PIN_POLICY = PINNED) & ( (CLUSTER_ID = 0) & (HYPERVISOR = kvm) )
Thu Aug  5 06:04:34 2021 [Z0][SCHED][D]: Host 79 discarded for VM 2246965. It does not fulfill SCHED_REQUIREMENTS: (CLUSTER_ID = 0) & !(PUBLIC_CLOUD = YES) & !(PIN_POLICY = PINNED) & ( (CLUSTER_ID = 0) & (HYPERVISOR = kvm) )
Thu Aug  5 06:04:34 2021 [Z0][SCHED][D]: Host 80 discarded for VM 2246965. It does not fulfill SCHED_REQUIREMENTS: (CLUSTER_ID = 0) & !(PUBLIC_CLOUD = YES) & !(PIN_POLICY = PINNED) & ( (CLUSTER_ID = 0) & (HYPERVISOR = kvm) )
Thu Aug  5 06:04:34 2021 [Z0][SCHED][D]: Host 81 discarded for VM 2246965. Not enough CPU capacity: 800/600
Thu Aug  5 06:04:34 2021 [Z0][SCHED][D]: Host 82 discarded for VM 2246965. Not enough CPU capacity: 800/600
Thu Aug  5 06:04:34 2021 [Z0][SCHED][D]: Host 83 discarded for VM 2246965. It does not fulfill SCHED_REQUIREMENTS: (CLUSTER_ID = 0) & !(PUBLIC_CLOUD = YES) & !(PIN_POLICY = PINNED) & ( (CLUSTER_ID = 0) & (HYPERVISOR = kvm) )
Thu Aug  5 06:04:34 2021 [Z0][SCHED][D]: Host 84 discarded for VM 2246965. It does not fulfill SCHED_REQUIREMENTS: (CLUSTER_ID = 0) & !(PUBLIC_CLOUD = YES) & !(PIN_POLICY = PINNED) & ( (CLUSTER_ID = 0) & (HYPERVISOR = kvm) )
Thu Aug  5 06:04:34 2021 [Z0][SCHED][D]: Host 85 discarded for VM 2246965. Not enough CPU capacity: 800/600
Thu Aug  5 06:04:34 2021 [Z0][SCHED][D]: Host 86 discarded for VM 2246965. It does not fulfill SCHED_REQUIREMENTS: (CLUSTER_ID = 0) & !(PUBLIC_CLOUD = YES) & !(PIN_POLICY = PINNED) & ( (CLUSTER_ID = 0) & (HYPERVISOR = kvm) )
Thu Aug  5 06:04:34 2021 [Z0][SCHED][D]: Host 87 discarded for VM 2246965. Not enough CPU capacity: 800/0
Thu Aug  5 06:04:34 2021 [Z0][SCHED][D]: Host 88 discarded for VM 2246965. It does not fulfill SCHED_REQUIREMENTS: (CLUSTER_ID = 0) & !(PUBLIC_CLOUD = YES) & !(PIN_POLICY = PINNED) & ( (CLUSTER_ID = 0) & (HYPERVISOR = kvm) )
Thu Aug  5 06:04:34 2021 [Z0][SCHED][D]: Host 89 discarded for VM 2246965. Not enough CPU capacity: 800/600
Thu Aug  5 06:04:34 2021 [Z0][SCHED][D]: Host 90 discarded for VM 2246965. Not enough CPU capacity: 800/600
Thu Aug  5 06:04:34 2021 [Z0][RANK][D]: ID: 42 Rank: 7680
Thu Aug  5 06:04:34 2021 [Z0][RANK][D]: ID: 43 Rank: 6720
Thu Aug  5 06:04:34 2021 [Z0][RANK][D]: ID: 46 Rank: 5520
Thu Aug  5 06:04:34 2021 [Z0][RANK][D]: ID: 47 Rank: 7920
Thu Aug  5 06:04:34 2021 [Z0][RANK][D]: ID: 48 Rank: 6960
Thu Aug  5 06:04:34 2021 [Z0][RANK][D]: ID: 49 Rank: 6400
Thu Aug  5 06:04:34 2021 [Z0][RANK][D]: ID: 51 Rank: 6640
Thu Aug  5 06:04:34 2021 [Z0][RANK][D]: ID: 52 Rank: 6240
Thu Aug  5 06:04:34 2021 [Z0][RANK][D]: ID: 53 Rank: 6960
Thu Aug  5 06:04:34 2021 [Z0][RANK][D]: ID: 54 Rank: 5600
Thu Aug  5 06:04:34 2021 [Z0][RANK][D]: ID: 55 Rank: 7440
Thu Aug  5 06:04:34 2021 [Z0][RANK][D]: ID: 56 Rank: 6960
Thu Aug  5 06:04:34 2021 [Z0][RANK][D]: ID: 58 Rank: 6400
Thu Aug  5 06:04:34 2021 [Z0][RANK][D]: ID: 59 Rank: 6800
Thu Aug  5 06:04:34 2021 [Z0][RANK][D]: ID: 61 Rank: 6400
Thu Aug  5 06:04:34 2021 [Z0][RANK][D]: ID: 64 Rank: 6880
Thu Aug  5 06:04:34 2021 [Z0][RANK][D]: ID: 66 Rank: 6560
Thu Aug  5 06:04:34 2021 [Z0][RANK][D]: ID: 67 Rank: 6960
Thu Aug  5 06:04:34 2021 [Z0][RANK][D]: ID: 68 Rank: 6640
Thu Aug  5 06:04:34 2021 [Z0][RANK][D]: ID: 69 Rank: 6320
Thu Aug  5 06:04:34 2021 [Z0][RANK][D]: ID: 70 Rank: 6720
Thu Aug  5 06:04:34 2021 [Z0][RANK][D]: ID: 0 Rank: 0
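One way to check whether hosts are really being skipped is to count both kinds of lines for a single VM. Below is a minimal Python sketch; the regular expressions are written against the lines in this excerpt only (an assumed format, not a documented one), and summarize is a hypothetical helper. Counting the excerpt above by hand gives 30 “discarded” lines and 22 “Rank” lines (including host 0), all with distinct host IDs, which adds up to 52.

```python
import re
from collections import Counter

# Written against the two line shapes in the excerpt above (assumed format).
DISCARD_RE = re.compile(r"\[SCHED\]\[D\]: Host (\d+) discarded for VM (\d+)\. (.*)$")
RANK_RE = re.compile(r"\[RANK\]\[D\]: ID: (\d+) Rank: (-?\d+)")


def summarize(lines):
    """Split one VM's scheduler log lines into discarded and ranked hosts."""
    discarded, ranked = {}, {}
    for line in lines:
        if m := DISCARD_RE.search(line):
            # Keep only the reason category, e.g. "Not enough memory".
            discarded[int(m.group(1))] = m.group(3).split(":")[0]
        elif m := RANK_RE.search(line):
            ranked[int(m.group(1))] = int(m.group(2))
    return discarded, ranked


sample = [
    "Thu Aug  5 06:04:34 2021 [Z0][SCHED][D]: Host 1 discarded for VM 2246965. Not enough CPU capacity: 800/600",
    "Thu Aug  5 06:04:34 2021 [Z0][SCHED][D]: Host 41 discarded for VM 2246965. Not enough memory: 28311552/3502297",
    "Thu Aug  5 06:04:34 2021 [Z0][RANK][D]: ID: 42 Rank: 7680",
]
discarded, ranked = summarize(sample)
print(len(discarded), len(ranked))  # → 2 1
print(Counter(discarded.values()))
```

Feeding it all the lines logged for one VM, len(discarded) + len(ranked) is the number of hosts the scheduler actually evaluated for that VM.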

There are more settings in /etc/one/sched.conf that may limit the number of VMs deployed at once; see MAX_DISPATCH and MAX_HOST.

Can you please send the output of onevm show -x for one of the VMs that is not deployed, and of onehost show -x for one host that is discarded without any reason?

MAX_HOST is at its default of 1. We don't dare touch it in case increasing it produces race conditions: I don't want to risk the scheduler assigning two VMs to the same host when it only has capacity for one of them.

MAX_DISPATCH is 30, and the maximum number of VMs I've seen dispatched in one round is around 10, usually much less.
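To reason about the MAX_HOST concern, here is a simplified Python model of one scheduling cycle. It is not OpenNebula's implementation: MAX_DISPATCH caps the VMs dispatched per cycle, MAX_HOST caps the VMs sent to one host per cycle, and the model assumes each placement is subtracted from the host's free capacity before the next VM is considered. If the real scheduler does that bookkeeping (worth confirming for 6.0), raising MAX_HOST would not put two VMs on a host that only has room for one. The HostCap class, the dispatch_cycle function and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HostCap:
    # Hypothetical host with free CPU in percent units (100 = one core).
    host_id: int
    free_cpu: int


def dispatch_cycle(pending_cpu, hosts, max_dispatch, max_host):
    """Simplified model of one scheduling cycle.

    pending_cpu: CPU demand of each pending VM, in queue order.
    Returns a list of (vm_index, host_id) placements.
    Assumes each placement is subtracted from the host's free capacity
    before the next VM is considered.
    """
    placed = []
    per_host = {h.host_id: 0 for h in hosts}
    for vm, demand in enumerate(pending_cpu):
        if len(placed) >= max_dispatch:
            break  # MAX_DISPATCH: cap on VMs dispatched per cycle
        # Highest free CPU first, like a FREE_CPU rank.
        for h in sorted(hosts, key=lambda h: h.free_cpu, reverse=True):
            if per_host[h.host_id] >= max_host:
                continue  # MAX_HOST: cap on VMs per host per cycle
            if h.free_cpu >= demand:
                h.free_cpu -= demand  # account for the VM just placed
                per_host[h.host_id] += 1
                placed.append((vm, h.host_id))
                break
    return placed


hosts = [HostCap(42, 1200), HostCap(43, 900), HostCap(46, 400)]
print(dispatch_cycle([400] * 5, hosts, max_dispatch=30, max_host=1))
# → [(0, 42), (1, 43), (2, 46)]
```

In this model, with MAX_HOST=1 a cycle can never dispatch more VMs than there are hosts with enough free capacity, however high MAX_DISPATCH is.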

<VM>
  <ID>2251291</ID>
  <UID>2</UID>
  <GID>1</GID>
  <UNAME>coin</UNAME>
  <GNAME>users</GNAME>
  <NAME>Build-qt3d-Windows-Windows-1628056942-21784</NAME>
  <PERMISSIONS>
    <OWNER_U>1</OWNER_U>
    <OWNER_M>1</OWNER_M>
    <OWNER_A>1</OWNER_A>
    <GROUP_U>0</GROUP_U>
    <GROUP_M>0</GROUP_M>
    <GROUP_A>0</GROUP_A>
    <OTHER_U>0</OTHER_U>
    <OTHER_M>0</OTHER_M>
    <OTHER_A>0</OTHER_A>
  </PERMISSIONS>
  <LAST_POLL>0</LAST_POLL>
  <STATE>1</STATE>
  <LCM_STATE>0</LCM_STATE>
  <PREV_STATE>1</PREV_STATE>
  <PREV_LCM_STATE>0</PREV_LCM_STATE>
  <RESCHED>0</RESCHED>
  <STIME>1628164533</STIME>
  <ETIME>0</ETIME>
  <DEPLOY_ID/>
  <MONITORING/>
  <TEMPLATE>
    <AUTOMATIC_DS_REQUIREMENTS><![CDATA[("CLUSTERS/ID" @> 0)]]></AUTOMATIC_DS_REQUIREMENTS>
    <AUTOMATIC_NIC_REQUIREMENTS><![CDATA[("CLUSTERS/ID" @> 0)]]></AUTOMATIC_NIC_REQUIREMENTS>
    <AUTOMATIC_REQUIREMENTS><![CDATA[(CLUSTER_ID = 0) & !(PUBLIC_CLOUD = YES) & !(PIN_POLICY = PINNED)]]></AUTOMATIC_REQUIREMENTS>
    <CONTEXT>
      <COIN_DOWNLOAD_URL><![CDATA[http://10.225.148.117:8080/coin/binary/windows_amd64/agent.exe]]></COIN_DOWNLOAD_URL>
      <COIN_LAUNCH_PARAMETERS><![CDATA[--scheduler-service=10.225.148.117:34453 --webserver-service=10.225.148.117:8080 --sourcestorage-service=10.225.148.117:42007 --storage-service=10.225.148.117:58877 qt/qt3d/7ba0d25b265bd83f0b9c1e04173dce86d2b663be/WindowsWindows_10x86_64WindowsWindows_10x86_64Clangqtci-windows-10-x86_64-52-4b153eSccache/e10ec675b11ab4d18257c1475df50c7711d99de4/Build 1628056942-21784]]></COIN_LAUNCH_PARAMETERS>
      <DISK_ID><![CDATA[1]]></DISK_ID>
      <SSH_PUBLIC_KEY><![CDATA[]]></SSH_PUBLIC_KEY>
      <TARGET><![CDATA[hdb]]></TARGET>
    </CONTEXT>
    <CPU><![CDATA[4]]></CPU>
    <DISK>
      <ALLOW_ORPHANS><![CDATA[NO]]></ALLOW_ORPHANS>
      <CACHE><![CDATA[writeback]]></CACHE>
      <CLONE><![CDATA[YES]]></CLONE>
      <CLONE_TARGET><![CDATA[SYSTEM]]></CLONE_TARGET>
      <CLUSTER_ID><![CDATA[0]]></CLUSTER_ID>
      <DATASTORE><![CDATA[Qt Company Virtual Machine Disk Image Datastore 100]]></DATASTORE>
      <DATASTORE_ID><![CDATA[100]]></DATASTORE_ID>
      <DEV_PREFIX><![CDATA[sd]]></DEV_PREFIX>
      <DISK_ID><![CDATA[0]]></DISK_ID>
      <DISK_SNAPSHOT_TOTAL_SIZE><![CDATA[0]]></DISK_SNAPSHOT_TOTAL_SIZE>
      <DISK_TYPE><![CDATA[FILE]]></DISK_TYPE>
      <DRIVER><![CDATA[qcow2]]></DRIVER>
      <IMAGE><![CDATA[qtci-windows-10-x86_64-52-4b153e]]></IMAGE>
      <IMAGE_ID><![CDATA[5071]]></IMAGE_ID>
      <IMAGE_STATE><![CDATA[2]]></IMAGE_STATE>
      <LN_TARGET><![CDATA[NONE]]></LN_TARGET>
      <ORIGINAL_SIZE><![CDATA[563200]]></ORIGINAL_SIZE>
      <READONLY><![CDATA[NO]]></READONLY>
      <SAVE><![CDATA[NO]]></SAVE>
      <SIZE><![CDATA[563200]]></SIZE>
      <SOURCE><![CDATA[/var/lib/one//datastores/100/f67fc3ccf082ba84ca17f11e48e37896]]></SOURCE>
      <TARGET><![CDATA[sda]]></TARGET>
      <TM_MAD><![CDATA[qcow2_backing]]></TM_MAD>
      <TYPE><![CDATA[FILE]]></TYPE>
    </DISK>
    <FEATURES>
      <VIRTIO_SCSI_QUEUES><![CDATA[1]]></VIRTIO_SCSI_QUEUES>
    </FEATURES>
    <GRAPHICS>
      <LISTEN><![CDATA[0.0.0.0]]></LISTEN>
      <TYPE><![CDATA[VNC]]></TYPE>
    </GRAPHICS>
    <MEMORY><![CDATA[15360]]></MEMORY>
    <NIC>
      <AR_ID><![CDATA[0]]></AR_ID>
      <BRIDGE><![CDATA[br0]]></BRIDGE>
      <BRIDGE_TYPE><![CDATA[linux]]></BRIDGE_TYPE>
      <CLUSTER_ID><![CDATA[0]]></CLUSTER_ID>
      <MAC><![CDATA[02:00:e7:ef:18:b5]]></MAC>
      <NAME><![CDATA[NIC0]]></NAME>
      <NETWORK><![CDATA[ONE-1-PROD]]></NETWORK>
      <NETWORK_ID><![CDATA[0]]></NETWORK_ID>
      <NIC_ID><![CDATA[0]]></NIC_ID>
      <PHYDEV><![CDATA[bond0]]></PHYDEV>
      <SECURITY_GROUPS><![CDATA[0]]></SECURITY_GROUPS>
      <TARGET><![CDATA[one-2251291-0]]></TARGET>
      <VLAN_ID><![CDATA[635]]></VLAN_ID>
      <VN_MAD><![CDATA[802.1Q]]></VN_MAD>
    </NIC>
    <RAW>
      <DATA><![CDATA[<metadata>
                                                 <coin:os xmlns:coin="http://qt.io/coin/">windows</coin:os>
                                              </metadata>]]></DATA>
      <TYPE><![CDATA[kvm]]></TYPE>
    </RAW>
    <SECURITY_GROUP_RULE>
      <PROTOCOL><![CDATA[ALL]]></PROTOCOL>
      <RULE_TYPE><![CDATA[OUTBOUND]]></RULE_TYPE>
      <SECURITY_GROUP_ID><![CDATA[0]]></SECURITY_GROUP_ID>
      <SECURITY_GROUP_NAME><![CDATA[default]]></SECURITY_GROUP_NAME>
    </SECURITY_GROUP_RULE>
    <SECURITY_GROUP_RULE>
      <PROTOCOL><![CDATA[ALL]]></PROTOCOL>
      <RULE_TYPE><![CDATA[INBOUND]]></RULE_TYPE>
      <SECURITY_GROUP_ID><![CDATA[0]]></SECURITY_GROUP_ID>
      <SECURITY_GROUP_NAME><![CDATA[default]]></SECURITY_GROUP_NAME>
    </SECURITY_GROUP_RULE>
    <VCPU><![CDATA[4]]></VCPU>
    <VMID><![CDATA[2251291]]></VMID>
  </TEMPLATE>
  <USER_TEMPLATE>
    <COIN>
      <AGENT_ID><![CDATA[1628056942-21784]]></AGENT_ID>
      <BUILD_KEY><![CDATA[qt/qt3d/7ba0d25b265bd83f0b9c1e04173dce86d2b663be/WindowsWindows_10x86_64WindowsWindows_10x86_64Clangqtci-windows-10-x86_64-52-4b153eSccache/e10ec675b11ab4d18257c1475df50c7711d99de4/Build]]></BUILD_KEY>
    </COIN>
    <HYPERVISOR><![CDATA[kvm]]></HYPERVISOR>
    <SCHED_MESSAGE><![CDATA[Thu Aug  5 12:04:04 2021 : Cannot dispatch VM to any Host. Possible reasons: Not enough capacity in Host or System DS, dispatch limit reached, or limit of free leases reached.]]></SCHED_MESSAGE>
    <SCHED_RANK><![CDATA[FREE_CPU]]></SCHED_RANK>
    <SCHED_REQUIREMENTS><![CDATA[(CLUSTER_ID = 0) & (HYPERVISOR = kvm)]]></SCHED_REQUIREMENTS>
  </USER_TEMPLATE>
  <HISTORY_RECORDS/>
</VM>
<HOST>
  <ID>46</ID>
  <NAME>saved-toad.on2.qt.io</NAME>
  <STATE>2</STATE>
  <PREV_STATE>2</PREV_STATE>
  <IM_MAD><![CDATA[kvm]]></IM_MAD>
  <VM_MAD><![CDATA[kvm]]></VM_MAD>
  <CLUSTER_ID>0</CLUSTER_ID>
  <CLUSTER>default</CLUSTER>
  <HOST_SHARE>
    <MEM_USAGE>33554432</MEM_USAGE>
    <CPU_USAGE>1000</CPU_USAGE>
    <TOTAL_MEM>263900432</TOTAL_MEM>
    <TOTAL_CPU>8000</TOTAL_CPU>
    <MAX_MEM>211120345</MAX_MEM>
    <MAX_CPU>6400</MAX_CPU>
    <RUNNING_VMS>3</RUNNING_VMS>
    <VMS_THREAD>1</VMS_THREAD>
    <DATASTORES>
      <DISK_USAGE><![CDATA[0]]></DISK_USAGE>
      <DS>
        <FREE_MB><![CDATA[482237]]></FREE_MB>
        <ID><![CDATA[0]]></ID>
        <TOTAL_MB><![CDATA[3001826]]></TOTAL_MB>
        <USED_MB><![CDATA[2367037]]></USED_MB>
      </DS>
      <FREE_DISK><![CDATA[482237]]></FREE_DISK>
      <MAX_DISK><![CDATA[3001826]]></MAX_DISK>
      <USED_DISK><![CDATA[2367037]]></USED_DISK>
    </DATASTORES>
    <PCI_DEVICES/>
    <NUMA_NODES>
      <NODE>
        <CORE>
          <CPUS><![CDATA[38:-1,78:-1]]></CPUS>
          <DEDICATED><![CDATA[NO]]></DEDICATED>
          <FREE><![CDATA[2]]></FREE>
          <ID><![CDATA[28]]></ID>
        </CORE>
        <CORE>
          <CPUS><![CDATA[28:-1,68:-1]]></CPUS>
          <DEDICATED><![CDATA[NO]]></DEDICATED>
          <FREE><![CDATA[2]]></FREE>
          <ID><![CDATA[20]]></ID>
        </CORE>
        <CORE>
          <CPUS><![CDATA[18:-1,58:-1]]></CPUS>
          <DEDICATED><![CDATA[NO]]></DEDICATED>
          <FREE><![CDATA[2]]></FREE>
          <ID><![CDATA[12]]></ID>
        </CORE>
        <CORE>
          <CPUS><![CDATA[8:-1,48:-1]]></CPUS>
          <DEDICATED><![CDATA[NO]]></DEDICATED>
          <FREE><![CDATA[2]]></FREE>
          <ID><![CDATA[4]]></ID>
        </CORE>
        <CORE>
          <CPUS><![CDATA[36:-1,76:-1]]></CPUS>
          <DEDICATED><![CDATA[NO]]></DEDICATED>
          <FREE><![CDATA[2]]></FREE>
          <ID><![CDATA[27]]></ID>
        </CORE>
        <CORE>
          <CPUS><![CDATA[26:-1,66:-1]]></CPUS>
          <DEDICATED><![CDATA[NO]]></DEDICATED>
          <FREE><![CDATA[2]]></FREE>
          <ID><![CDATA[19]]></ID>
        </CORE>
        <CORE>
          <CPUS><![CDATA[16:-1,56:-1]]></CPUS>
          <DEDICATED><![CDATA[NO]]></DEDICATED>
          <FREE><![CDATA[2]]></FREE>
          <ID><![CDATA[11]]></ID>
        </CORE>
        <CORE>
          <CPUS><![CDATA[6:-1,46:-1]]></CPUS>
          <DEDICATED><![CDATA[NO]]></DEDICATED>
          <FREE><![CDATA[2]]></FREE>
          <ID><![CDATA[3]]></ID>
        </CORE>
        <CORE>
          <CPUS><![CDATA[34:-1,74:-1]]></CPUS>
          <DEDICATED><![CDATA[NO]]></DEDICATED>
          <FREE><![CDATA[2]]></FREE>
          <ID><![CDATA[26]]></ID>
        </CORE>
        <CORE>
          <CPUS><![CDATA[24:-1,64:-1]]></CPUS>
          <DEDICATED><![CDATA[NO]]></DEDICATED>
          <FREE><![CDATA[2]]></FREE>
          <ID><![CDATA[18]]></ID>
        </CORE>
        <CORE>
          <CPUS><![CDATA[14:-1,54:-1]]></CPUS>
          <DEDICATED><![CDATA[NO]]></DEDICATED>
          <FREE><![CDATA[2]]></FREE>
          <ID><![CDATA[10]]></ID>
        </CORE>
        <CORE>
          <CPUS><![CDATA[4:-1,44:-1]]></CPUS>
          <DEDICATED><![CDATA[NO]]></DEDICATED>
          <FREE><![CDATA[2]]></FREE>
          <ID><![CDATA[2]]></ID>
        </CORE>
        <CORE>
          <CPUS><![CDATA[32:-1,72:-1]]></CPUS>
          <DEDICATED><![CDATA[NO]]></DEDICATED>
          <FREE><![CDATA[2]]></FREE>
          <ID><![CDATA[25]]></ID>
        </CORE>
        <CORE>
          <CPUS><![CDATA[22:-1,62:-1]]></CPUS>
          <DEDICATED><![CDATA[NO]]></DEDICATED>
          <FREE><![CDATA[2]]></FREE>
          <ID><![CDATA[17]]></ID>
        </CORE>
        <CORE>
          <CPUS><![CDATA[12:-1,52:-1]]></CPUS>
          <DEDICATED><![CDATA[NO]]></DEDICATED>
          <FREE><![CDATA[2]]></FREE>
          <ID><![CDATA[9]]></ID>
        </CORE>
        <CORE>
          <CPUS><![CDATA[2:-1,42:-1]]></CPUS>
          <DEDICATED><![CDATA[NO]]></DEDICATED>
          <FREE><![CDATA[2]]></FREE>
          <ID><![CDATA[1]]></ID>
        </CORE>
        <CORE>
          <CPUS><![CDATA[30:-1,70:-1]]></CPUS>
          <DEDICATED><![CDATA[NO]]></DEDICATED>
          <FREE><![CDATA[2]]></FREE>
          <ID><![CDATA[24]]></ID>
        </CORE>
        <CORE>
          <CPUS><![CDATA[20:-1,60:-1]]></CPUS>
          <DEDICATED><![CDATA[NO]]></DEDICATED>
          <FREE><![CDATA[2]]></FREE>
          <ID><![CDATA[16]]></ID>
        </CORE>
        <CORE>
          <CPUS><![CDATA[10:-1,50:-1]]></CPUS>
          <DEDICATED><![CDATA[NO]]></DEDICATED>
          <FREE><![CDATA[2]]></FREE>
          <ID><![CDATA[8]]></ID>
        </CORE>
        <CORE>
          <CPUS><![CDATA[0:-1,40:-1]]></CPUS>
          <DEDICATED><![CDATA[NO]]></DEDICATED>
          <FREE><![CDATA[2]]></FREE>
          <ID><![CDATA[0]]></ID>
        </CORE>
        <HUGEPAGE>
          <FREE><![CDATA[3163018112]]></FREE>
          <PAGES><![CDATA[103]]></PAGES>
          <SIZE><![CDATA[1048576]]></SIZE>
          <USAGE><![CDATA[0]]></USAGE>
        </HUGEPAGE>
        <MEMORY>
          <DISTANCE><![CDATA[0 1]]></DISTANCE>
          <FREE><![CDATA[0]]></FREE>
          <TOTAL><![CDATA[131815192]]></TOTAL>
          <USAGE><![CDATA[0]]></USAGE>
          <USED><![CDATA[0]]></USED>
        </MEMORY>
        <NODE_ID><![CDATA[0]]></NODE_ID>
      </NODE>
      <NODE>
        <CORE>
          <CPUS><![CDATA[21:-1,61:-1]]></CPUS>
          <DEDICATED><![CDATA[NO]]></DEDICATED>
          <FREE><![CDATA[2]]></FREE>
          <ID><![CDATA[16]]></ID>
        </CORE>
        <CORE>
          <CPUS><![CDATA[11:-1,51:-1]]></CPUS>
          <DEDICATED><![CDATA[NO]]></DEDICATED>
          <FREE><![CDATA[2]]></FREE>
          <ID><![CDATA[8]]></ID>
        </CORE>
        <CORE>
          <CPUS><![CDATA[9:-1,49:-1]]></CPUS>
          <DEDICATED><![CDATA[NO]]></DEDICATED>
          <FREE><![CDATA[2]]></FREE>
          <ID><![CDATA[4]]></ID>
        </CORE>
        <CORE>
          <CPUS><![CDATA[7:-1,47:-1]]></CPUS>
          <DEDICATED><![CDATA[NO]]></DEDICATED>
          <FREE><![CDATA[2]]></FREE>
          <ID><![CDATA[3]]></ID>
        </CORE>
        <CORE>
          <CPUS><![CDATA[5:-1,45:-1]]></CPUS>
          <DEDICATED><![CDATA[NO]]></DEDICATED>
          <FREE><![CDATA[2]]></FREE>
          <ID><![CDATA[2]]></ID>
        </CORE>
        <CORE>
          <CPUS><![CDATA[3:-1,43:-1]]></CPUS>
          <DEDICATED><![CDATA[NO]]></DEDICATED>
          <FREE><![CDATA[2]]></FREE>
          <ID><![CDATA[1]]></ID>
        </CORE>
        <CORE>
          <CPUS><![CDATA[1:-1,41:-1]]></CPUS>
          <DEDICATED><![CDATA[NO]]></DEDICATED>
          <FREE><![CDATA[2]]></FREE>
          <ID><![CDATA[0]]></ID>
        </CORE>
        <CORE>
          <CPUS><![CDATA[39:-1,79:-1]]></CPUS>
          <DEDICATED><![CDATA[NO]]></DEDICATED>
          <FREE><![CDATA[2]]></FREE>
          <ID><![CDATA[28]]></ID>
        </CORE>
        <CORE>
          <CPUS><![CDATA[29:-1,69:-1]]></CPUS>
          <DEDICATED><![CDATA[NO]]></DEDICATED>
          <FREE><![CDATA[2]]></FREE>
          <ID><![CDATA[20]]></ID>
        </CORE>
        <CORE>
          <CPUS><![CDATA[19:-1,59:-1]]></CPUS>
          <DEDICATED><![CDATA[NO]]></DEDICATED>
          <FREE><![CDATA[2]]></FREE>
          <ID><![CDATA[12]]></ID>
        </CORE>
        <CORE>
          <CPUS><![CDATA[37:-1,77:-1]]></CPUS>
          <DEDICATED><![CDATA[NO]]></DEDICATED>
          <FREE><![CDATA[2]]></FREE>
          <ID><![CDATA[27]]></ID>
        </CORE>
        <CORE>
          <CPUS><![CDATA[27:-1,67:-1]]></CPUS>
          <DEDICATED><![CDATA[NO]]></DEDICATED>
          <FREE><![CDATA[2]]></FREE>
          <ID><![CDATA[19]]></ID>
        </CORE>
        <CORE>
          <CPUS><![CDATA[17:-1,57:-1]]></CPUS>
          <DEDICATED><![CDATA[NO]]></DEDICATED>
          <FREE><![CDATA[2]]></FREE>
          <ID><![CDATA[11]]></ID>
        </CORE>
        <CORE>
          <CPUS><![CDATA[35:-1,75:-1]]></CPUS>
          <DEDICATED><![CDATA[NO]]></DEDICATED>
          <FREE><![CDATA[2]]></FREE>
          <ID><![CDATA[26]]></ID>
        </CORE>
        <CORE>
          <CPUS><![CDATA[25:-1,65:-1]]></CPUS>
          <DEDICATED><![CDATA[NO]]></DEDICATED>
          <FREE><![CDATA[2]]></FREE>
          <ID><![CDATA[18]]></ID>
        </CORE>
        <CORE>
          <CPUS><![CDATA[15:-1,55:-1]]></CPUS>
          <DEDICATED><![CDATA[NO]]></DEDICATED>
          <FREE><![CDATA[2]]></FREE>
          <ID><![CDATA[10]]></ID>
        </CORE>
        <CORE>
          <CPUS><![CDATA[33:-1,73:-1]]></CPUS>
          <DEDICATED><![CDATA[NO]]></DEDICATED>
          <FREE><![CDATA[2]]></FREE>
          <ID><![CDATA[25]]></ID>
        </CORE>
        <CORE>
          <CPUS><![CDATA[23:-1,63:-1]]></CPUS>
          <DEDICATED><![CDATA[NO]]></DEDICATED>
          <FREE><![CDATA[2]]></FREE>
          <ID><![CDATA[17]]></ID>
        </CORE>
        <CORE>
          <CPUS><![CDATA[13:-1,53:-1]]></CPUS>
          <DEDICATED><![CDATA[NO]]></DEDICATED>
          <FREE><![CDATA[2]]></FREE>
          <ID><![CDATA[9]]></ID>
        </CORE>
        <CORE>
          <CPUS><![CDATA[31:-1,71:-1]]></CPUS>
          <DEDICATED><![CDATA[NO]]></DEDICATED>
          <FREE><![CDATA[2]]></FREE>
          <ID><![CDATA[24]]></ID>
        </CORE>
        <HUGEPAGE>
          <FREE><![CDATA[3163018112]]></FREE>
          <PAGES><![CDATA[102]]></PAGES>
          <SIZE><![CDATA[1048576]]></SIZE>
          <USAGE><![CDATA[0]]></USAGE>
        </HUGEPAGE>
        <MEMORY>
          <DISTANCE><![CDATA[1 0]]></DISTANCE>
          <FREE><![CDATA[0]]></FREE>
          <TOTAL><![CDATA[132085240]]></TOTAL>
          <USAGE><![CDATA[0]]></USAGE>
          <USED><![CDATA[0]]></USED>
        </MEMORY>
        <NODE_ID><![CDATA[1]]></NODE_ID>
      </NODE>
    </NUMA_NODES>
  </HOST_SHARE>
  <VMS>
    <ID>446451</ID>
    <ID>2251136</ID>
    <ID>2251167</ID>
  </VMS>
  <TEMPLATE>
    <ARCH><![CDATA[x86_64]]></ARCH>
    <CPUSPEED><![CDATA[1199]]></CPUSPEED>
    <HOSTNAME><![CDATA[saved-toad]]></HOSTNAME>
    <HYPERVISOR><![CDATA[kvm]]></HYPERVISOR>
    <IM_MAD><![CDATA[kvm]]></IM_MAD>
    <KVM_CPU_MODEL><![CDATA[Broadwell-IBRS]]></KVM_CPU_MODEL>
    <KVM_CPU_MODELS><![CDATA[486 pentium pentium2 pentium3 pentiumpro coreduo n270 core2duo qemu32 kvm32 cpu64-rhel5 cpu64-rhel6 qemu64 kvm64 Conroe Penryn Nehalem Nehalem-IBRS Westmere Westmere-IBRS SandyBridge SandyBridge-IBRS IvyBridge IvyBridge-IBRS Haswell-noTSX Haswell-noTSX-IBRS Haswell Haswell-IBRS Broadwell-noTSX Broadwell-noTSX-IBRS Broadwell Broadwell-IBRS Skylake-Client Skylake-Client-IBRS Skylake-Client-noTSX-IBRS Skylake-Server Skylake-Server-IBRS Skylake-Server-noTSX-IBRS Cascadelake-Server Cascadelake-Server-noTSX Icelake-Client Icelake-Client-noTSX Icelake-Server Icelake-Server-noTSX athlon phenom Opteron_G1 Opteron_G2 Opteron_G3 Opteron_G4 Opteron_G5 EPYC EPYC-IBPB EPYC-Rome EPYC-Milan Dhyana]]></KVM_CPU_MODELS>
    <KVM_MACHINES><![CDATA[pc-i440fx-focal ubuntu pc-0.15 pc-i440fx-2.12 pc-i440fx-2.0 pc-i440fx-xenial pc-q35-4.2 q35 pc-i440fx-2.5 pc-i440fx-4.2 pc pc-q35-xenial pc-i440fx-1.5 pc-0.12 pc-q35-2.7 pc-q35-eoan-hpb pc-i440fx-disco-hpb pc-i440fx-zesty pc-q35-artful pc-i440fx-trusty pc-i440fx-2.2 pc-i440fx-eoan-hpb pc-q35-focal-hpb pc-1.1 pc-q35-bionic-hpb pc-i440fx-artful pc-i440fx-2.7 pc-i440fx-yakkety pc-q35-2.4 pc-q35-cosmic-hpb pc-q35-2.10 pc-i440fx-1.7 pc-0.14 pc-q35-2.9 pc-i440fx-2.11 pc-q35-3.1 pc-q35-4.1 pc-i440fx-2.4 pc-1.3 pc-i440fx-4.1 pc-q35-eoan pc-i440fx-2.9 pc-i440fx-bionic-hpb isapc pc-i440fx-1.4 pc-q35-cosmic pc-q35-2.6 pc-i440fx-3.1 pc-q35-bionic pc-q35-disco-hpb pc-i440fx-cosmic pc-q35-2.12 pc-i440fx-bionic pc-q35-disco pc-i440fx-cosmic-hpb pc-i440fx-2.1 pc-1.0 pc-i440fx-wily pc-i440fx-2.6 pc-q35-4.0.1 pc-i440fx-1.6 pc-0.13 pc-q35-2.8 pc-i440fx-2.10 pc-q35-3.0 pc-q35-zesty pc-q35-4.0 microvm pc-i440fx-2.3 pc-q35-focal ubuntu-q35 pc-i440fx-disco pc-1.2 pc-i440fx-4.0 pc-i440fx-focal-hpb pc-i440fx-2.8 pc-i440fx-eoan pc-q35-2.5 pc-i440fx-3.0 pc-q35-yakkety pc-q35-2.11]]></KVM_MACHINES>
    <MODELNAME><![CDATA[Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz]]></MODELNAME>
    <RESERVED_CPU><![CDATA[]]></RESERVED_CPU>
    <RESERVED_MEM><![CDATA[]]></RESERVED_MEM>
    <VERSION><![CDATA[5.12.0.3]]></VERSION>
    <VM_MAD><![CDATA[kvm]]></VM_MAD>
  </TEMPLATE>
  <MONITORING>
    <TIMESTAMP>1628165060</TIMESTAMP>
    <ID>46</ID>
    <CAPACITY>
      <FREE_CPU><![CDATA[7520]]></FREE_CPU>
      <FREE_MEMORY><![CDATA[41696096]]></FREE_MEMORY>
      <USED_CPU><![CDATA[480]]></USED_CPU>
      <USED_MEMORY><![CDATA[222204336]]></USED_MEMORY>
    </CAPACITY>
    <SYSTEM/>
  </MONITORING>
</HOST>
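Laying the two dumps side by side already hints at where the VM gets stuck. The Python sketch below extracts the relevant fields from trimmed copies of the XML above and compares them for host 46. It assumes that the capacity check compares the request with MAX_CPU/MAX_MEM minus CPU_USAGE/MEM_USAGE (CPU in percent, memory in KB, matching the units in the discard messages) and that a disk with CLONE_TARGET=SYSTEM reserves its full SIZE in the host's system datastore (ID 0 here). Both rules are assumptions for illustration, not a description of OpenNebula's code, and check is a hypothetical helper. Under them, CPU (400 needed, 5400 free) and memory (15728640 KB needed, 177565913 KB free) fit, but the 563200 MB disk does not fit in the 482237 MB free on DS 0.

```python
import xml.etree.ElementTree as ET

# Trimmed copies of the VM and host dumps above, keeping only the fields used.
VM_XML = """<VM><TEMPLATE>
  <CPU><![CDATA[4]]></CPU>
  <MEMORY><![CDATA[15360]]></MEMORY>
  <DISK>
    <CLONE_TARGET><![CDATA[SYSTEM]]></CLONE_TARGET>
    <SIZE><![CDATA[563200]]></SIZE>
  </DISK>
</TEMPLATE></VM>"""

HOST_XML = """<HOST><HOST_SHARE>
  <MEM_USAGE>33554432</MEM_USAGE>
  <CPU_USAGE>1000</CPU_USAGE>
  <MAX_MEM>211120345</MAX_MEM>
  <MAX_CPU>6400</MAX_CPU>
  <DATASTORES><DS>
    <FREE_MB><![CDATA[482237]]></FREE_MB>
    <ID><![CDATA[0]]></ID>
  </DS></DATASTORES>
</HOST_SHARE></HOST>"""


def check(vm_xml, host_xml):
    """Return (needed, free, fits) per resource under the assumed rules."""
    vm = ET.fromstring(vm_xml)
    share = ET.fromstring(host_xml).find("HOST_SHARE")

    def num(node, path):
        return float(node.findtext(path))

    # CPU in percent (CPU=4 -> 400), memory in KB (MEMORY is in MB).
    need_cpu = num(vm, "TEMPLATE/CPU") * 100
    need_mem = num(vm, "TEMPLATE/MEMORY") * 1024
    free_cpu = num(share, "MAX_CPU") - num(share, "CPU_USAGE")
    free_mem = num(share, "MAX_MEM") - num(share, "MEM_USAGE")
    # Assumption: disks cloned to the system DS reserve their full SIZE (MB).
    need_disk = sum(num(d, "SIZE") for d in vm.iter("DISK")
                    if d.findtext("CLONE_TARGET") == "SYSTEM")
    free_disk = num(share, "DATASTORES/DS/FREE_MB")
    return {
        "cpu": (need_cpu, free_cpu, need_cpu <= free_cpu),
        "memory": (need_mem, free_mem, need_mem <= free_mem),
        "system_ds": (need_disk, free_disk, need_disk <= free_disk),
    }


for name, (need, free, fits) in check(VM_XML, HOST_XML).items():
    print(f"{name:<10} need={need:.0f} free={free:.0f} fits={fits}")
```

That result is consistent with the VM's SCHED_MESSAGE, which lists “Not enough capacity in Host or System DS” among the possible reasons.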

We got our backlog of VMs deployed after I freed up more disk space on the hosts. Nothing indicated that the disks were filling up, but a common denominator among the unused hosts was that they had a bit less disk space available.

Since we're using NFS with cachefilesd, it was only a matter of restarting that service, because its culling wasn't working properly. That freed up more space, and VMs got assigned.

However, even though the number of PENDING VMs came down, the scheduling oddity remains in a way: we still see only some of the hosts being searched when looking for a host. Since we now sort the hosts by least CPU usage, I think it works, and we always find hosts that fit the VMs. But the array of hosts still doesn't seem to contain all of them.

The host doesn't appear in the list? There are two kinds of messages in the log:

Thu Aug  5 06:04:34 2021 [Z0][SCHED][D]: Host 90 discarded for VM 2246965. Not enough CPU capacity: 800/600
Thu Aug  5 06:04:34 2021 [Z0][RANK][D]: ID: 42 Rank: 7680

First message, “Host XX discarded”: the host doesn't have enough CPU or memory capacity, or it doesn't fulfill the requirements (e.g. wrong hypervisor).
Second message, “ID: XX Rank: YY”: the host is fine. If the VM is still not deployed, it lacks some other resource (network or datastore). There should be a message in the scheduler log, something like: “Cannot schedule VM, there is no suitable network”.

I have already improved the scheduler messages: it will write the missing resource into SCHED_MESSAGE. This should be in the 6.0.4 release.

If you want to try it and know how to build OpenNebula, I can send you a patch.

In other words, the host is fine, but the VM wasn't deployed due to lack of datastore space. Can that check be skipped somehow? We only work with linked clones, so the whole qcow2 image doesn't actually have to fit on the host. In theory, mere megabytes could be enough to launch a VM, although in practice VMs generate logs pretty quickly and soon exceed “mere megabytes”. Also, since the hosts use cachefilesd, which takes up “any free space” for its cache but culls it when space is needed (if it works :smile: ), the disk space information is useless.