Template with "FREE_CPU" policy

Hello,

I’m testing a template with the “FREE_CPU” placement policy (host rank policy “Load-aware”), because my cluster has two servers (64 + 104 cores) and I would like users not to have to worry about choosing one server or the other; instead, the scheduler should choose the server with more free CPUs.

With this configuration, the first 40 instantiated VMs should go to the second server, because it has 40 more CPUs. Once the numbers are similar on both servers, I would like instantiations to alternate: one on the first server, one on the second server, and so on. However, I have read that “FREE_CPU” is only updated once per monitoring interval (cycle) (Virtual Machine Template — OpenNebula 6.4.0 documentation). So if I instantiate 20 VMs at once, they will probably not be split 10 to the first server and 10 to the second; they may all go to the same server, because “FREE_CPU” is not updated while those 20 VMs are being instantiated.
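The concern above can be illustrated with a toy model. This is a minimal sketch, not OpenNebula's scheduler (which, among other things, ranks hosts in batches every SCHED_INTERVAL and applies per-host dispatch limits). Hosts are ranked by a single value in hundredths of a CPU (so 64 cores = 6400, the convention used later in this thread), and the host names `server1`/`server2` are hypothetical. With `refresh=False` the value is a snapshot that never changes during the batch, like a metric refreshed only by monitoring; with `refresh=True` each placement immediately reduces the chosen host's value.

```python
from collections import Counter


def place_batch(rank_values, n_vms, cpu_per_vm, refresh):
    """Toy placement: each VM goes to the host with the highest value.

    rank_values: host name -> value in hundredths of a CPU.
    refresh=False: the snapshot never changes during the batch.
    refresh=True: each placement subtracts the VM's CPU from its host.
    """
    values = dict(rank_values)  # copy so the caller's snapshot is untouched
    placements = Counter()
    for _ in range(n_vms):
        # Highest value wins; on a tie, the first name in sorted order wins.
        host = max(sorted(values), key=lambda h: values[h])
        placements[host] += 1
        if refresh:
            values[host] -= cpu_per_vm
    return placements


# 64 + 104 cores, fifty 1-CPU VMs submitted at once.
snapshot = {"server1": 6400, "server2": 10400}
print(place_batch(snapshot, 50, 100, refresh=False))  # all 50 on server2
print(place_batch(snapshot, 50, 100, refresh=True))   # 45 on server2, 5 on server1
```

The first run reproduces the worry in the question (every VM lands on the same host); the second gives the desired "first 40 to the bigger server, then alternate" pattern. This is the intuition behind the suggestion, further down in the thread, to rank on CPU_USAGE, which is not tied to the monitoring probes.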

Is there any way to configure OpenNebula to correct this behaviour? I suppose I can modify the monitoring intervals, but what would be the right values to get the behaviour I want?

In my OpenNebula cluster, it is normal for users to instantiate a large number of VMs at the same time (they are students and, during class, they run their VMs for academic purposes while the teacher explains). So, for example, in 5 minutes users may have instantiated 30 VMs.

My monitor.conf file contains:

PROBES_PERIOD = [
    BEACON_HOST    = 30,
    SYSTEM_HOST    = 600,
    MONITOR_HOST   = 120,
    STATE_VM       = 5,
    MONITOR_VM     = 30,
    SYNC_STATE_VM  = 180
]

I have checked in /var/log/one/monitor.log that each server is monitored every 120 seconds. If I change “MONITOR_HOST = 120” to “MONITOR_HOST = 30” (for example), could that cause a monitoring problem? With this change, will FREE_CPU be updated more quickly?
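To put numbers on the interval question, here is a back-of-the-envelope sketch of how many VMs are submitted while one host monitoring result is current. It assumes, as a hypothetical simplification, that the 30 VMs in 5 minutes mentioned above arrive uniformly; real class traffic is burstier.

```python
def vms_per_monitor_window(vms, minutes, window_seconds):
    """Average VMs submitted during one MONITOR_HOST window (uniform arrivals)."""
    # Multiply first so the integer inputs used here divide exactly.
    return vms * window_seconds / (minutes * 60)


for monitor_host in (120, 30):
    print(monitor_host, vms_per_monitor_window(30, 5, monitor_host))
# 120 -> 12.0 VMs per window, 30 -> 3.0 VMs per window
```

Shortening MONITOR_HOST reduces how many placements share one stale FREE_CPU value, but it does not remove the effect, and the host probes then run four times as often.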

Thanks.

MONITOR_HOST determines how often the host monitoring probes run, so you will indeed get more frequent updates of FREE_CPU, among other values. However, also consider using CPU_USAGE for the placement policy: while it is not as accurate as FREE_CPU, it isn’t tied to the monitoring probes, since it only reflects the CPU allocated to VMs. More information about these values here.

Hello,

But can I use these values in the placement policy scheduling options? How? If I go to “Scheduling → Policy” (in the VM template) and select “Load-Aware”, “FREE_CPU” appears directly. How can I select or configure “CPU_USAGE” instead?

Thanks.

You should be able to define a CUSTOM policy in sched.conf.
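A custom policy is just a rank expression: per template, SCHED_RANK = "-CPU_USAGE" (SCHED_RANK is the attribute the Load-aware option fills with FREE_CPU, as the template later in this thread shows), or globally, to my understanding of the sched.conf documentation, DEFAULT_SCHED = [ POLICY = 3, RANK = "-CPU_USAGE" ]. The sketch below models how a rank orders hosts. The mapping of built-in policies to expressions is my reading of the sched.conf docs, the hosts are hypothetical, and the toy evaluator handles only "ATTR" or "-ATTR", not OpenNebula's full expression syntax.

```python
# Rank expressions behind the built-in host policies, as I understand the
# sched.conf documentation; a CUSTOM policy accepts any rank expression.
BUILTIN_RANKS = {
    "packing": "RUNNING_VMS",
    "striping": "-RUNNING_VMS",
    "load-aware": "FREE_CPU",
    "fixed": "PRIORITY",
}


def evaluate_rank(expr, host):
    """Evaluate 'ATTR' or '-ATTR' against a dict of host attributes.

    Real rank expressions allow full arithmetic; this toy handles only an
    optional leading minus and a single attribute name.
    """
    expr = expr.replace(" ", "")
    sign = -1 if expr.startswith("-") else 1
    return sign * float(host[expr.lstrip("-")])


def order_hosts(expr, hosts):
    """Return host names sorted from best (highest rank) to worst."""
    return sorted(hosts, key=lambda name: evaluate_rank(expr, hosts[name]), reverse=True)


# Hypothetical hosts: hostA is empty, hostB already runs 40 one-CPU VMs.
hosts = {
    "hostA": {"RUNNING_VMS": 0, "FREE_CPU": 6300, "CPU_USAGE": 0},
    "hostB": {"RUNNING_VMS": 40, "FREE_CPU": 9000, "CPU_USAGE": 4000},
}
print(order_hosts(BUILTIN_RANKS["load-aware"], hosts))  # ['hostB', 'hostA']
print(order_hosts("-CPU_USAGE", hosts))                 # ['hostA', 'hostB']
```

The leading minus turns "least allocated" into "highest rank", which is why the expressions in this thread are written as "-CPU_USAGE" or "-USED_CPU".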

Hello @dclavijo @ahuertas @pczerny,

I have configured all my templates with “FREE_CPU”. Both servers have PRIORITY=100… but, I don’t know why, all my VMs are instantiated on my second server (numbered as 15)… Why?

My servers are defined in the following way:

I don’t know why all instances are created on my second server (#15)… and it’s very important that the “FREE_CPU” policy works correctly, because in my OpenNebula scenario there are between 500 and 600 VMs, so with “FREE_CPU” I don’t have to worry about configuring each template to deploy on the first or the second server.

Thanks a lot!

You can take a look at the decision process of the scheduler by increasing the debug level. Edit the /etc/one/sched.conf configuration and restart the opennebula-scheduler service. Then during the instantiation/scheduling process take a look at the scheduler log at /var/log/one/sched.log.

Hello @dclavijo,

I have increased the scheduler debug level (DEBUG_LEVEL = 5). I now get more messages and information, but I still don’t know why this is happening. Here are all the details:

My sched.conf is:

MESSAGE_SIZE = 1073741824
TIMEOUT      = 60

ONE_XMLRPC = "http://localhost:2633/RPC2"

SCHED_INTERVAL = 30

MAX_VM       = 50
MAX_DISPATCH = 30
MAX_HOST     = 20

LIVE_RESCHEDS  = 0
COLD_MIGRATE_MODE = 0

MEMORY_SYSTEM_DS_SCALE = 0

DIFFERENT_VNETS = YES

DEFAULT_SCHED = [
    policy = 1
]

DEFAULT_DS_SCHED = [
   policy = 3
]

DEFAULT_NIC_SCHED = [
   policy = 1
]

LOG = [
  SYSTEM      = "file",
  DEBUG_LEVEL = 5
]
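With this sched.conf and the MONITOR_HOST = 120 from the monitor.conf posted earlier, one can estimate an upper bound on how many VMs the scheduler could dispatch while the same FREE_CPU values are in effect. This is a sketch resting on an assumption: per my reading of the sched.conf documentation, MAX_DISPATCH caps the VMs dispatched per scheduling action and MAX_HOST caps the VMs sent to a single host per action.

```python
def dispatch_bounds(monitor_host_s, sched_interval_s, max_dispatch, max_host):
    """Upper bounds on dispatches during one host monitoring window."""
    cycles = monitor_host_s // sched_interval_s  # scheduling actions per window
    return {
        "cycles": cycles,
        "total": cycles * max_dispatch,  # across all hosts
        "per_host": cycles * max_host,   # to a single host
    }


print(dispatch_bounds(monitor_host_s=120, sched_interval_s=30, max_dispatch=30, max_host=20))
# {'cycles': 4, 'total': 120, 'per_host': 80}
```

Neither bound comes close to limiting the roughly 30 VMs in 5 minutes mentioned at the start of the thread, so these settings alone would not spread such a burst across the two hosts.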

I have instantiated some VMs with this template:

CONTEXT = [
  NETWORK = "YES",
  PASSWORD = "5yoDK8jK3gKWuCS0RFtsNQ==",
  SSH_PUBLIC_KEY = "$USER[SSH_PUBLIC_KEY]",
  START_SCRIPT_BASE64 = "L2Jpbi9lY2hvICJhZG1pbnA6TmVidWxhQ2FvcyIgfCBjaHBhc3N3ZA==" ]
CPU = "0.5"
CPU_COST = "0.01"
CPU_MODEL = [
  MODEL = "host-passthrough" ]
DISK = [
  IMAGE = "Ubuntu22.04v1.4-disk0",
  IMAGE_UNAME = "rsuppi" ]
DISK_COST = "7.5e-7"
FEATURES = [
  ACPI = "yes" ]
GRAPHICS = [
  KEYMAP = "es",
  LISTEN = "0.0.0.0",
  TYPE = "VNC" ]
HOT_RESIZE = [
  CPU_HOT_ADD_ENABLED = "NO",
  MEMORY_HOT_ADD_ENABLED = "NO" ]
HYPERVISOR = "kvm"
INPUTS_ORDER = ""
LOGO = "images/logos/ubuntu.png"
MEMORY = "2048"
MEMORY_COST = "0.0000146484375"
MEMORY_UNIT_COST = "GB"
NIC = [
  NETWORK_ID = "0" ]
OS = [
  BOOT = "disk0",
  FIRMWARE = "",
  FIRMWARE_SECURE = "YES" ]
SCHED_RANK = "FREE_CPU"
USER_INPUTS = [
  CPU = "O|fixed|| |0.5",
  MEMORY = "M|range||2048..4096|2048" ]
VCPU = "1"

However, when I instantiate it, “sched.log” reports this:

Thu Oct 13 10:43:35 2022 [Z0][SCHED][D]: Getting VM and Host information. Total time: 0.97s
Thu Oct 13 10:43:35 2022 [Z0][VMGRP][D]: VM Group Scheduling information

Thu Oct 13 10:43:35 2022 [Z0][SCHED][D]: Setting VM groups placement constraints. Total time: 0.00s
Thu Oct 13 10:43:35 2022 [Z0][RANK][D]: Rank evaluation for expression : FREE_CPU
Thu Oct 13 10:43:35 2022 [Z0][RANK][D]: ID: 14 Rank: 6336
Thu Oct 13 10:43:35 2022 [Z0][RANK][D]: ID: 15 Rank: 8736
Thu Oct 13 10:43:35 2022 [Z0][RANK][D]: Rank evaluation for expression : PRIORITY
Thu Oct 13 10:43:35 2022 [Z0][RANK][D]: ID: 0 Rank: 0
Thu Oct 13 10:43:35 2022 [Z0][RANK][D]: Rank evaluation for expression : FREE_CPU
Thu Oct 13 10:43:35 2022 [Z0][RANK][D]: ID: 14 Rank: 6336
Thu Oct 13 10:43:35 2022 [Z0][RANK][D]: ID: 15 Rank: 8736
Thu Oct 13 10:43:35 2022 [Z0][RANK][D]: Rank evaluation for expression : PRIORITY
Thu Oct 13 10:43:35 2022 [Z0][RANK][D]: ID: 0 Rank: 0
Thu Oct 13 10:43:35 2022 [Z0][RANK][D]: Rank evaluation for expression : FREE_CPU
Thu Oct 13 10:43:35 2022 [Z0][RANK][D]: ID: 14 Rank: 6336
Thu Oct 13 10:43:35 2022 [Z0][RANK][D]: ID: 15 Rank: 8736
Thu Oct 13 10:43:35 2022 [Z0][RANK][D]: Rank evaluation for expression : PRIORITY
Thu Oct 13 10:43:35 2022 [Z0][RANK][D]: ID: 0 Rank: 0
Thu Oct 13 10:43:35 2022 [Z0][RANK][D]: Rank evaluation for expression : FREE_CPU
Thu Oct 13 10:43:35 2022 [Z0][RANK][D]: ID: 14 Rank: 6336
Thu Oct 13 10:43:35 2022 [Z0][RANK][D]: ID: 15 Rank: 8736
Thu Oct 13 10:43:35 2022 [Z0][RANK][D]: Rank evaluation for expression : PRIORITY
Thu Oct 13 10:43:35 2022 [Z0][RANK][D]: ID: 0 Rank: 0
Thu Oct 13 10:43:35 2022 [Z0][RANK][D]: Rank evaluation for expression : FREE_CPU
Thu Oct 13 10:43:35 2022 [Z0][RANK][D]: ID: 14 Rank: 6336
Thu Oct 13 10:43:35 2022 [Z0][RANK][D]: ID: 15 Rank: 8736
Thu Oct 13 10:43:35 2022 [Z0][RANK][D]: Rank evaluation for expression : PRIORITY
Thu Oct 13 10:43:35 2022 [Z0][RANK][D]: ID: 0 Rank: 0
Thu Oct 13 10:43:35 2022 [Z0][SCHED][D]: Match Making statistics:

In this output, you can see that:

Thu Oct 13 10:43:35 2022 [Z0][RANK][D]: Rank evaluation for expression : FREE_CPU
Thu Oct 13 10:43:35 2022 [Z0][RANK][D]: ID: 14 Rank: 6336
Thu Oct 13 10:43:35 2022 [Z0][RANK][D]: ID: 15 Rank: 8736

but my server “ID 14” has all of its 6400 “cores” free, and my server “ID 15” has 8975 “cores” assigned… so… I don’t understand anything.

Server 14 has a total of 6400 “CPUs”. Server 15 has a total of 10400 “CPUs”. When I instantiated my VMs, server 14 had 6400 free CPUs and server 15 had 10400 - 8975 = 1425… and 1425 < 6400, so I assumed all my VMs would be instantiated on “Server 14”, which had more free CPU…
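The two numbers being compared here are different quantities, as the replies further down explain: the log ranks hosts by FREE_CPU (idle CPU reported by monitoring), while the 1425 vs 6400 reasoning uses unallocated CPU (total minus what the VMs requested). A small sketch with the figures from this post; the dictionary keys are hypothetical names, not OpenNebula attributes.

```python
# Figures from this post, in hundredths of a CPU (64 cores = 6400).
hosts = {
    14: {"total": 6400, "allocated": 0, "free_cpu": 6336},
    15: {"total": 10400, "allocated": 8975, "free_cpu": 8736},
}


def best(hosts, score):
    """ID of the host with the highest score(host)."""
    return max(hosts, key=lambda host_id: score(hosts[host_id]))


by_free_cpu = best(hosts, lambda h: h["free_cpu"])                   # what FREE_CPU ranks on
by_unallocated = best(hosts, lambda h: h["total"] - h["allocated"])  # what was expected
print(by_free_cpu, by_unallocated)  # 15 14

for host_id, h in hosts.items():
    # Idle share of each host, regardless of how much CPU its VMs requested.
    print(f"host {host_id}: {100 * h['free_cpu'] // h['total']}% idle")
```

So the log is consistent with FREE_CPU: the VMs on host 15 request 89.75 CPUs but leave 84% of its cores idle, which still gives it the higher FREE_CPU rank.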

Please, help.

Thanks!

Hi again, @dclavijo,

Now my server “ID 15” is very full of VMs… and server “ID 14” is empty… and all new VMs are being instantiated on server “ID 15”.

I don’t understand anything.

Thanks.

Hi again, @dclavijo (and @ahuertas and @pczerny)

Today, my small cluster reached 350 VMs running at the same time… all on the same server, because FREE_CPU is not working properly. I have rechecked all the configuration, logs and so on, but I haven’t found the problem.

Now I have found that, in the “Template” tab of a running VM, I can see this:

User template
HOT_RESIZE = [
  CPU_HOT_ADD_ENABLED = "NO",
  MEMORY_HOT_ADD_ENABLED = "NO" ]
HYPERVISOR = "kvm"
INPUTS_ORDER = ""
LOGO = "images/logos/ubuntu.png"
MEMORY_UNIT_COST = "MB"
SCHED_RANK = "FREE_CPU"
SUNSTONE = [
  NETWORK_SELECT = "NO" ]
USER_INPUTS = [
  CPU = "O|fixed|| |2",
  MEMORY = "O|fixed|| |4096" ]
Template
AUTOMATIC_DS_REQUIREMENTS = "(\"CLUSTERS/ID\" @> 0)"
AUTOMATIC_NIC_REQUIREMENTS = "(\"CLUSTERS/ID\" @> 0)"
AUTOMATIC_REQUIREMENTS = "(CLUSTER_ID = 0) & !(PUBLIC_CLOUD = YES) & !(PIN_POLICY = PINNED)"
CONTEXT = [
  DISK_ID = "1",
  ETH0_DNS = "8.8.8.8",
  ETH0_EXTERNAL = "",
  ETH0_GATEWAY = "10.10.10.1",
  ETH0_GATEWAY6 = "",
  ETH0_IP = "10.10.11.26",
  ETH0_IP6 = "",
  ETH0_IP6_GATEWAY = "",
  ETH0_IP6_METHOD = "",
  ETH0_IP6_METRIC = "",
  ETH0_IP6_PREFIX_LENGTH = "",
  ETH0_IP6_ULA = "",
  ETH0_MAC = "02:00:0a:0a:0b:1a",
  ETH0_MASK = "255.255.254.0",
  ETH0_METHOD = "",
  ETH0_METRIC = "",
  ETH0_MTU = "",
  ETH0_NETWORK = "10.10.10.0",
  ETH0_SEARCH_DOMAIN = "",
  ETH0_VLAN_ID = "",
  ETH0_VROUTER_IP = "",
  ETH0_VROUTER_IP6 = "",
  ETH0_VROUTER_MANAGEMENT = "",
  NETWORK = "YES",
  PASSWORD = "5yoDK8jK3gKWuCS0RFtsNQ==",
  SSH_PUBLIC_KEY = "",
  TARGET = "hda" ]
CPU = "2"
DISK = [
  ALLOW_ORPHANS = "FORMAT",
  CLONE = "YES",
  CLONE_TARGET = "SYSTEM",
  CLUSTER_ID = "0",
  DATASTORE = "default",
  DATASTORE_ID = "1",
  DEV_PREFIX = "sd",
  DISK_ID = "0",
  DISK_SNAPSHOT_TOTAL_SIZE = "0",
  DISK_TYPE = "FILE",
  DRIVER = "qcow2",
  FORMAT = "qcow2",
  IMAGE = "Ubuntu-20.04-Spark-Jupyter",
  IMAGE_ID = "273",
  IMAGE_STATE = "2",
  IMAGE_UNAME = "oneadmin",
  LN_TARGET = "NONE",
  ORIGINAL_SIZE = "20480",
  READONLY = "NO",
  SAVE = "NO",
  SIZE = "20480",
  SOURCE = "/var/lib/one//datastores/1/e8073ad219b773b47538921f94cbb6e0",
  TARGET = "sda",
  TM_MAD = "qcow2",
  TYPE = "FILE" ]
FEATURES = [
  ACPI = "yes" ]
GRAPHICS = [
  KEYMAP = "es",
  LISTEN = "0.0.0.0",
  PORT = "16881",
  TYPE = "VNC" ]
MEMORY = "4096"
NIC = [
  AR_ID = "0",
  BRIDGE = "br1",
  BRIDGE_TYPE = "linux",
  CLUSTER_ID = "0",
  GATEWAY = "10.10.10.1",
  IP = "10.10.11.26",
  MAC = "02:00:0a:0a:0b:1a",
  NAME = "NIC0",
  NETWORK = "Internet",
  NETWORK_ID = "0",
  NETWORK_UNAME = "oneadmin",
  NIC_ID = "0",
  SECURITY_GROUPS = "0",
  TARGET = "one-10981-0",
  VN_MAD = "fw" ]
OS = [
  FIRMWARE = "",
  FIRMWARE_SECURE = "YES",
  UUID = "55ec0f47-4dad-47ce-8c2e-53e853d26ae1" ]
SECURITY_GROUP_RULE = [
  PROTOCOL = "ALL",
  RULE_TYPE = "OUTBOUND",
  SECURITY_GROUP_ID = "0",
  SECURITY_GROUP_NAME = "default" ]
SECURITY_GROUP_RULE = [
  PROTOCOL = "ALL",
  RULE_TYPE = "INBOUND",
  SECURITY_GROUP_ID = "0",
  SECURITY_GROUP_NAME = "default" ]
TEMPLATE_ID = "333"
TM_MAD_SYSTEM = "qcow2"
VCPU = "2"
VMID = "10981"

How can I delete the following lines? They aren’t defined in the template… but… they appear in the “Template” section of the running VM:

AUTOMATIC_DS_REQUIREMENTS = "(\"CLUSTERS/ID\" @> 0)"
AUTOMATIC_NIC_REQUIREMENTS = "(\"CLUSTERS/ID\" @> 0)"
AUTOMATIC_REQUIREMENTS = "(CLUSTER_ID = 0) & !(PUBLIC_CLOUD = YES) & !(PIN_POLICY = PINNED)"

I have had trouble with “AUTOMATIC_REQUIREMENTS”, because in some templates its value was “(CLUSTER_ID = 0) & !(PUBLIC_CLOUD = YES) & !(PIN_POLICY = PINNED) & ( ID="15" )”, so in those cases VMs seemed to be forced onto “ID=15”, because server “ID 14” obviously is not 15 but 14… For example, in this case:

Thu Oct 13 09:23:44 2022 [Z0][SCHED][D]: Setting VM groups placement constraints. Total time: 0.00s
Thu Oct 13 09:23:44 2022 [Z0][SCHED][D]: Host 14 discarded for VM 10956. It does not fulfill SCHED_REQUIREMENTS: (CLUSTER_ID = 0) & !(PUBLIC_CLOUD = YES) & !(PIN_POLICY = PINNED) & ( ID="15" )
Thu Oct 13 09:23:44 2022 [Z0][RANK][D]: Rank evaluation for expression : - RUNNING_VMS
Thu Oct 13 09:23:44 2022 [Z0][RANK][D]: ID: 15 Rank: -6
Thu Oct 13 09:23:44 2022 [Z0][RANK][D]: Rank evaluation for expression : PRIORITY
Thu Oct 13 09:23:44 2022 [Z0][RANK][D]: ID: 0 Rank: 0
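The log lines above show why the rank never mattered for that VM: host 14 is discarded by the requirements before ranking, so with the extra `& ( ID="15" )` term host 15 is the only candidate left to rank. A toy filter mirroring that expression; the host dictionaries and function names are hypothetical, and real requirement expressions are evaluated by OpenNebula, not like this.

```python
def meets_requirements(host, cluster_id, pinned_id=None):
    """Mirror of (CLUSTER_ID = c) & !(PUBLIC_CLOUD = YES) & !(PIN_POLICY = PINNED) [& ID = pinned]."""
    ok = (
        host["CLUSTER_ID"] == cluster_id
        and host.get("PUBLIC_CLOUD") != "YES"
        and host.get("PIN_POLICY") != "PINNED"
    )
    if pinned_id is not None:
        ok = ok and host["ID"] == pinned_id
    return ok


def candidates(hosts, cluster_id, pinned_id=None):
    """IDs of hosts that survive the requirements and go on to be ranked."""
    return [h["ID"] for h in hosts if meets_requirements(h, cluster_id, pinned_id)]


hosts = [{"ID": 14, "CLUSTER_ID": 0}, {"ID": 15, "CLUSTER_ID": 0}]
print(candidates(hosts, cluster_id=0))                # [14, 15]
print(candidates(hosts, cluster_id=0, pinned_id=15))  # [15]
```

Whatever rank the template uses, it can only choose among the hosts that pass this filter.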

Please, if you could help me, I would be eternally grateful.

Thanks again.

Can you post the output of onehost show -j <host_id> for hosts 14 and 15? Note that FREE_CPU is not the allocated CPU (CPU_USAGE), which is also shown in that output. Check this for reference on the host values.

Hi (and thanks!)

I attach two files with the “onehost show -j ID” output for my servers #14 and #15.
14.txt (8.5 KB)
15.txt (13.0 KB)

Thanks a lot!!!

My understanding is that “FREE_CPU” is calculated from how many CPUs are free/used on a server, isn’t it? So if one server has 10400 with 9000 assigned and the other has 6400 with 0 assigned, new VMs should be instantiated on the server with 0 assigned (because it has 6400 free and the other has only 1400).

Is that right?

FREE_CPU
Percentage of idling CPU multiplied by the number of cores. For example, if 50% of the CPU is idling in a 4 core machine the value will be 200.

It is a monitoring metric, not a metric derived from the placement of VMs that request a certain amount of CPU.

What you want is probably a different policy:

USED_CPU Percentage of used CPU multiplied by the number of cores. This value is displayed as USED CPU (REAL) by the onehost show command under HOST SHARE section.

or

CPU_USAGE Total CPU allocated to VMs running on the Host as requested in CPU in each VM template. This value is displayed as USED CPU (ALLOCATED) by the onehost show command under HOST SHARE section.
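The FREE_CPU and USED_CPU definitions quoted above are simple products. A sketch taking the idle percentage as input; the function names are mine, not OpenNebula's. In this simple model the two values always add up to cores × 100, which matches host 15's figures posted later in the thread (10088 + 312 = 10400 for 104 cores, i.e. 97% idle).

```python
def free_cpu(idle_percent, cores):
    """FREE_CPU: percentage of idle CPU multiplied by the number of cores."""
    return idle_percent * cores


def used_cpu(idle_percent, cores):
    """USED_CPU: percentage of used (non-idle) CPU multiplied by the number of cores."""
    return (100 - idle_percent) * cores


print(free_cpu(50, 4))                       # 200, the example from the definition
print(free_cpu(97, 104), used_cpu(97, 104))  # 10088 312
```

Neither value depends on how much CPU the VMs on the host have requested, so a host full of mostly idle VMs still reports a high FREE_CPU.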

In the files you sent, there is

  "ID": "15",
  "CAPACITY": {
    "FREE_CPU": "9672",

and

  "ID": "14",
  "CAPACITY": {
    "FREE_CPU": "6336",

so this is consistent with VMs being placed on host 15: it has the higher FREE_CPU score.

For the automatic requirements “issue”, check this.

Hi,

But how can I use “CPU_USAGE” or “USED_CPU” in this menu?

Directly? If I select “Load-aware”, can I enter “-CPU_USAGE” or “-USED_CPU”? Or how do I configure those policies?

I thought that FREE_CPU measured how many CPUs on a server (host) were idle (free) in each monitoring interval, so if OpenNebula monitors my hosts every 300 seconds, VMs instantiated at second 3 and at second 315 could be placed on different servers if FREE_CPU had changed in between (from higher to lower, for example). If I have misunderstood this, I need to know how to apply a scheduling policy that gives this behaviour: always (or in each monitoring interval) select the host with the most free (idle) CPU.

Thanks.

Hi again (ufff, sorry),

I have run these two commands:

[root@nebula log]# onehost show -j 14 | egrep 'CPU_USAGE|FREE_CPU|USED_CPU'
      "CPU_USAGE": "0",
        "FREE_CPU": "6400",
        "USED_CPU": "0",
[root@nebula log]# onehost show -j 15 | egrep 'CPU_USAGE|FREE_CPU|USED_CPU'
      "CPU_USAGE": "5300",
        "FREE_CPU": "10088",
        "USED_CPU": "312",

and I could see this in the Dashboard:

So now I “understand” a bit more… but… with these values, if I use “FREE_CPU”, I suppose the next VM should be allocated on host #14, shouldn’t it?

Please confirm that I’m on the right track and, if you can, explain whether I can directly use values other than “FREE_CPU” (for example, “-USED_CPU” or “-CPU_USAGE”).

Again, thanks a lot!

P.S. I’m very worried about this problem (or bad configuration), because my small OpenNebula cluster is intended for university students…
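The question in this post can be checked against the figures from the two commands. A sketch that reads `onehost show -j` output and picks the best host for a given rank; the JSON layout (HOST → HOST_SHARE → CPU_USAGE and HOST → MONITORING → CAPACITY → FREE_CPU/USED_CPU) is an assumption inferred from the grep indentation above, and the helper names are hypothetical. The stand-in JSON reuses the posted values so the example runs without a front-end.

```python
import json
import subprocess


def load_host(host_id):
    """Run `onehost show -j <id>` and return its JSON text (needs the CLI)."""
    result = subprocess.run(
        ["onehost", "show", "-j", str(host_id)],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def cpu_metrics(onehost_json):
    """Extract ID, CPU_USAGE, FREE_CPU and USED_CPU (assumed JSON layout)."""
    host = json.loads(onehost_json)["HOST"]
    capacity = host["MONITORING"]["CAPACITY"]
    return {
        "ID": int(host["ID"]),
        "CPU_USAGE": int(host["HOST_SHARE"]["CPU_USAGE"]),
        "FREE_CPU": int(capacity["FREE_CPU"]),
        "USED_CPU": int(capacity["USED_CPU"]),
    }


def best_host(metrics, attr, sign=1):
    """ID of the host with the highest sign * attr (sign=-1 for '-ATTR')."""
    return max(metrics, key=lambda m: sign * m[attr])["ID"]


def fake(host_id, cpu_usage, free_cpu, used_cpu):
    """Minimal stand-in for the real output, using the values posted above."""
    return json.dumps({"HOST": {
        "ID": str(host_id),
        "HOST_SHARE": {"CPU_USAGE": str(cpu_usage)},
        "MONITORING": {"CAPACITY": {"FREE_CPU": str(free_cpu), "USED_CPU": str(used_cpu)}},
    }})


metrics = [cpu_metrics(fake(14, 0, 6400, 0)), cpu_metrics(fake(15, 5300, 10088, 312))]
print(best_host(metrics, "FREE_CPU"))            # 15
print(best_host(metrics, "CPU_USAGE", sign=-1))  # 14
```

So with these values a FREE_CPU rank still prefers host #15 (10088 > 6400), while a -CPU_USAGE rank prefers host #14 (0 > -5300). On the front-end, `cpu_metrics(load_host(14))` would read the live values instead of the stand-ins.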

Hi,

I have run some tests, using “watch” to check how the CPU_USAGE, FREE_CPU and USED_CPU values change, and I have found that changing the policy from “FREE_CPU” to “-CPU_USAGE” is the solution for my OpenNebula scenario.

Thanks a lot!!!

Glad you were able to sort it out. The attribute names can certainly be confusing, given the many ways CPU is accounted for (allocation, load, overcommitment, counterparts), but at the end of the day it comes down to choosing a scheduling policy based on a quantifiable metric provided by the host template.

CPU_USAGE is shown as Allocated CPU in Sunstone; it is the result of adding up the CPU reservations of the VMs on a host.
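Since CPU_USAGE is the sum of the CPU values requested by the VMs on a host (in hundredths of a CPU, like the other host values in this thread), it reflects allocations rather than the last monitoring probe. A sketch with a hypothetical VM list using the CPU values that appear in the templates above ("0.5" and "2"); Decimal avoids float rounding with fractional CPUs.

```python
from decimal import Decimal


def cpu_usage(vm_cpus):
    """Sum of the VMs' CPU attributes, expressed in hundredths of a CPU."""
    return int(sum(Decimal(cpu) for cpu in vm_cpus) * 100)


# Hypothetical host: ten VMs with CPU = "0.5" and five with CPU = "2".
print(cpu_usage(["0.5"] * 10 + ["2"] * 5))  # 1500, i.e. 15 CPUs allocated
```

A SCHED_RANK of "-CPU_USAGE" therefore prefers the host with the least CPU promised to VMs, which is the behaviour that solved the problem in this thread.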

What I have learnt about this configuration is that FREE_CPU measures how many CPUs are idle, which is a different concept from how many CPUs are already assigned to VMs (that value is “CPU_USAGE”).