Template with "FREE_CPU" policy

Daniel_Ruiz_Molina · July 22, 2022, 8:09am

Hello,

I’m testing a template with the “FREE_CPU” placement policy (Host rank policy “load-aware”). My cluster has two servers (64 + 104 cores), and I would like users not to have to worry about selecting one server or the other; instead, the scheduler should choose the server with more free CPUs.

With this configuration, the first 40 instantiated VMs should go to the second server because it has 40 more CPUs. Once the number of free CPUs is similar on both servers, I would like instantiations to alternate: one on the first server, one on the second, and so on. However, I have read that “FREE_CPU” is only updated once per monitoring interval (see Virtual Machine Template in the OpenNebula 6.4.0 documentation), so if I instantiate 20 VMs at once, they will probably not be split 10 and 10 between the servers; they may all go to the same server, because “FREE_CPU” is not updated while these 20 VMs are being instantiated.
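The concern in the paragraph above can be illustrated with a toy model. This is only a sketch and not OpenNebula's scheduler: the host names, the rank values and the `cost_per_vm` figure are hypothetical, and the function simply picks the highest-ranked host for each VM, either with a frozen metric or with one that drops after every placement.

```python
# Toy model only: this is NOT OpenNebula's scheduler.
# Host names, rank values and cost_per_vm are hypothetical.

def place_burst(rank_snapshot, n_vms, cost_per_vm=0, refresh=False):
    """Place n_vms one by one on the host with the highest rank.

    refresh=False models a metric that is not updated during the burst.
    refresh=True models a metric that drops by cost_per_vm per placement.
    """
    ranks = dict(rank_snapshot)
    placements = {host: 0 for host in ranks}
    for _ in range(n_vms):
        best = max(ranks, key=ranks.get)  # ties go to the first host listed
        placements[best] += 1
        if refresh:
            ranks[best] -= cost_per_vm
    return placements

snapshot = {"server1": 6400, "server2": 6500}  # two nearly equal hosts

stale = place_burst(snapshot, n_vms=20)
fresh = place_burst(snapshot, n_vms=20, cost_per_vm=100, refresh=True)
print(stale)  # {'server1': 0, 'server2': 20}
print(fresh)  # {'server1': 10, 'server2': 10}
```

With a frozen metric every VM in the burst lands on the same host; with a metric that changes on each placement the burst is split evenly.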

Is there a way to configure this to correct the behaviour? I suppose I can modify the monitoring intervals, but what would be the correct values to get the behaviour I want?

In my OpenNebula cluster, it is normal for users to instantiate a large number of VMs at the same time (they are students and, during class, they run their VMs for academic purposes while the teacher is explaining). So, for example, in 5 minutes, users may have instantiated 30 VMs.

My monitor.conf file contains:

PROBES_PERIOD = [
    BEACON_HOST    = 30,
    SYSTEM_HOST    = 600,
    MONITOR_HOST   = 120,
    STATE_VM       = 5,
    MONITOR_VM     = 30,
    SYNC_STATE_VM  = 180
]

I have checked in /var/log/one/monitor.log that each server is monitored every 120 seconds. If I change “MONITOR_HOST = 120” to “MONITOR_HOST = 30” (for example), could that cause a monitoring problem? With this reconfiguration, will FREE_CPU be updated more quickly?

Thanks.

dclavijo · July 26, 2022, 9:19pm

MONITOR_HOST determines how often these scripts run, so you will indeed get more frequent updates of FREE_CPU, among other values. However, consider also using CPU_USAGE for the placement policy. While not as accurate as FREE_CPU, it isn’t tied to the monitoring scripts, since it just reflects the CPU allocated to VMs. More information about such values here.

Daniel_Ruiz_Molina · July 27, 2022, 10:07am

Hello,

But can I use these values in the placement policy scheduling options? How? If I go to “Scheduling → Policy” (in the VM Template) and select “Load-Aware”, “FREE_CPU” appears directly. How can I select or configure “CPU_USAGE” instead?

Thanks.

dclavijo · August 1, 2022, 5:15pm

You should be able to define a CUSTOM policy in sched.conf.

Daniel_Ruiz_Molina · October 10, 2022, 2:53pm

Hello @dclavijo @ahuertas @pczerny,

I have configured all my templates with “FREE_CPU”. Both servers have PRIORITY=100, but for some reason all my VMs are instantiated on my second server (numbered 15). Why?

My servers are defined in the following way:

Server 1 (numbered as 14, which acts as server and KVM node):

[Screenshot: OpenNebula Sunstone, 2022-10-10 16:47:44]

Server 2 (numbered as 15, which acts only as a KVM node):

[Screenshot: OpenNebula Sunstone, 2022-10-10 16:49:25]

I don’t know why all instances are created on my second server (#15), and it’s very important for me that the “FREE_CPU” policy works correctly: in my OpenNebula scenario there are between 500 and 600 VMs, so with “FREE_CPU” I wouldn’t have to worry about configuring each template to deploy on the first or the second server.

Thanks a lot!

dclavijo · October 11, 2022, 1:14pm

You can take a look at the decision process of the scheduler by increasing the debug level. Edit the /etc/one/sched.conf configuration and restart the opennebula-scheduler service. Then during the instantiation/scheduling process take a look at the scheduler log at /var/log/one/sched.log.

Daniel_Ruiz_Molina · October 13, 2022, 8:49am

Hello @dclavijo,

I have increased the scheduler debug level (DEBUG_LEVEL = 5). I now get more messages and information, but I still don’t know why this is happening. I explain everything here:

My sched.conf is:

MESSAGE_SIZE = 1073741824
TIMEOUT      = 60

ONE_XMLRPC = "http://localhost:2633/RPC2"

SCHED_INTERVAL = 30

MAX_VM       = 50
MAX_DISPATCH = 30
MAX_HOST     = 20

LIVE_RESCHEDS  = 0
COLD_MIGRATE_MODE = 0

MEMORY_SYSTEM_DS_SCALE = 0

DIFFERENT_VNETS = YES

DEFAULT_SCHED = [
    policy = 1
]

DEFAULT_DS_SCHED = [
   policy = 3
]

DEFAULT_NIC_SCHED = [
   policy = 1
]

LOG = [
  SYSTEM      = "file",
  DEBUG_LEVEL = 5
]

I have instantiated some VMs with this template:

CONTEXT = [
  NETWORK = "YES",
  PASSWORD = "5yoDK8jK3gKWuCS0RFtsNQ==",
  SSH_PUBLIC_KEY = "$USER[SSH_PUBLIC_KEY]",
  START_SCRIPT_BASE64 = "L2Jpbi9lY2hvICJhZG1pbnA6TmVidWxhQ2FvcyIgfCBjaHBhc3N3ZA==" ]
CPU = "0.5"
CPU_COST = "0.01"
CPU_MODEL = [
  MODEL = "host-passthrough" ]
DISK = [
  IMAGE = "Ubuntu22.04v1.4-disk0",
  IMAGE_UNAME = "rsuppi" ]
DISK_COST = "7.5e-7"
FEATURES = [
  ACPI = "yes" ]
GRAPHICS = [
  KEYMAP = "es",
  LISTEN = "0.0.0.0",
  TYPE = "VNC" ]
HOT_RESIZE = [
  CPU_HOT_ADD_ENABLED = "NO",
  MEMORY_HOT_ADD_ENABLED = "NO" ]
HYPERVISOR = "kvm"
INPUTS_ORDER = ""
LOGO = "images/logos/ubuntu.png"
MEMORY = "2048"
MEMORY_COST = "0.0000146484375"
MEMORY_UNIT_COST = "GB"
NIC = [
  NETWORK_ID = "0" ]
OS = [
  BOOT = "disk0",
  FIRMWARE = "",
  FIRMWARE_SECURE = "YES" ]
SCHED_RANK = "FREE_CPU"
USER_INPUTS = [
  CPU = "O|fixed|| |0.5",
  MEMORY = "M|range||2048..4096|2048" ]
VCPU = "1"

However, when I instantiate it, sched.log reports this information:

Thu Oct 13 10:43:35 2022 [Z0][SCHED][D]: Getting VM and Host information. Total time: 0.97s
Thu Oct 13 10:43:35 2022 [Z0][VMGRP][D]: VM Group Scheduling information

Thu Oct 13 10:43:35 2022 [Z0][SCHED][D]: Setting VM groups placement constraints. Total time: 0.00s
Thu Oct 13 10:43:35 2022 [Z0][RANK][D]: Rank evaluation for expression : FREE_CPU
Thu Oct 13 10:43:35 2022 [Z0][RANK][D]: ID: 14 Rank: 6336
Thu Oct 13 10:43:35 2022 [Z0][RANK][D]: ID: 15 Rank: 8736
Thu Oct 13 10:43:35 2022 [Z0][RANK][D]: Rank evaluation for expression : PRIORITY
Thu Oct 13 10:43:35 2022 [Z0][RANK][D]: ID: 0 Rank: 0
Thu Oct 13 10:43:35 2022 [Z0][RANK][D]: Rank evaluation for expression : FREE_CPU
Thu Oct 13 10:43:35 2022 [Z0][RANK][D]: ID: 14 Rank: 6336
Thu Oct 13 10:43:35 2022 [Z0][RANK][D]: ID: 15 Rank: 8736
Thu Oct 13 10:43:35 2022 [Z0][RANK][D]: Rank evaluation for expression : PRIORITY
Thu Oct 13 10:43:35 2022 [Z0][RANK][D]: ID: 0 Rank: 0
Thu Oct 13 10:43:35 2022 [Z0][RANK][D]: Rank evaluation for expression : FREE_CPU
Thu Oct 13 10:43:35 2022 [Z0][RANK][D]: ID: 14 Rank: 6336
Thu Oct 13 10:43:35 2022 [Z0][RANK][D]: ID: 15 Rank: 8736
Thu Oct 13 10:43:35 2022 [Z0][RANK][D]: Rank evaluation for expression : PRIORITY
Thu Oct 13 10:43:35 2022 [Z0][RANK][D]: ID: 0 Rank: 0
Thu Oct 13 10:43:35 2022 [Z0][RANK][D]: Rank evaluation for expression : FREE_CPU
Thu Oct 13 10:43:35 2022 [Z0][RANK][D]: ID: 14 Rank: 6336
Thu Oct 13 10:43:35 2022 [Z0][RANK][D]: ID: 15 Rank: 8736
Thu Oct 13 10:43:35 2022 [Z0][RANK][D]: Rank evaluation for expression : PRIORITY
Thu Oct 13 10:43:35 2022 [Z0][RANK][D]: ID: 0 Rank: 0
Thu Oct 13 10:43:35 2022 [Z0][RANK][D]: Rank evaluation for expression : FREE_CPU
Thu Oct 13 10:43:35 2022 [Z0][RANK][D]: ID: 14 Rank: 6336
Thu Oct 13 10:43:35 2022 [Z0][RANK][D]: ID: 15 Rank: 8736
Thu Oct 13 10:43:35 2022 [Z0][RANK][D]: Rank evaluation for expression : PRIORITY
Thu Oct 13 10:43:35 2022 [Z0][RANK][D]: ID: 0 Rank: 0
Thu Oct 13 10:43:35 2022 [Z0][SCHED][D]: Match Making statistics:
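Rank lines like the ones in the log above can be summarized with a short script. This is a minimal sketch that assumes the line layout is exactly as shown in this thread; the function name `parse_ranks` and the regular expressions are illustrative and not part of OpenNebula.

```python
import re

# Patterns assume the sched.log layout quoted in this thread.
EXPR_RE = re.compile(r"\[RANK\]\[D\]: Rank evaluation for expression : (.+)$")
RANK_RE = re.compile(r"\[RANK\]\[D\]: ID: (\d+) Rank: (-?\d+(?:\.\d+)?)")

def parse_ranks(lines):
    """Return a list of (expression, {id: rank}) tuples in log order."""
    results = []
    for line in lines:
        line = line.rstrip()
        m = EXPR_RE.search(line)
        if m:
            results.append((m.group(1).strip(), {}))
            continue
        m = RANK_RE.search(line)
        if m and results:
            results[-1][1][int(m.group(1))] = float(m.group(2))
    return results

sample = """\
Thu Oct 13 10:43:35 2022 [Z0][RANK][D]: Rank evaluation for expression : FREE_CPU
Thu Oct 13 10:43:35 2022 [Z0][RANK][D]: ID: 14 Rank: 6336
Thu Oct 13 10:43:35 2022 [Z0][RANK][D]: ID: 15 Rank: 8736
""".splitlines()

parsed = parse_ranks(sample)
for expr, ranks in parsed:
    if ranks:
        best = max(ranks, key=ranks.get)
        print(f"{expr}: ID {best} ranks highest ({ranks[best]:g})")
# prints: FREE_CPU: ID 15 ranks highest (8736)
```

To run it against a real log, the `sample` list could be replaced with the lines read from /var/log/one/sched.log.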

In this information, you can see that:

Thu Oct 13 10:43:35 2022 [Z0][RANK][D]: Rank evaluation for expression : FREE_CPU
Thu Oct 13 10:43:35 2022 [Z0][RANK][D]: ID: 14 Rank: 6336
Thu Oct 13 10:43:35 2022 [Z0][RANK][D]: ID: 15 Rank: 8736

but my server “ID 14” has all of its 6400 “cores” free and my server “ID 15” has 8975 “cores” assigned, so I don’t understand what is happening.

Server 14 has a total of 6400 “cpus”. Server 15 has a total of 10400 “cpus”. When I instantiated my VM, server 14 had 6400 free cpus and server 15 had 10400 - 8975 = 1425. Since 1425 < 6400, I assumed that all my VMs would be instantiated on “Server 14”, which had more free CPU.

Please, help.

Thanks!

Daniel_Ruiz_Molina · October 13, 2022, 11:04am

Hi again, @dclavijo,

Now, my server “ID 15” is very full of VMs and server “ID 14” is empty, yet all new VMs are still being instantiated on server “ID 15”.

I don’t understand anything.

Thanks.

Daniel_Ruiz_Molina · October 13, 2022, 1:39pm

Hi again, @dclavijo (and @ahuertas and @pczerny)

Today, my small cluster reached 350 VMs running at the same time, all on the same server, because FREE_CPU is not working properly. I have rechecked all the configuration, logs and so on, but I haven’t found the problem.

Now, I have found that in a running VM, in the “Template” tab, I can see this:

User template
HOT_RESIZE = [
  CPU_HOT_ADD_ENABLED = "NO",
  MEMORY_HOT_ADD_ENABLED = "NO" ]
HYPERVISOR = "kvm"
INPUTS_ORDER = ""
LOGO = "images/logos/ubuntu.png"
MEMORY_UNIT_COST = "MB"
SCHED_RANK = "FREE_CPU"
SUNSTONE = [
  NETWORK_SELECT = "NO" ]
USER_INPUTS = [
  CPU = "O|fixed|| |2",
  MEMORY = "O|fixed|| |4096" ]
Template
AUTOMATIC_DS_REQUIREMENTS = "(\"CLUSTERS/ID\" @> 0)"
AUTOMATIC_NIC_REQUIREMENTS = "(\"CLUSTERS/ID\" @> 0)"
AUTOMATIC_REQUIREMENTS = "(CLUSTER_ID = 0) & !(PUBLIC_CLOUD = YES) & !(PIN_POLICY = PINNED)"
CONTEXT = [
  DISK_ID = "1",
  ETH0_DNS = "8.8.8.8",
  ETH0_EXTERNAL = "",
  ETH0_GATEWAY = "10.10.10.1",
  ETH0_GATEWAY6 = "",
  ETH0_IP = "10.10.11.26",
  ETH0_IP6 = "",
  ETH0_IP6_GATEWAY = "",
  ETH0_IP6_METHOD = "",
  ETH0_IP6_METRIC = "",
  ETH0_IP6_PREFIX_LENGTH = "",
  ETH0_IP6_ULA = "",
  ETH0_MAC = "02:00:0a:0a:0b:1a",
  ETH0_MASK = "255.255.254.0",
  ETH0_METHOD = "",
  ETH0_METRIC = "",
  ETH0_MTU = "",
  ETH0_NETWORK = "10.10.10.0",
  ETH0_SEARCH_DOMAIN = "",
  ETH0_VLAN_ID = "",
  ETH0_VROUTER_IP = "",
  ETH0_VROUTER_IP6 = "",
  ETH0_VROUTER_MANAGEMENT = "",
  NETWORK = "YES",
  PASSWORD = "5yoDK8jK3gKWuCS0RFtsNQ==",
  SSH_PUBLIC_KEY = "",
  TARGET = "hda" ]
CPU = "2"
DISK = [
  ALLOW_ORPHANS = "FORMAT",
  CLONE = "YES",
  CLONE_TARGET = "SYSTEM",
  CLUSTER_ID = "0",
  DATASTORE = "default",
  DATASTORE_ID = "1",
  DEV_PREFIX = "sd",
  DISK_ID = "0",
  DISK_SNAPSHOT_TOTAL_SIZE = "0",
  DISK_TYPE = "FILE",
  DRIVER = "qcow2",
  FORMAT = "qcow2",
  IMAGE = "Ubuntu-20.04-Spark-Jupyter",
  IMAGE_ID = "273",
  IMAGE_STATE = "2",
  IMAGE_UNAME = "oneadmin",
  LN_TARGET = "NONE",
  ORIGINAL_SIZE = "20480",
  READONLY = "NO",
  SAVE = "NO",
  SIZE = "20480",
  SOURCE = "/var/lib/one//datastores/1/e8073ad219b773b47538921f94cbb6e0",
  TARGET = "sda",
  TM_MAD = "qcow2",
  TYPE = "FILE" ]
FEATURES = [
  ACPI = "yes" ]
GRAPHICS = [
  KEYMAP = "es",
  LISTEN = "0.0.0.0",
  PORT = "16881",
  TYPE = "VNC" ]
MEMORY = "4096"
NIC = [
  AR_ID = "0",
  BRIDGE = "br1",
  BRIDGE_TYPE = "linux",
  CLUSTER_ID = "0",
  GATEWAY = "10.10.10.1",
  IP = "10.10.11.26",
  MAC = "02:00:0a:0a:0b:1a",
  NAME = "NIC0",
  NETWORK = "Internet",
  NETWORK_ID = "0",
  NETWORK_UNAME = "oneadmin",
  NIC_ID = "0",
  SECURITY_GROUPS = "0",
  TARGET = "one-10981-0",
  VN_MAD = "fw" ]
OS = [
  FIRMWARE = "",
  FIRMWARE_SECURE = "YES",
  UUID = "55ec0f47-4dad-47ce-8c2e-53e853d26ae1" ]
SECURITY_GROUP_RULE = [
  PROTOCOL = "ALL",
  RULE_TYPE = "OUTBOUND",
  SECURITY_GROUP_ID = "0",
  SECURITY_GROUP_NAME = "default" ]
SECURITY_GROUP_RULE = [
  PROTOCOL = "ALL",
  RULE_TYPE = "INBOUND",
  SECURITY_GROUP_ID = "0",
  SECURITY_GROUP_NAME = "default" ]
TEMPLATE_ID = "333"
TM_MAD_SYSTEM = "qcow2"
VCPU = "2"
VMID = "10981"

How can I delete the following lines? They aren’t defined in the template, but they appear in the “User template” of the running VM:

AUTOMATIC_DS_REQUIREMENTS = "(\"CLUSTERS/ID\" @> 0)"
AUTOMATIC_NIC_REQUIREMENTS = "(\"CLUSTERS/ID\" @> 0)"
AUTOMATIC_REQUIREMENTS = "(CLUSTER_ID = 0) & !(PUBLIC_CLOUD = YES) & !(PIN_POLICY = PINNED)"

I have had trouble with “AUTOMATIC_REQUIREMENTS” because in some templates the value of this variable was “(CLUSTER_ID = 0) & !(PUBLIC_CLOUD = YES) & !(PIN_POLICY = PINNED) & ( ID="15" )”, so in those cases VMs seemed to be forced to be instantiated on “ID=15”, because server “ID 14” obviously does not match. For example, in this case:

Thu Oct 13 09:23:44 2022 [Z0][SCHED][D]: Setting VM groups placement constraints. Total time: 0.00s
Thu Oct 13 09:23:44 2022 [Z0][SCHED][D]: Host 14 discarded for VM 10956. It does not fulfill SCHED_REQUIREMENTS: (CLUSTER_ID = 0) & !(PUBLIC_CLOUD = YES) & !(PIN_POLICY = PINNED) & ( ID="15" )
Thu Oct 13 09:23:44 2022 [Z0][RANK][D]: Rank evaluation for expression : - RUNNING_VMS
Thu Oct 13 09:23:44 2022 [Z0][RANK][D]: ID: 15 Rank: -6
Thu Oct 13 09:23:44 2022 [Z0][RANK][D]: Rank evaluation for expression : PRIORITY
Thu Oct 13 09:23:44 2022 [Z0][RANK][D]: ID: 0 Rank: 0

Please, if you could help me, I would be eternally grateful.

Thanks again.

dclavijo · October 13, 2022, 1:43pm

Can you post the output of onehost show -j <host_id> for hosts 14 and 15? Note that FREE_CPU is not the allocated CPU (CPU_USAGE) that is shown in that output. Check this for reference regarding the values of the host.

Daniel_Ruiz_Molina · October 13, 2022, 1:50pm

Hi (and thanks!)

I am attaching two files with the “onehost show -j ID” output from my servers #14 and #15.
14.txt (8.5 KB)
15.txt (13.0 KB)

Thanks a lot!!!

Daniel_Ruiz_Molina · October 13, 2022, 1:52pm

I understand that “FREE_CPU” is a value calculated from how many CPUs are free/used on a server, isn’t it? So if one server has 10400 with 9000 assigned and the other server has 6400 with 0 assigned, new VMs should be instantiated on the server that has 0 assigned (because it has 6400 free and the other has only 1400).

Is that right?

dclavijo · October 13, 2022, 2:01pm

FREE_CPU
Percentage of idling CPU multiplied by the number of cores. For example, if 50% of the CPU is idling in a 4 core machine the value will be 200.

It is a monitoring metric, not a metric defined by placement of VMs requiring a certain amount of CPU.
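The definition quoted above can be checked with a small calculation. This is only a sketch of the arithmetic: `free_cpu` is an illustrative helper, and the core counts (64 and 104) are taken from the first post, so the idle percentages are inferred rather than measured.

```python
def free_cpu(idle_percent, cores):
    """Idle CPU percentage multiplied by the number of cores."""
    return idle_percent * cores

# The documentation's example: 50% idle on a 4-core machine.
print(free_cpu(50, 4))     # 200

# Assuming 64 and 104 cores, the ranks logged earlier in the thread
# correspond to hosts that were about 99% and 84% idle.
print(free_cpu(99, 64))    # 6336 (rank logged for host 14)
print(free_cpu(84, 104))   # 8736 (rank logged for host 15)
```

Under this reading, a large host that is mostly idle keeps a high FREE_CPU value no matter how much CPU has been reserved by the VMs placed on it.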

What you probably want is to use a different policy:

USED_CPU Percentage of used CPU multiplied by the number of cores. This value is displayed as USED CPU (REAL) by the onehost show command under HOST SHARE section.

or

CPU_USAGE Total CPU allocated to VMs running on the Host as requested in CPU in each VM template. This value is displayed as USED CPU (ALLOCATED) by the onehost show command under HOST SHARE section.

In the files you sent there is

  "ID": "15",

  "CAPACITY": {
    "FREE_CPU": "9672",

and

  "ID": "14",

  "CAPACITY": {
    "FREE_CPU": "6336",

so it correlates with VMs being placed on host 15, because it has the higher FREE_CPU score.

For the automatic requirements “issue”, check this.

Daniel_Ruiz_Molina · October 13, 2022, 3:46pm

Hi,

But how can I use “CPU_USAGE” or “USED_CPU” in this menu?

Directly? If I select “Load-aware”, can I enter “-CPU_USAGE” or “-USED_CPU”? Or how do I configure those policies?

I thought that FREE_CPU measured how many CPUs on a server (host) were idle (free) in each monitoring interval, so if OpenNebula monitors my hosts every 300 seconds, VMs instantiated at second 3 and at second 315 could be placed on different servers if FREE_CPU had changed in between (from greater to lower, for example). If I have misunderstood this, I need to know how to apply a scheduling policy that gives this behavior: always (or in each monitoring interval) select the host with the most free (idle) CPU.

Thanks.

Daniel_Ruiz_Molina · October 13, 2022, 8:05pm

Hi again (ufff, sorry),

I ran these two commands:

[root@nebula log]# onehost show -j 14 | egrep 'CPU_USAGE|FREE_CPU|USED_CPU'
      "CPU_USAGE": "0",
        "FREE_CPU": "6400",
        "USED_CPU": "0",
[root@nebula log]# onehost show -j 15 | egrep 'CPU_USAGE|FREE_CPU|USED_CPU'
      "CPU_USAGE": "5300",
        "FREE_CPU": "10088",
        "USED_CPU": "312",

and in the Dashboard I could see it:

So now I “understand” a bit more. But with these values, if I use “FREE_CPU”, I suppose the next VM should be allocated on host #14, shouldn’t it?

Please confirm whether I’m on the right track and, if you can, explain whether I can directly use values other than “FREE_CPU” (for example, “-USED_CPU” or “-CPU_USAGE”).

Again, thanks a lot!

P.S. I’m very worried about this problem (or bad configuration) because my small OpenNebula cluster is intended for university students…

Daniel_Ruiz_Molina · October 14, 2022, 10:10am

Hi,

I have run some tests, using “watch” to check how the values of CPU_USAGE, FREE_CPU and USED_CPU change, and I have found that changing the policy from “FREE_CPU” to “-CPU_USAGE” is the solution for my OpenNebula scenario.
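The effect of switching the rank expression can be reproduced with the values from the two onehost commands posted earlier. This is a sketch that assumes the scheduler simply prefers the host with the highest rank value; `pick_host` is an illustrative helper, not an OpenNebula function.

```python
# Values copied from the "onehost show -j" output posted in this thread.
hosts = {
    14: {"CPU_USAGE": 0, "FREE_CPU": 6400, "USED_CPU": 0},
    15: {"CPU_USAGE": 5300, "FREE_CPU": 10088, "USED_CPU": 312},
}

def pick_host(hosts, rank):
    """Return the ID of the host with the highest rank value."""
    return max(hosts, key=lambda host_id: rank(hosts[host_id]))

by_free_cpu = pick_host(hosts, lambda h: h["FREE_CPU"])
by_cpu_usage = pick_host(hosts, lambda h: -h["CPU_USAGE"])

print(by_free_cpu)   # 15: more idle CPU, even with 5300 already allocated
print(by_cpu_usage)  # 14: nothing allocated yet
```

Note that “-CPU_USAGE” favours the host with the least CPU allocated, regardless of its size, so on hosts of different sizes it is not the same as picking the host with the most unallocated capacity.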

Thanks a lot!!!

dclavijo · October 14, 2022, 2:35pm

Glad you were able to sort it out. The attribute names can certainly be confusing because of the many ways CPU is accounted for (allocation, load, overcommitment, counterparts), but at the end of the day it comes down to choosing a scheduling policy based on a quantifiable metric provided by the host template.

CPU_USAGE is shown as Allocated CPU in Sunstone; it is the result of adding up the VM CPU reservations on a host.
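As a rough sketch of that sum: the factor of 100 per CPU is an assumption inferred from this thread (a 64-core host reports 6400), and the list of 106 VMs with CPU = 0.5 is a hypothetical workload chosen only because it adds up to the 5300 reported for host 15.

```python
def cpu_usage(vm_cpus):
    """Sum of the CPU values requested by VMs, in host units.

    Assumes 100 units per CPU, as the host totals in this thread suggest.
    """
    return round(sum(cpu * 100 for cpu in vm_cpus))

def unallocated(total_cpu, vm_cpus):
    """Host capacity that has not been reserved by any VM."""
    return total_cpu - cpu_usage(vm_cpus)

vms_on_15 = [0.5] * 106  # hypothetical: 106 VMs with CPU = "0.5"

print(cpu_usage(vms_on_15))           # 5300
print(unallocated(10400, vms_on_15))  # 5100
print(unallocated(6400, []))          # 6400 (empty host 14)
```

Because this figure depends only on what has been placed, it changes as soon as a VM is allocated, independently of how busy the VMs actually are.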

Daniel_Ruiz_Molina · November 14, 2022, 8:54am

What I have learnt about this configuration is that FREE_CPU evaluates how many CPUs are idle, which is a different concept from how many CPUs are already assigned to VMs (that value is “CPU_USAGE”).
