Memory resize bug: VM starts with max memory and decreases until OOM

This is an odd one. I’m currently running a PoC with OpenNebula 6.4 and HCI, using KVM.

When I provision a KVM VM using a marketplace template (I’ve tried CentOS 7 and Debian 10; it’s the same on both), I get strange behaviour with the “hot resize” RAM option enabled.

Upon instantiating a VM, it starts with the max memory (128GB in my case) and rapidly decreases until it hits zero, at which point the VM crashes because it has no RAM left. I have sensible defaults for my VM and host settings (max memory, etc.), and with the hot resize option disabled this behaviour does not occur.

Rebooting the VM puts me back at square 1 with max RAM, and the problem starts all over again.
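
My working assumption is that it’s the virtio memory balloon being inflated that shrinks the guest; a quick way to check that assumption (OpenNebula names the libvirt domain one-&lt;VM ID&gt;, so one-22 for VM 22) is:

dmesg | grep -i balloon                     # inside the guest: balloon driver activity
virsh dumpxml one-22 | grep -i memballoon   # on the KVM host: the balloon device in the domain XML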

Relevant extract from VM config:

HOT_RESIZE = [
  CPU_HOT_ADD_ENABLED = "YES",
  MEMORY_HOT_ADD_ENABLED = "YES" ]
HYPERVISOR = "kvm"
INPUTS_ORDER = ""
LOGO = "images/logos/debian.png"
LXD_SECURITY_PRIVILEGED = "true"
MEMORY_UNIT_COST = "MB"
USER_INPUTS = [
  MEMORY = "M|range||1024..128000|768" ]
...
MEMORY = "1024"
MEMORY_MAX = "131072"

Output of free -g whilst this is happening:

root@localhost:~# free -g
              total        used        free      shared  buff/cache   available
Mem:             24           0          23           0           0          23
Swap:             0           0           0
root@localhost:~# free -g
              total        used        free      shared  buff/cache   available
Mem:             22           0          22           0           0          21
Swap:             0           0           0
root@localhost:~# free -g
              total        used        free      shared  buff/cache   available
Mem:             21           0          21           0           0          20
Swap:             0           0           0
root@localhost:~# free -g
              total        used        free      shared  buff/cache   available
Mem:             20           0          20           0           0          19
Swap:             0           0           0
root@localhost:~# free -g
              total        used        free      shared  buff/cache   available
Mem:             19           0          19           0           0          18
Swap:             0           0           0
root@localhost:~# free -g
              total        used        free      shared  buff/cache   available
Mem:             18           0          17           0           0          17
Swap:             0           0           0
root@localhost:~# free -g
              total        used        free      shared  buff/cache   available
Mem:             16           0          16           0           0          15
Swap:             0           0           0
root@localhost:~# free -g
              total        used        free      shared  buff/cache   available
Mem:             15           0          15           0           0          14
Swap:             0           0           0
root@localhost:~# free -g
              total        used        free      shared  buff/cache   available
Mem:             14           0          13           0           0          13
Swap:             0           0           0
root@localhost:~# free -g
              total        used        free      shared  buff/cache   available
Mem:             12           0          12           0           0          11
Swap:             0           0           0
root@localhost:~# free -g
              total        used        free      shared  buff/cache   available
Mem:             11           0          11           0           0          10
Swap:             0           0           0
root@localhost:~# free -g
              total        used        free      shared  buff/cache   available
Mem:             10           0          10           0           0           9
Swap:             0           0           0
root@localhost:~# free -g
              total        used        free      shared  buff/cache   available
Mem:              9           0           8           0           0           8
Swap:             0           0           0
root@localhost:~# free -g
              total        used        free      shared  buff/cache   available
Mem:              7           0           7           0           0           6
Swap:             0           0           0
root@localhost:~# free -g
Connection to x.x.x.x closed by remote host.
Connection to x.x.x.x closed.
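
The same shrinking is visible from the host side. Something like this (again assuming the domain is one-22) polls the balloon target, and virsh setmem can push it back up as a temporary stop-gap, although whatever is driving the balloon down may simply override it again:

watch -n 2 'virsh dommemstat one-22 | grep actual'
virsh setmem one-22 4194304 --live   # force the balloon target back to 4 GiB (value is in KiB)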

Hello, I think this bug is related to the hypervisor and/or guest OS. OpenNebula just uses standard qemu-kvm features. What OS and version are you running on the hypervisor?

It’s KVM on Ubuntu 20.04.

[updated with more info] As a follow-up to this, it looks like it could be a calculation error.

If I create a 32GB RAM VM, for example, it shrinks from 137GB RAM (the max) down to just over 30GB:

root@localhost:~# free -m
              total        used        free      shared  buff/cache   available
Mem:          30621         217       30296           8         107       29515
Swap:             0           0           0

And remains stable there.

This makes it impossible to create VMs of roughly 1GB or less, since they’ll be incorrectly sized and OOM (all my previous tests did exactly that).

Relevant config from my 1GB tests:

cl02~ # onevm show debtest | grep -i memory
MEMORY              : 3.1G
  MEMORY_HOT_ADD_ENABLED="YES" ]
MEMORY_UNIT_COST="MB"
  MEMORY="M|range||1024..122880|1024" ]
MEMORY="1024"
MEMORY_MAX="131072"

poc2~ # virsh dominfo one-22 | grep memory
Max memory:     134217728 KiB
Used memory:    2970304 KiB

poc2~ # virsh dumpxml one-22 | grep -i mem
  <memory unit='KiB'>134217728</memory>
  <currentMemory unit='KiB'>2970304</currentMemory>

134217728 KiB = 128 GiB (~137 GB), i.e. exactly the MEMORY_MAX of 131072 MB.
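
The currentMemory figure doesn’t line up with the request either; plain shell arithmetic on the numbers above:

echo $((2970304 / 1024))    # 2900 -> ~2900 MiB (~2.8 GiB) actually assigned, versus the 1024 MB requested
echo $((131072 * 1024))     # 134217728 -> MEMORY_MAX (131072 MB) in KiB, matching <memory>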

Just to close the loop on this: as part of our PoC we tore down the cluster and reprovisioned each node, and the problem went away. No idea why.

I did the following:

I logged into the machine where OpenNebula is installed and got root.

I got the list of VMs with the command:

onevm list

I noted the ID of the desired machine, then removed the MEMORY_MAX parameter from the VM template:

onedb change-body vm --id=22 '/VM/TEMPLATE/MEMORY_MAX' --delete

And finally removed the HOT_RESIZE block from the user template:

onevm update 22

and removed this block:

HOT_RESIZE = [
  CPU_HOT_ADD_ENABLED = "YES",
  MEMORY_HOT_ADD_ENABLED = "YES" ]

Then I checked the template again and started the VM.
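
As a sanity check (my own verification step, not from the docs), something like this should return no matches once both changes are in:

onevm show 22 | grep -iE 'memory_max|hot_'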

@MDeblin the problem is that we want to be able to hot resize our VMs, including memory. If we remove those lines from the template, we won’t be able to.

I understand. But with MEMORY_HOT_ADD_ENABLED="YES" enabled, in my case (Debian 11 with OpenNebula 6.4) the MEMORY_MAX variable is substituted with the size of the host’s RAM. I got around this (hopefully only temporarily) by removing MEMORY_MAX and turning off MEMORY_HOT_ADD_ENABLED.
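
In template terms the workaround looks roughly like this (a sketch of what I mean, not a verbatim copy of my template); CPU hot-add can stay on if you still want it:

MEMORY = "1024"
HOT_RESIZE = [
  CPU_HOT_ADD_ENABLED = "YES",
  MEMORY_HOT_ADD_ENABLED = "NO" ]

with no MEMORY_MAX attribute present at all.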

This is exactly what we’re seeing, also with KVM on Ubuntu 20.04. Did you by any chance upgrade Ubuntu to this version, or was it a fresh install?