System datastore shows 100% full

Hello colleagues,

We have a cluster with 4 KVM nodes and 1 Sunstone server. Two iSCSI LUNs (multipath) are presented to the KVM nodes, and the VG and LV have been created. However, when looking at the GUI, the SYSTEM datastore appears as 100% used.
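
Roughly, the LVM setup looked like this (a sketch only: the multipath device names are illustrative, but fs_lvm expects the volume group to be named vg-one-<SYSTEM_DS_ID>, here vg-one-111):

# illustrative only -- substitute your real multipath devices
pvcreate /dev/mapper/mpatha /dev/mapper/mpathb
vgcreate vg-one-111 /dev/mapper/mpatha /dev/mapper/mpathb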

The datastore configuration is as follows:

[root@opennebula-cp ~]# onedatastore show 110
DATASTORE 110 INFORMATION
ID : 110
NAME : lvm_images
USER : oneadmin
GROUP : oneadmin
CLUSTERS : 0
TYPE : IMAGE
DS_MAD : fs
TM_MAD : fs_lvm
BASE PATH : /var/lib/one/datastores/110
DISK_TYPE : BLOCK
STATE : READY

DATASTORE CAPACITY
TOTAL: : 492G
FREE: : 466.9G
USED: : 128M
LIMIT: : -

PERMISSIONS
OWNER : um-
GROUP : u--
OTHER : ---

DATASTORE TEMPLATE
ALLOW_ORPHANS="NO"
CLONE_TARGET="SYSTEM"
DISK_TYPE="BLOCK"
DRIVER="raw"
DS_MAD="fs"
LN_TARGET="SYSTEM"
SAFE_DIRS="/var/tmp /tmp"
TM_MAD="fs_lvm"
TYPE="IMAGE_DS"

IMAGES
[root@opennebula-cp ~]# onedatastore show 111
DATASTORE 111 INFORMATION
ID : 111
NAME : lvm_system
USER : oneadmin
GROUP : oneadmin
CLUSTERS : 0
TYPE : SYSTEM
DS_MAD : -
TM_MAD : fs_lvm
BASE PATH : /var/lib/one/datastores/111
DISK_TYPE : FILE
STATE : READY

DATASTORE CAPACITY
TOTAL: : 500G
FREE: : 0M
USED: : 500G
LIMIT: : -

PERMISSIONS
OWNER : um-
GROUP : u--
OTHER : ---

DATASTORE TEMPLATE
ALLOW_ORPHANS="NO"
BRIDGE_LIST="on-hvsc01 on-hvsc02 on-hvsc03 on-hvsc04"
CLUSTER="0"
DISK_TYPE="FILE"
DS_MIGRATE="YES"
RESTRICTED_DIRS="/"
SAFE_DIRS="/var/tmp"
SHARED="YES"
TM_MAD="fs_lvm"
TYPE="SYSTEM_DS"

What could be happening?

CentOS 7
OpenNebula 5.8.5

Regards,
Eduardo Rivera.

OpenNebula monitors usage of the LVM datastore using /var/lib/one/remotes/tm/fs_lvm/monitor, which basically just SSHes to one of the nodes from the BRIDGE_LIST and runs the following command (as oneadmin, via sudo):
vgdisplay --separator : --units m -o vg_size,vg_free --nosuffix --noheadings -C vg-one-111

If vgdisplay is not available, it falls back to the df command.
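
In essence, the monitor does something like this (a simplified sketch, not the exact driver code; the host is one node taken from BRIDGE_LIST):

# simplified sketch of tm/fs_lvm/monitor behaviour (not the exact code)
DS_ID=111
HOST=on-hvsc01    # one node from BRIDGE_LIST
# vgdisplay prints "<vg_size_mb>:<vg_free_mb>"; used = size - free
ssh "$HOST" "sudo vgdisplay --separator : --units m -o vg_size,vg_free --nosuffix --noheadings -C vg-one-$DS_ID" || \
    ssh "$HOST" "df -BM /var/lib/one/datastores/$DS_ID"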

Anyway, can you check the monitor script and spot what could be wrong?

I have a similar issue where BRIDGE_LIST has 3 nodes (2x 2 TB and 1x 4 TB), all with a vg-one-XXX volume group. The capacity of lvm_system sometimes shows 2 TB and sometimes 4 TB (as if the host to be monitored were chosen randomly?). It should show 8 TB of total space, shouldn't it? This is version 5.8.1.
If I knew what parameters to pass to /var/lib/one/remotes/tm/fs_lvm/monitor, I could confirm further how it works.
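
One way to confirm it without knowing the exact parameters is to run the same query the monitor runs against every node in BRIDGE_LIST and compare (node names below are placeholders):

# node names are placeholders; XXX is the system datastore ID
for h in node1 node2 node3; do
    echo -n "$h: "
    ssh "$h" "sudo vgdisplay --separator : --units m -o vg_size,vg_free --nosuffix --noheadings -C vg-one-XXX"
done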

EDIT: I managed to find the reason.
The monitor script uses the get_destination_host function from /var/lib/one/remotes/datastore/libfs.sh, which randomly chooses a host from BRIDGE_LIST, so the lvm_system capacity is reported from a random node.
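
The selection logic is roughly this (paraphrased from memory; check your installed libfs.sh):

# paraphrased sketch of get_destination_host in libfs.sh --
# picks a random index into the BRIDGE_LIST hosts
function get_destination_host {
    HOSTS_ARRAY=($BRIDGE_LIST)
    ARRAY_INDEX=$((RANDOM % ${#HOSTS_ARRAY[@]}))
    echo ${HOSTS_ARRAY[$ARRAY_INDEX]}
}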

The capacity should be the total from all nodes that offer the vg-one-XXX volume group. Is this a bug?