vm_pool and host_pool tables out of sync, resulting in error: Requested operation is not valid: (GPU) PCI device in use by driver qemu

OpenNebula 5.2.1
using PCI GPU passthrough.
There are 4 GPU cards in each (GPU) host.

ONE is not aware that the PCI device is already in use by a VM on a particular compute node, so the scheduler keeps trying to deploy a VM there and configure it to use that PCI device. As a workaround I disabled the node so that no new VMs are deployed on it.

Is there a way to “tell” ONE that this PCI device is in use and by what VM?

I am new to OpenNebula, and I could not find a way to update the information for a particular compute node (host). I also could not find relevant issues while searching the forum, documentation or other sources on the internet.

Thanks in advance,

Hans Feringa

Some additional information:

The VM information (onevm show vm-id) does show that the PCI device is in use.
The host information (onehost show host-id) does not show it, and the host information seems to be what the scheduler uses.
I ran onedb fsck recently, so I think onedb fsck overlooks this inconsistency.

I also noticed that if ONE is not aware that the first address (or one of the first addresses) is in use, it never tries any of the other addresses that are available for the VM on that host. The VM then ends up in the failure state, which opens the way for the next VM to be scheduled on the next available (unused) resource on this host. We assumed this was the case, and our tests confirmed it. The annoying thing is that a failed VM is never retried on another host; it stays stuck on the node where ONE thinks the allocated device/resource is still available.

The information for the host is in the body field of the host_pool table. There the VMID XML tag has the value <![CDATA[-1]]> while it should be (in this case) <![CDATA[25400]]>.

In the body field of the vm_pool table (the information for the VM), the XML blob does contain the PCI device usage, with the correct address, bus, etc. So the two tables are clearly out of sync.
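The comparison described above could be sketched in Python roughly as follows. This is only an illustration: the two XML bodies are hypothetical, heavily trimmed samples, and the element layout (HOST_SHARE/PCI_DEVICES/PCI on the host side, TEMPLATE/PCI with an ADDRESS on the VM side) is an assumption that should be checked against the real body fields before relying on it. Only the VMID tag and the value 25400 come from the post.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed bodies; real host_pool/vm_pool bodies hold many more
# elements, and the layout used here is an assumption.
HOST_BODY = """<HOST><ID>12</ID><HOST_SHARE><PCI_DEVICES>
<PCI><ADDRESS><![CDATA[0000:02:00:0]]></ADDRESS><VMID><![CDATA[-1]]></VMID></PCI>
<PCI><ADDRESS><![CDATA[0000:03:00:0]]></ADDRESS><VMID><![CDATA[-1]]></VMID></PCI>
</PCI_DEVICES></HOST_SHARE></HOST>"""

VM_BODY = """<VM><ID>25400</ID><TEMPLATE>
<PCI><ADDRESS><![CDATA[0000:02:00:0]]></ADDRESS></PCI>
</TEMPLATE></VM>"""


def host_pci_owners(host_body):
    """Map PCI address -> VM ID recorded on the host side (-1 means free)."""
    root = ET.fromstring(host_body)
    return {
        pci.findtext("ADDRESS"): int(pci.findtext("VMID"))
        for pci in root.iter("PCI")
    }


def vm_pci_addresses(vm_body):
    """Return (vm_id, [host PCI addresses the VM body claims])."""
    root = ET.fromstring(vm_body)
    vm_id = int(root.findtext("ID"))
    addrs = [
        pci.findtext("ADDRESS")
        for pci in root.iter("PCI")
        if pci.findtext("ADDRESS")
    ]
    return vm_id, addrs


def find_mismatches(host_body, vm_bodies):
    """List (address, vmid_on_host, vmid_in_vm_pool) where the tables disagree."""
    owners = host_pci_owners(host_body)
    mismatches = []
    for body in vm_bodies:
        vm_id, addrs = vm_pci_addresses(body)
        for addr in addrs:
            if owners.get(addr) != vm_id:
                mismatches.append((addr, owners.get(addr), vm_id))
    return mismatches


if __name__ == "__main__":
    for addr, on_host, in_vm in find_mismatches(HOST_BODY, [VM_BODY]):
        print(f"{addr}: host_pool says VMID {on_host}, vm_pool says {in_vm}")
```

The sketch only checks devices that a VM claims; it assumes the VM bodies passed in belong to VMs that are actually running on that host, and it does not detect the reverse case of a stale VMID left on the host.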

Hi

I am using 5.2.1 with KVM and I have the exact opposite problem. In my case, once a VM is assigned to the host, it runs normally.

But when the VM is deleted, the PCI tab still shows the old VM ID and the scheduler cannot assign any new VM to this host.

In the host_pool table, instead of <![CDATA[-1]]>, it shows <![CDATA[94576]]>.

So this is purely stale data.

I have even commented out the filters so that the setting reads

FILTER = “0:0” but I still see the two old VMs pointing to the graphics cards on the PCI tab:

VM     PCI Address  Type            Name
16195  02:00.0      10de:1b06:0300  GP102 [GeForce GTX 1080 Ti]
16195  02:00.1      10de:10ef:0403  GP102 HDMI Audio Controller
16201  04:00.0      10de:1b06:0300  GP102 [GeForce GTX 1080 Ti]
16201  04:00.1      10de:10ef:0403  GP102 HDMI Audio Controller

Is there a way to fully flush the host details and have oned query the host afresh and repopulate all its data?

Hi

After more troubleshooting, here are my observations. I started a VM with ID 16211 and then deleted it, but I still see it mapped to the PCI devices of the node on which it existed. Below are the outputs of onehost show and onevm show.

[oneadmin@SPK-D-0262 ~]$ onehost show 505
HOST 505 INFORMATION
ID : 505
NAME : GPU-testing.spikecloud.net.in
CLUSTER : win-cluster
STATE : MONITORED
IM_MAD : kvm
VM_MAD : kvm
LAST MONITORING TIME : 04/15 19:21:20

HOST SHARES
TOTAL MEM : 31.4G
USED MEM (REAL) : 456.2M
USED MEM (ALLOCATED) : 0K
TOTAL CPU : 800
USED CPU (REAL) : 8
USED CPU (ALLOCATED) : 0
RUNNING VMS : 0

MONITORING INFORMATION
ARCH=“x86_64”
CPUSPEED=“927”
HOSTNAME=“GPU-testing.spikecloud.net.in”
HYPERVISOR=“kvm”
IM_MAD=“kvm”
MODELNAME=“Intel(R) Xeon(R) CPU E3-1270 v5 @ 3.60GHz”
NETRX=“60471118048”
NETTX=“243708362”
RESERVED_CPU=“”
RESERVED_MEM=“”
VERSION=“5.2.0”
VM_MAD=“kvm”

PCI DEVICES

VM ADDR TYPE NAME
16211 01:00.0 10de:1c82:0300 GP107 [GeForce GTX 1050 Ti]
16211 01:00.1 10de:0fb9:0403 GP107GL High Definition Audio Controller

WILD VIRTUAL MACHINES

NAME IMPORT_ID CPU MEMORY

VIRTUAL MACHINES

ID USER     GROUP    NAME            STAT UCPU    UMEM HOST             TIME

[oneadmin@SPK-D-0262 ~]$ onevm show 16211
VIRTUAL MACHINE 16211 INFORMATION
ID : 16211
NAME : test-gpu
USER : oneadmin
GROUP : oneadmin
STATE : DONE
LCM_STATE : LCM_INIT
RESCHED : No
START TIME : 04/15 13:38:39
END TIME : 04/15 13:50:38
DEPLOY ID : one-16211

VIRTUAL MACHINE MONITORING
CPU : 0.0
MEMORY : 0K
NETTX : 0K
NETRX : 2.4M

PERMISSIONS
OWNER : um-
GROUP : ---
OTHER : ---

VM DISKS
ID DATASTORE TARGET IMAGE SIZE TYPE SAVE
0 KVM-datast hda GPU-image-KVM -/19.5G file NO
1 - hdb CONTEXT -/- - -

VM NICS
ID NETWORK BRIDGE IP MAC PCI_ID
0 Public-B10G private 101.53.136.158 02:00:65:35:88:9e
1 Private-B private 172.16.100.208 02:00:ac:10:64:d0

SECURITY

NIC_ID NETWORK SECURITY_GROUPS
0 Public-B10G 0
1 Private-B 0

SECURITY GROUP TYPE PROTOCOL NETWORK RANGE
ID NAME VNET START SIZE
0 default OUTBOUND ALL
0 default INBOUND ALL

VIRTUAL MACHINE HISTORY
SEQ HOST ACTION DS START TIME PROLOG
0 GPU-testing.spi poweroff-hard 115 - 17636d 08h1 0h00m00s
1 GPU-testing.spi terminate 115 04/15 13:45:30 0d 00h05m 0h00m00s

USER TEMPLATE
APPLIANCE=“”
DISTRO=“CentOS-7.4”
HYPERVISOR=“kvm”
LOGO=“images/logos/centos.png”
OS_TYPE=“CentOS-7.4”
RATE=“100000”
SCHED_MESSAGE=“”
SCHED_REQUIREMENTS=“”
SKU_TYPE=“VPS”
SUNSTONE=[
NETWORK_SELECT=“NO” ]
TYPE=“GPU”

VIRTUAL MACHINE TEMPLATE
AUTOMATIC_DS_REQUIREMENTS=“"CLUSTERS/ID" @> 104”
AUTOMATIC_REQUIREMENTS=“(CLUSTER_ID = 104) & !(PUBLIC_CLOUD = YES)”
CONTEXT=[
DISK_ID=“1”,
DNS_HOSTNAME=“YES”,
ETH0_CONTEXT_FORCE_IPV4=“”,
ETH0_DNS=“8.8.8.8 8.8.4.4”,
ETH0_GATEWAY=“101.53.136.1”,
ETH0_GATEWAY6=“”,
ETH0_IP=“101.53.136.158”,
ETH0_IP6=“”,
ETH0_IP6_ULA=“”,
ETH0_MAC=“02:00:65:35:88:9e”,
ETH0_MASK=“255.255.248.0”,
ETH0_MTU=“”,
ETH0_NETWORK=“101.53.136.0”,
ETH0_SEARCH_DOMAIN=“”,
ETH0_VLAN_ID=“651”,
ETH0_VROUTER_IP=“”,
ETH0_VROUTER_IP6=“”,
ETH0_VROUTER_MANAGEMENT=“”,
ETH1_CONTEXT_FORCE_IPV4=“”,
ETH1_DNS=“”,
ETH1_GATEWAY=“”,
ETH1_GATEWAY6=“”,
ETH1_IP=“172.16.100.208”,
ETH1_IP6=“”,
ETH1_IP6_ULA=“”,
ETH1_MAC=“02:00:ac:10:64:d0”,
ETH1_MASK=“255.255.224.0”,
ETH1_MTU=“”,
ETH1_NETWORK=“”,
ETH1_SEARCH_DOMAIN=“”,
ETH1_VLAN_ID=“”,
ETH1_VROUTER_IP=“”,
ETH1_VROUTER_IP6=“”,
ETH1_VROUTER_MANAGEMENT=“”,
NETWORK=“YES”,
ONEGATE_ENDPOINT=“”,
PCI0_ADDRESS=“01:01.0”,
PCI1_ADDRESS=“01:02.0”,
SSH_PUBLIC_KEY=“”,
TARGET=“hdb”,
TOKEN=“YES”,
VMID=“16211” ]
CPU=“1”
CREATED_BY=“0”
GRAPHICS=[
LISTEN=“0.0.0.0”,
PASSWD=“”,
PORT=“22111”,
RANDOM_PASSWD=“YES”,
TYPE=“VNC” ]
MEMORY=“10000”
OS=[
ARCH=“x86_64”,
BOOTLOADER=“pygrub”,
ROOT=“/dev/xvda1” ]
PCI=[
CLASS=“0300”,
PCI_ID=“0”,
VENDOR=“10de”,
VM_ADDRESS=“01:01.0”,
VM_BUS=“0x01”,
VM_DOMAIN=“0x0000”,
VM_FUNCTION=“0”,
VM_SLOT=“0x01” ]
PCI=[
CLASS=“0403”,
PCI_ID=“1”,
VENDOR=“10de”,
VM_ADDRESS=“01:02.0”,
VM_BUS=“0x01”,
VM_DOMAIN=“0x0000”,
VM_FUNCTION=“0”,
VM_SLOT=“0x02” ]
TEMPLATE_ID=“1270”
VCPU=“4”
VMID=“16211”

I then ran the probes manually on the host that has this incorrect mapping; here is the result:

[root@GPU-testing im]# bash -x run_probes kvm /var/lib/one/datastores 4124 20 0 localhost
++ dirname run_probes
+ source ./…/scripts_common.sh
    ++ export LANG=C
    ++ LANG=C
    ++ export PATH=/bin:/sbin:/usr/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin
    ++ PATH=/bin:/sbin:/usr/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin
    ++ AWK=awk
    ++ BASH=/usr/bin/bash
    ++ CUT=cut
    ++ CEPH=ceph
    ++ DATE=date
    ++ DD=dd
    ++ DF=df
    ++ DU=du
    ++ GREP=grep
    ++ ISCSIADM=iscsiadm
    ++ LVCREATE=lvcreate
    ++ LVREMOVE=lvremove
    ++ LVCHANGE=lvchange
    ++ LVSCAN=lvscan
    ++ LVS=lvs
    ++ LN=ln
    ++ MD5SUM=md5sum
    ++ MKFS=mkfs
    ++ MKISOFS=genisoimage
    ++ MKSWAP=mkswap
    ++ QEMU_IMG=qemu-img
    ++ RADOS=rados
    ++ RBD=rbd
    ++ READLINK=readlink
    ++ RM=rm
    ++ CP=cp
    ++ SCP=scp
    ++ SED=sed
    ++ SSH=ssh
    ++ SUDO=sudo
    ++ SYNC=sync
    ++ TAR=tar
    ++ TGTADM=tgtadm
    ++ TGTADMIN=tgt-admin
    ++ TGTSETUPLUN=tgt-setup-lun-one
    ++ TR=tr
    ++ VGDISPLAY=vgdisplay
    ++ VMKFSTOOLS=vmkfstools
    ++ WGET=wget
    +++ uname -s
    ++ ‘[’ xLinux = xLinux ‘]’
    ++ SED=‘sed -r’
    +++ basename run_probes
    ++ SCRIPT_NAME=run_probes
+ export LANG=C
+ LANG=C
+ HYPERVISOR_DIR=kvm.d
+ ARGUMENTS=‘kvm /var/lib/one/datastores 4124 20 0 localhost’
++ dirname run_probes
+ SCRIPTS_DIR=.
+ cd .
    ++ ‘[’ -d kvm.d ‘]’
    ++ run_dir kvm.d
    ++ cd kvm.d
    +++ ls collectd-client.rb collectd-client_control.sh
    +++ grep -E -v ‘.(rpmnew|rpmsave|dpkg-\w+)$’
    ++ for i in ‘ls * | grep -E -v '\''\.(rpmnew|rpmsave|dpkg-\w+)$'\''
    ++ ‘[’ -x collectd-client.rb ‘]’
    ++ for i in ‘ls * | grep -E -v '\''\.(rpmnew|rpmsave|dpkg-\w+)$'\''
    ++ ‘[’ -x collectd-client_control.sh ‘]’
    ++ ./collectd-client_control.sh kvm /var/lib/one/datastores 4124 20 0 localhost
    ++ EXIT_CODE=0
    ++ ‘[’ x0 ‘!=’ x0 ‘]’
+ data=‘ARCH=x86_64
    MODELNAME=“Intel(R) Xeon(R) CPU E3-1270 v5 @ 3.60GHz”
    HYPERVISOR=kvm
    TOTALCPU=800
    CPUSPEED=1227
    TOTALMEMORY=32963604
    USEDMEMORY=466716
    FREEMEMORY=32496888
    FREECPU=800
    USEDCPU=0
    NETRX=60462550424
    NETTX=243637205
    DS_LOCATION_USED_MB=347259
    DS_LOCATION_TOTAL_MB=449237
    DS_LOCATION_FREE_MB=79151
    DS = [
    ID = 0,
    USED_MB = 347259,
    TOTAL_MB = 449237,
    FREE_MB = 79151
    ]
    DS = [
    ID = 1,
    USED_MB = 347259,
    TOTAL_MB = 449237,
    FREE_MB = 79151
    ]
    DS = [
    ID = 103,
    USED_MB = 347259,
    TOTAL_MB = 449237,
    FREE_MB = 79151
    ]
    DS = [
    ID = 108,
    USED_MB = 347259,
    TOTAL_MB = 449237,
    FREE_MB = 79151
    ]
    DS = [
    ID = 114,
    USED_MB = 1818743,
    TOTAL_MB = 3603894,
    FREE_MB = 1602077
    ]
    DS = [
    ID = 115,
    USED_MB = 347259,
    TOTAL_MB = 449237,
    FREE_MB = 79151
    ]
    DS = [
    ID = 119,
    USED_MB = 347259,
    TOTAL_MB = 449237,
    FREE_MB = 79151
    ]
    DS = [
    ID = 2,
    USED_MB = 347259,
    TOTAL_MB = 449237,
    FREE_MB = 79151
    ]
    HOSTNAME=GPU-testing.spikecloud.net.in
    PCI = [
    TYPE = “10de:1c82:0300”,
    VENDOR = “10de”,
    VENDOR_NAME = “NVIDIA Corporation”,
    DEVICE = “1c82”,
    DEVICE_NAME = “GP107 [GeForce GTX 1050 Ti]”,
    CLASS = “0300”,
    CLASS_NAME = “VGA compatible controller”,
    ADDRESS = “0000:01:00:0”,
    SHORT_ADDRESS = “01:00.0”,
    DOMAIN = “0000”,
    BUS = “01”,
    SLOT = “00”,
    FUNCTION = “0”
    ]
    PCI = [
    TYPE = “10de:0fb9:0403”,
    VENDOR = “10de”,
    VENDOR_NAME = “NVIDIA Corporation”,
    DEVICE = “0fb9”,
    DEVICE_NAME = “GP107GL High Definition Audio Controller”,
    CLASS = “0403”,
    CLASS_NAME = “Audio device”,
    ADDRESS = “0000:01:00:1”,
    SHORT_ADDRESS = “01:00.1”,
    DOMAIN = “0000”,
    BUS = “01”,
    SLOT = “00”,
    FUNCTION = “1”
    ]
    VM_POLL=YES
    VERSION=“5.2.0”’
+ EXIT_CODE=0
+ echo ‘ARCH=x86_64
    MODELNAME=“Intel(R) Xeon(R) CPU E3-1270 v5 @ 3.60GHz”
    HYPERVISOR=kvm
    TOTALCPU=800
    CPUSPEED=1227
    TOTALMEMORY=32963604
    USEDMEMORY=466716
    FREEMEMORY=32496888
    FREECPU=800
    USEDCPU=0
    NETRX=60462550424
    NETTX=243637205
    DS_LOCATION_USED_MB=347259
    DS_LOCATION_TOTAL_MB=449237
    DS_LOCATION_FREE_MB=79151
    DS = [
    ID = 0,
    USED_MB = 347259,
    TOTAL_MB = 449237,
    FREE_MB = 79151
    ]
    DS = [
    ID = 1,
    USED_MB = 347259,
    TOTAL_MB = 449237,
    FREE_MB = 79151
    ]
    DS = [
    ID = 103,
    USED_MB = 347259,
    TOTAL_MB = 449237,
    FREE_MB = 79151
    ]
    DS = [
    ID = 108,
    USED_MB = 347259,
    TOTAL_MB = 449237,
    FREE_MB = 79151
    ]
    DS = [
    ID = 114,
    USED_MB = 1818743,
    TOTAL_MB = 3603894,
    FREE_MB = 1602077
    ]
    DS = [
    ID = 115,
    USED_MB = 347259,
    TOTAL_MB = 449237,
    FREE_MB = 79151
    ]
    DS = [
    ID = 119,
    USED_MB = 347259,
    TOTAL_MB = 449237,
    FREE_MB = 79151
    ]
    DS = [
    ID = 2,
    USED_MB = 347259,
    TOTAL_MB = 449237,
    FREE_MB = 79151
    ]
    HOSTNAME=GPU-testing.spikecloud.net.in
    PCI = [
    TYPE = “10de:1c82:0300”,
    VENDOR = “10de”,
    VENDOR_NAME = “NVIDIA Corporation”,
    DEVICE = “1c82”,
    DEVICE_NAME = “GP107 [GeForce GTX 1050 Ti]”,
    CLASS = “0300”,
    CLASS_NAME = “VGA compatible controller”,
    ADDRESS = “0000:01:00:0”,
    SHORT_ADDRESS = “01:00.0”,
    DOMAIN = “0000”,
    BUS = “01”,
    SLOT = “00”,
    FUNCTION = “0”
    ]
    PCI = [
    TYPE = “10de:0fb9:0403”,
    VENDOR = “10de”,
    VENDOR_NAME = “NVIDIA Corporation”,
    DEVICE = “0fb9”,
    DEVICE_NAME = “GP107GL High Definition Audio Controller”,
    CLASS = “0403”,
    CLASS_NAME = “Audio device”,
    ADDRESS = “0000:01:00:1”,
    SHORT_ADDRESS = “01:00.1”,
    DOMAIN = “0000”,
    BUS = “01”,
    SLOT = “00”,
    FUNCTION = “1”
    ]
    VM_POLL=YES
    VERSION=“5.2.0”’
    ARCH=x86_64
    MODELNAME=“Intel(R) Xeon(R) CPU E3-1270 v5 @ 3.60GHz”
    HYPERVISOR=kvm
    TOTALCPU=800
    CPUSPEED=1227
    TOTALMEMORY=32963604
    USEDMEMORY=466716
    FREEMEMORY=32496888
    FREECPU=800
    USEDCPU=0
    NETRX=60462550424
    NETTX=243637205
    DS_LOCATION_USED_MB=347259
    DS_LOCATION_TOTAL_MB=449237
    DS_LOCATION_FREE_MB=79151
    DS = [
    ID = 0,
    USED_MB = 347259,
    TOTAL_MB = 449237,
    FREE_MB = 79151
    ]
    DS = [
    ID = 1,
    USED_MB = 347259,
    TOTAL_MB = 449237,
    FREE_MB = 79151
    ]
    DS = [
    ID = 103,
    USED_MB = 347259,
    TOTAL_MB = 449237,
    FREE_MB = 79151
    ]
    DS = [
    ID = 108,
    USED_MB = 347259,
    TOTAL_MB = 449237,
    FREE_MB = 79151
    ]
    DS = [
    ID = 114,
    USED_MB = 1818743,
    TOTAL_MB = 3603894,
    FREE_MB = 1602077
    ]
    DS = [
    ID = 115,
    USED_MB = 347259,
    TOTAL_MB = 449237,
    FREE_MB = 79151
    ]
    DS = [
    ID = 119,
    USED_MB = 347259,
    TOTAL_MB = 449237,
    FREE_MB = 79151
    ]
    DS = [
    ID = 2,
    USED_MB = 347259,
    TOTAL_MB = 449237,
    FREE_MB = 79151
    ]
    HOSTNAME=GPU-testing.spikecloud.net.in
    PCI = [
    TYPE = “10de:1c82:0300”,
    VENDOR = “10de”,
    VENDOR_NAME = “NVIDIA Corporation”,
    DEVICE = “1c82”,
    DEVICE_NAME = “GP107 [GeForce GTX 1050 Ti]”,
    CLASS = “0300”,
    CLASS_NAME = “VGA compatible controller”,
    ADDRESS = “0000:01:00:0”,
    SHORT_ADDRESS = “01:00.0”,
    DOMAIN = “0000”,
    BUS = “01”,
    SLOT = “00”,
    FUNCTION = “0”
    ]
    PCI = [
    TYPE = “10de:0fb9:0403”,
    VENDOR = “10de”,
    VENDOR_NAME = “NVIDIA Corporation”,
    DEVICE = “0fb9”,
    DEVICE_NAME = “GP107GL High Definition Audio Controller”,
    CLASS = “0403”,
    CLASS_NAME = “Audio device”,
    ADDRESS = “0000:01:00:1”,
    SHORT_ADDRESS = “01:00.1”,
    DOMAIN = “0000”,
    BUS = “01”,
    SLOT = “00”,
    FUNCTION = “1”
    ]
    VM_POLL=YES
    VERSION=“5.2.0”
+ exit 0

Any pointers on how to fix this?

I ran onedb fsck as well, and it confirms the phantom allocation:

VM 16211 has a PCI device assigned in host 505, but it should not. Device: GP107 [GeForce GTX 1050 Ti]
VM 16211 has a PCI device assigned in host 505, but it should not. Device: GP107GL High Definition Audio Controller
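When there are many such messages, it may help to collect them into (VM, host, device) tuples before deciding what to clean up. The following is a minimal sketch that assumes the fsck messages have exactly the wording shown above; the function name and the regular expression are illustrative and not part of OpenNebula.

```python
import re

# Pattern built from the two fsck messages quoted in this post; other
# OpenNebula versions may word the message differently.
FSCK_LINE = re.compile(
    r"VM (?P<vm>\d+) has a PCI device assigned in host (?P<host>\d+), "
    r"but it should not\. Device: (?P<device>.+)"
)


def parse_phantom_allocations(text):
    """Return a list of (vm_id, host_id, device_name) found in fsck output."""
    results = []
    for line in text.splitlines():
        match = FSCK_LINE.search(line)
        if match:
            results.append(
                (int(match["vm"]), int(match["host"]), match["device"].strip())
            )
    return results


if __name__ == "__main__":
    sample = (
        "VM 16211 has a PCI device assigned in host 505, but it should not. "
        "Device: GP107 [GeForce GTX 1050 Ti]\n"
        "VM 16211 has a PCI device assigned in host 505, but it should not. "
        "Device: GP107GL High Definition Audio Controller\n"
    )
    for vm_id, host_id, device in parse_phantom_allocations(sample):
        print(f"host {host_id}: stale VM {vm_id} on {device}")
```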

One major observation:

In a similar working environment, I see the PCI section of the VM template as follows:

PCI = [
ADDRESS = “0000:01:00:0”,
BUS = “01”,
CLASS = “0300”,
DOMAIN = “0000”,
FUNCTION = “0”,
PCI_ID = “0”,
SLOT = “00”,
VENDOR = “10de”,
VM_ADDRESS = “01:01.0”,
VM_BUS = “0x01”,
VM_DOMAIN = “0x0000”,
VM_FUNCTION = “0”,
VM_SLOT = “0x01” ]

In the environment where I have the issue, the PCI section for the affected VM is as follows:

PCI=[
CLASS=“0403”,
DEVICE=“10ef”,
PCI_ID=“1”,
VENDOR=“10de”,
VM_ADDRESS=“01:02.0”,
VM_BUS=“0x01”,
VM_DOMAIN=“0x0000”,
VM_FUNCTION=“0”,
VM_SLOT=“0x02” ]

I think the cleanup probably isn't happening because the ADDRESS keyword is missing?
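To check other VMs for the same difference, the two PCI sections can be compared programmatically. The sketch below is only a rough illustration: it assumes the template text uses plain ASCII double quotes (as the CLI prints them), the helper names are made up for this example, and the list of "host-side" keys is simply the set of attributes present in the working template but absent in the affected one. It tests for the difference and does not prove that the missing ADDRESS is the cause.

```python
import re

# Attributes present in the working PCI section above but absent in the
# affected one; treating them as "host-side" keys is an assumption.
HOST_SIDE_KEYS = ("ADDRESS", "BUS", "SLOT", "DOMAIN", "FUNCTION")


def parse_pci_sections(template_text):
    """Return one dict of attributes per PCI=[ ... ] section in the text."""
    sections = []
    for block in re.findall(r"\bPCI\s*=\s*\[(.*?)\]", template_text, flags=re.DOTALL):
        # \w+ consumes the whole key, so VM_ADDRESS is never read as ADDRESS.
        sections.append(dict(re.findall(r'(\w+)\s*=\s*"([^"]*)"', block)))
    return sections


def missing_host_keys(attrs):
    """List the host-side keys that a PCI section does not carry."""
    return [key for key in HOST_SIDE_KEYS if key not in attrs]


if __name__ == "__main__":
    affected = '''PCI=[
    CLASS="0403",
    DEVICE="10ef",
    PCI_ID="1",
    VENDOR="10de",
    VM_ADDRESS="01:02.0",
    VM_BUS="0x01",
    VM_DOMAIN="0x0000",
    VM_FUNCTION="0",
    VM_SLOT="0x02" ]'''
    for attrs in parse_pci_sections(affected):
        print("PCI_ID", attrs.get("PCI_ID"), "is missing:", missing_host_keys(attrs))
```

Note that the simple pattern would stop early at a value that itself contains a closing bracket, so it is suited to VM template PCI sections like the ones above rather than to the probe output with device names.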