Enable NVIDIA vGPU on a card without SR-IOV support

Hi,
I am testing NVIDIA Virtual GPU (vGPU) support on OpenNebula 6.4, specifically with an NVIDIA Tesla T4, but it seems these cards either do not support SR-IOV or it does not work as expected.

Querying the PCI device, I can see that it lists an SR-IOV capability:

lspci -v -s 4b:00.0
4b:00.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)
	Subsystem: NVIDIA Corporation Device 12a2
	Physical Slot: 1
	Flags: bus master, fast devsel, latency 0, IRQ 18, NUMA node 1, IOMMU group 40
	Memory at de000000 (32-bit, non-prefetchable) [size=16M]
	Memory at 23fc0000000 (64-bit, prefetchable) [size=256M]
	Memory at 23ff0000000 (64-bit, prefetchable) [size=32M]
	Capabilities: [60] Power Management version 3
	Capabilities: [68] Null
	Capabilities: [78] Express Endpoint, MSI 00
	Capabilities: [c8] MSI-X: Enable+ Count=6 Masked-
	Capabilities: [100] Virtual Channel
	Capabilities: [258] L1 PM Substates
	Capabilities: [128] Power Budgeting <?>
	Capabilities: [420] Advanced Error Reporting
	Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
	Capabilities: [900] Secondary PCI Express
	Capabilities: [bb0] Physical Resizable BAR
	Capabilities: [bcc] Single Root I/O Virtualization (SR-IOV)
	Capabilities: [c14] Alternative Routing-ID Interpretation (ARI)
	Kernel driver in use: nvidia
	Kernel modules: nouveau, nvidia_vgpu_vfio, nvidia

On the other hand, after installing the NVIDIA vGPU driver, nvidia-smi reports Host VGPU Mode: Non SR-IOV:

nvidia-smi -q 
==============NVSMI LOG==============

Timestamp                                 : Fri Jan 26 14:57:49 2024
Driver Version                            : 535.129.03
CUDA Version                              : Not Found
vGPU Driver Capability
        Heterogenous Multi-vGPU           : Supported

Attached GPUs                             : 1
GPU 00000000:4B:00.0
    Product Name                          : Tesla T4
    Product Brand                         : NVIDIA
    Product Architecture                  : Turing
    Display Mode                          : Enabled
    Display Active                        : Disabled
    Persistence Mode                      : Enabled
    Addressing Mode                       : N/A
    vGPU Device Capability
        Fractional Multi-vGPU             : Supported
        Heterogeneous Time-Slice Profiles : Supported
        Heterogeneous Time-Slice Sizes    : Not Supported
....
    GPU Virtualization Mode
        Virtualization Mode               : Host VGPU
        Host VGPU Mode                    : Non SR-IOV

The documentation says that if your GPU supports SR-IOV you should use the sriov-manage command to enable the virtual functions, and if not, to check the NVIDIA documentation.
For non-SR-IOV cards I followed this guide to create the virtual GPUs.
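
For reference, the legacy (non-SR-IOV) flow from that guide boils down to writing a UUID to the create node of the chosen vGPU type under mdev_supported_types, roughly like this (profile nvidia-222 used here as an example):

UUID=$(uuidgen)
echo "$UUID" > /sys/class/mdev_bus/0000:4b:00.0/mdev_supported_types/nvidia-222/create

or, alternatively (instead of the echo), defining and starting a persistent device with mdevctl:

mdevctl define --auto --uuid "$UUID" --parent 0000:4b:00.0 --type nvidia-222
mdevctl start --uuid "$UUID"

(For SR-IOV capable cards the documented command would instead be /usr/lib/nvidia/sriov-manage -e <pci-address>, but that does not apply to the T4.)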

I can confirm that the /sys/bus/mdev/devices/ directory contains the mdev device files for the vGPUs, and I can list them:

21bceb0c-c284-4db5-b8f9-807608e21fe5 -> ../../../devices/pci0000:4a/0000:4a:02.0/0000:4b:00.0/21bceb0c-c284-4db5-b8f9-807608e21fe5
87dd2ff0-2624-42f2-ba87-ba7a38bfce78 -> ../../../devices/pci0000:4a/0000:4a:02.0/0000:4b:00.0/87dd2ff0-2624-42f2-ba87-ba7a38bfce78
25258f79-52e3-443c-a0cc-014aca508c1e -> ../../../devices/pci0000:4a/0000:4a:02.0/0000:4b:00.0/904f8f79-52e3-443c-a0cc-014aca508c1e
c6f8d48d-7282-5d65-bd0c-dbe46602b734 -> ../../../devices/pci0000:4a/0000:4a:02.0/0000:4b:00.0/c6f8d48d-7282-5d65-bd0c-dbe46602b734
d000b06f-3a7a-4b29-baea-abda99443030 -> ../../../devices/pci0000:4a/0000:4a:02.0/0000:4b:00.0/d000b06f-3a7a-4b29-baea-abda99443030
fd67e9b8-d360-4792-80fe-a4fa62e56eea -> ../../../devices/pci0000:4a/0000:4a:02.0/0000:4b:00.0/fd67e9b8-d360-4792-80fe-a4fa62e56eea

If I list the mediated devices defined on the hypervisor host:

mdevctl list
21bceb0c-c284-4db5-b8f9-807608e21fe5 0000:4b:00.0 nvidia-222 auto (defined)
87dd2ff0-2624-42f2-ba87-ba7a38bfce78 0000:4b:00.0 nvidia-230 auto (defined)
904f8f79-52e3-443c-a0cc-014aca508c1e 0000:4b:00.0 nvidia-222 auto (defined)
c6f8d48d-7282-5d65-bd0c-dbe46602b734 0000:4b:00.0 nvidia-222 manual
d000b06f-3a7a-4b29-baea-abda99443030 0000:4b:00.0 nvidia-222 auto (defined)
fd67e9b8-d360-4792-80fe-a4fa62e56eea 0000:4b:00.0 nvidia-222 auto (defined)

However, when I show the node information I only see one PCI device for the GPU, and it is only possible to create one virtual machine:

onehost show node18111-1
PCI DEVICES
   VM ADDR    TYPE           NAME                                              
      01:00.0 19a2:0120:0604 x1 PCIe Gen2 Bridge[Pilot4]
      31:00.0 15b3:101b:0207 MT28908 Family [ConnectX-6]
      4b:00.0 10de:1eb8:0302 NVIDIA Corporation TU104GL [Tesla T4]

My questions are:
Is it possible to use cards without SR-IOV support in OpenNebula >= 6.4?
If it is possible, how can I use more than one vGPU, i.e. how can I add the vGPUs to OpenNebula? I tried to add them manually with virsh but it does not work.
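
(What I tried with virsh was the standard libvirt mdev hostdev definition, roughly along these lines, with the UUID taken from one of the mediated devices listed above, attached with virsh attach-device:

<hostdev mode='subsystem' type='mdev' model='vfio-pci'>
  <source>
    <address uuid='21bceb0c-c284-4db5-b8f9-807608e21fe5'/>
  </source>
</hostdev>

but as said, it does not work.)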

Thanks in advance.

It should be possible to use both SR-IOV and non-SR-IOV cards.

Do the vGPUs show up in lspci -mmnn -d '10de:*' on the host? The PCI devices are discovered through this command, generally using the filter from your /var/lib/one/remotes/etc/im/kvm-probes.d/pci.conf, but I've used 10de here since that should cover all NVIDIA devices. 10de should also be the value under :nvidia_vendors: in that configuration file.

Feel free to read into the script found at remotes/im/kvm-probes.d/host/system/pci.rb to see what I mean. The remotes folder is in /var/lib/one on the frontend and /var/tmp/one on the host(s).

You can also just run this script on the host under /var/tmp/one/im/kvm-probes.d/host/system/pci.rb to see the output from it. This should match the information in the onehost show output.
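
For example, to double-check what the probe is currently configured to pick up, something like this on the frontend should show the relevant keys (path as on a default install, adjust if yours differs):

grep -A3 -E ':(filter|device_name|short_address|nvidia_vendors):' /var/lib/one/remotes/etc/im/kvm-probes.d/pci.conf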

Hi,

I have the same issue, running mdevctl list shows all the defined VGPU instances:

mdevctl list
3faeb9cc-93a6-40d8-838b-551b44d17a87 0000:41:00.0 nvidia-64 (defined)
af591aad-318c-48cc-aa0f-34f8a019f2ca 0000:41:00.0 nvidia-64 (defined)
69ebf49b-c397-4dfe-9069-2bedc6ae0923 0000:41:00.0 nvidia-64 (defined)
5c748b2a-8148-40c9-b712-c400ae2b56fb 0000:41:00.0 nvidia-64 (defined)

Meanwhile, running lspci -mmnn -d '10de:*' shows only the physical cards on the system:

lspci -mmnn -d '10de:*'
41:00.0 "3D controller [0302]" "NVIDIA Corporation [10de]" "GP104GL [Tesla P4] [1bb3]" -ra1 "NVIDIA Corporation [10de]" "GP104GL [Tesla P4] [11d8]"
42:00.0 "3D controller [0302]" "NVIDIA Corporation [10de]" "GP104GL [Tesla P4] [1bb3]" -ra1 "NVIDIA Corporation [10de]" "GP104GL [Tesla P4] [11d8]"

Running /var/tmp/one/im/kvm-probes.d/host/system/pci.rb gives the same output as above.
This means that the defined vGPU instances are not available to virtual machines.

Hi,

I can see the GPU on the host but not the vGPUs:

lspci -mmnn -d '10de:*'
4b:00.0 "3D controller [0302]" "NVIDIA Corporation [10de]" "TU104GL [Tesla T4] [1eb8]" -ra1 "NVIDIA Corporation [10de]" "Device [12a2]"

My filters are:

:device_name:
  - 'MT28908'
  - 'A100'
  - 'TU104GL'

# List of NVIDIA vendor IDs, these are used to recognize PCI devices from
# NVIDIA and use vGPU feature
:nvidia_vendors:
  - '10de'

Running the script on the host:

/var/tmp/one/im/kvm-probes.d/host/system/pci.rb
PCI = [
 TYPE = "19a2:0120:0604" ,
 VENDOR = "19a2" ,
 VENDOR_NAME = "Emulex Corporation" ,
 DEVICE = "0120" ,
 CLASS = "0604" ,
 CLASS_NAME = "PCI bridge" ,
 ADDRESS = "0000:01:00:0" ,
 SHORT_ADDRESS = "01:00.0" ,
 DOMAIN = "0000" ,
 BUS = "01" ,
 SLOT = "00" ,
 FUNCTION = "0" ,
 NUMA_NODE = "0" ,
 DEVICE_NAME = "x1 PCIe Gen2 Bridge[Pilot4]" 
]
PCI = [
 TYPE = "15b3:101b:0207" ,
 VENDOR = "15b3" ,
 VENDOR_NAME = "Mellanox Technologies" ,
 DEVICE = "101b" ,
 CLASS = "0207" ,
 CLASS_NAME = "Infiniband controller" ,
 ADDRESS = "0000:31:00:0" ,
 SHORT_ADDRESS = "31:00.0" ,
 DOMAIN = "0000" ,
 BUS = "31" ,
 SLOT = "00" ,
 FUNCTION = "0" ,
 NUMA_NODE = "0" ,
 DEVICE_NAME = "MT28908 Family [ConnectX-6]" 
]
PCI = [
 TYPE = "10de:1eb8:0302" ,
 VENDOR = "10de" ,
 VENDOR_NAME = "NVIDIA Corporation" ,
 DEVICE = "1eb8" ,
 CLASS = "0302" ,
 CLASS_NAME = "3D controller" ,
 ADDRESS = "0000:4b:00:0" ,
 SHORT_ADDRESS = "4b:00.0" ,
 DOMAIN = "0000" ,
 BUS = "4b" ,
 SLOT = "00" ,
 FUNCTION = "0" ,
 NUMA_NODE = "1" ,
 UUID = "c6f8d48d-7282-5d65-bd0c-dbe46602b734" ,
 DEVICE_NAME = "NVIDIA Corporation TU104GL [Tesla T4]" 
]

The defined vGPU instances:

mdevctl list
21bceb0c-c284-4db5-b8f9-807608e21fe5 0000:4b:00.0 nvidia-222 auto (defined)
87dd2ff0-2624-42f2-ba87-ba7a38bfce78 0000:4b:00.0 nvidia-230 auto (defined)
904f8f79-52e3-443c-a0cc-014aca508c1e 0000:4b:00.0 nvidia-222 auto (defined)
d000b06f-3a7a-4b29-baea-abda99443030 0000:4b:00.0 nvidia-222 auto (defined)
fd67e9b8-d360-4792-80fe-a4fa62e56eea 0000:4b:00.0 nvidia-222 auto (defined)

And onehost show:

onehost show node18111-1
...
NAME="node18111-1"
RESERVED_CPU=""
RESERVED_MEM=""
VERSION="6.4.0.1"
VM_MAD="kvm"

PCI DEVICES

   VM ADDR    TYPE           NAME                                              
      01:00.0 19a2:0120:0604 x1 PCIe Gen2 Bridge[Pilot4]
      31:00.0 15b3:101b:0207 MT28908 Family [ConnectX-6]
      4b:00.0 10de:1eb8:0302 NVIDIA Corporation TU104GL [Tesla T4]

I can only instantiate one virtual machine because the vGPU instances are not available to virtual machines.

However, with a card that has SR-IOV support I can see several PCI devices:

VERSION="6.4.0.1"
VM_MAD="kvm"

PCI DEVICES

   VM ADDR    TYPE           NAME                                              
      17:00.4 10de:20f1:0302 NVIDIA Corporation GA100 [A100 PCIe 40GB]
      17:00.5 10de:20f1:0302 NVIDIA Corporation GA100 [A100 PCIe 40GB]
      17:00.6 10de:20f1:0302 NVIDIA Corporation GA100 [A100 PCIe 40GB]
      17:00.7 10de:20f1:0302 NVIDIA Corporation GA100 [A100 PCIe 40GB]
...
/var/tmp/one/im/kvm-probes.d/host/system/pci.rb
PCI = [
 TYPE = "10de:20f1:0302" ,
 VENDOR = "10de" ,
 VENDOR_NAME = "NVIDIA Corporation" ,
 DEVICE = "20f1" ,
 CLASS = "0302" ,
 CLASS_NAME = "3D controller" ,
 ADDRESS = "0000:17:00:4" ,
 SHORT_ADDRESS = "17:00.4" ,
 DOMAIN = "0000" ,
 BUS = "17" ,
 SLOT = "00" ,
 FUNCTION = "4" ,
 NUMA_NODE = "0" ,
 UUID = "b8a460fd-2e3c-500f-bbbe-3436f451888f" ,
 DEVICE_NAME = "NVIDIA Corporation GA100 [A100 PCIe 40GB]" 
]
PCI = [
 TYPE = "10de:20f1:0302" ,
 VENDOR = "10de" ,
 VENDOR_NAME = "NVIDIA Corporation" ,
 DEVICE = "20f1" ,
 CLASS = "0302" ,
 CLASS_NAME = "3D controller" ,
 ADDRESS = "0000:17:00:5" ,
 SHORT_ADDRESS = "17:00.5" ,
 DOMAIN = "0000" ,
 BUS = "17" ,
 SLOT = "00" ,
 FUNCTION = "5" ,
 NUMA_NODE = "0" ,
 UUID = "a92340c3-7e7f-51d8-b31e-5ec3011cfbfc" ,
 DEVICE_NAME = "NVIDIA Corporation GA100 [A100 PCIe 40GB]" 
]
PCI = [
 TYPE = "10de:20f1:0302" ,
 VENDOR = "10de" ,
 VENDOR_NAME = "NVIDIA Corporation" ,
 DEVICE = "20f1" ,
 CLASS = "0302" ,
 CLASS_NAME = "3D controller" ,
 ADDRESS = "0000:17:00:6" ,
 SHORT_ADDRESS = "17:00.6" ,
 DOMAIN = "0000" ,
 BUS = "17" ,
 SLOT = "00" ,
 FUNCTION = "6" ,
 NUMA_NODE = "0" ,
 UUID = "616512b7-6632-55f2-8aee-f7e5c5ca07c8" ,
 DEVICE_NAME = "NVIDIA Corporation GA100 [A100 PCIe 40GB]" 
]
PCI = [
 TYPE = "10de:20f1:0302" ,
 VENDOR = "10de" ,
 VENDOR_NAME = "NVIDIA Corporation" ,
 DEVICE = "20f1" ,
 CLASS = "0302" ,
 CLASS_NAME = "3D controller" ,
 ADDRESS = "0000:17:00:7" ,
....

Thanks for your help

Looking through the pci.rb script, there are a couple of places where I guess this could be happening.

The :filter: setting will default to 0:0 if it's not defined, so if you don't have it defined then try setting it to *:* to grab all PCI devices and see. If you do have it defined, it may be worth trying that filter anyway, just so nothing is being filtered out. The devices do need to be listed in lspci though, or pci.rb won't detect them. It seems you've created them according to the NVIDIA documentation about legacy vGPU creation, so that part should be OK. I unfortunately don't have a GPU handy that has vGPU capabilities but no SR-IOV, so I can't test this further myself.

While I don't think it's the issue, you could try removing the :device_name: values so that it's not filtering by device names at all. I assume you also don't have :short_address: defined, which is another place it could filter.
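
As a rough sketch, a deliberately permissive pci.conf for testing could look something like this (keeping :nvidia_vendors: as it is and commenting out the name filters; not meant as a permanent setup):

# temporarily grab all PCI devices while debugging
:filter: '*:*'

#:device_name:
#  - 'MT28908'
#  - 'A100'
#  - 'TU104GL'

:nvidia_vendors:
  - '10de'

After editing the file on the frontend, push it to the hosts with onehost sync --force so the probe picks up the new values on the next monitoring cycle.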

Another thing to check, although it doesn't appear to be getting this far, is whether a virtfn entry exists for the device. If you look in /sys/bus/pci/devices/<addr>/ and there is a virtfn entry, then the probe will skip that device. If that's the case, you can try modifying the PCI probe and commenting out these two lines; if that fixes it, then we'll have to look into this issue a bit further.
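
A quick way to check that is something like (using the T4's address from your output):

ls -d /sys/bus/pci/devices/0000:4b:00.0/virtfn* 2>/dev/null

If that prints nothing, there are no virtual functions under the card, and this check is not what is filtering it out.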

Also, thank you for your patience while I made time to do some research on this.