vGPU experience and profiles

Hello everyone!

Has anyone had experience working with vGPU in OpenNebula?
This is an entirely new experience for me, and so far, I’ve only learned how to add a GPU to a VM created in Nebula using passthrough.

https://docs.opennebula.io/6.10/open_cluster_deployment/kvm_node/pci_passthrough.html

Everything works fine, but even when I follow the instructions in NVIDIA vGPU support — OpenNebula 6.10.1 documentation, nothing new appears; only a single GPU is displayed everywhere. I don’t understand what to do: where to get the profiles, or how to add and display them in OpenNebula.

For testing, I have an NVIDIA L4 graphics card and have installed the NVIDIA Host Drivers.

Could anyone share their experience on how to create, display, and use vGPU profiles in OpenNebula?

I managed to figure out vGPU setup and profile creation. Here’s the information needed for a successful vGPU launch and profile selection.

First, you need to refer to the NVIDIA vGPU support guide by ON: NVIDIA vGPU support.

For the “Enabling Virtual Functions” step, I recommend creating a systemd service so it runs on every boot.

Create the service:

nano /etc/systemd/system/nvidia-sriov.service

Add the following:

[Unit]
Description=Enable NVIDIA SR-IOV
After=network.target

[Service]
Type=oneshot
ExecStart=/usr/lib/nvidia/sriov-manage -e 0000:41:00.0
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target

NOTE: the PCI address 0000:41:00.0 may differ in your case. Check the ON guide carefully.

Save and exit /etc/systemd/system/nvidia-sriov.service, then run:

systemctl daemon-reload
systemctl enable nvidia-sriov
reboot

This ensures that SR-IOV is re-enabled automatically after a server reboot (mdev persistence is handled separately, see below).
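To verify the virtual functions actually came up, a quick check after boot (assuming the same PCI address as in the unit file) could look like this:

systemctl status nvidia-sriov                        # should be active (exited)
ls /sys/bus/pci/devices/0000:41:00.0/ | grep virtfn  # one virtfn link per virtual function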

When creating a VM in ON or adding a PCI device, the UUID of the mdev device is required, so the mdev devices need to be created in advance.

You can list all UUIDs using the following command:

/var/tmp/one/im/kvm-probes.d/host/system/pci.rb

Example output:

PCI = [
  TYPE = "10de:27b8:0302",
  VENDOR = "10de",
  VENDOR_NAME = "NVIDIA Corporation",
  DEVICE = "27b8",
  CLASS = "0302",
  CLASS_NAME = "3D controller",
  ADDRESS = "0000:01:04:3",
  SHORT_ADDRESS = "01:04.3",
  DOMAIN = "0000",
  BUS = "01",
  SLOT = "04",
  FUNCTION = "3",
  NUMA_NODE = "0",
  DEVICE_NAME = "NVIDIA Corporation AD104GL [L4]",
  UUID = "811ba00f-b95c-5d0d-a870-99b48162add9",
  PROFILES = "nvidia-908,nvidia-909,nvidia-910,nvidia-911,nvidia-912,nvidia-913,nvidia-914,nvidia-915,nvidia-916,nvidia-917,nvidia-918,nvidia-919,nvidia-920,nvidia-921,nvidia-922,nvidia-923,nvidia-924,nvidia-925"
]
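If you only need the fields relevant for vGPU assignment, you can filter the probe output (same script as above, just piped through grep):

/var/tmp/one/im/kvm-probes.d/host/system/pci.rb | grep -E 'SHORT_ADDRESS|UUID|PROFILES'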

Using this information, you need to create an mdev device for the chosen profile (see my follow-up below for making it persist across reboots):

echo 811ba00f-b95c-5d0d-a870-99b48162add9 > /sys/class/mdev_bus/0000:01:04.3/mdev_supported_types/nvidia-919/create
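To sanity-check the result (assuming the same paths as above), you can confirm the new mdev exists and see how many more instances of that profile the card still allows:

ls /sys/bus/mdev/devices/    # the new UUID should be listed here
cat /sys/class/mdev_bus/0000:01:04.3/mdev_supported_types/nvidia-919/available_instances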

Then, you can create a VM and assign the vGPU to it using Sunstone with SHORT_ADDRESS = "01:04.3".

The VM should boot successfully with the chosen profile.

nvidia-919 is the selected profile.
You can check the profile details like this:

cat /sys/class/mdev_bus/0000:01:04.3/mdev_supported_types/nvidia-919/description

Example:

num_heads=1, frl_config=60, framebuffer=2048M, max_resolution=1280x1024, max_instance=12
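If you want to compare all the profiles the card offers before picking one, a small loop over the same sysfs tree works (assuming the 0000:01:04.3 address from above; on NVIDIA hosts each type directory also exposes a name file):

for t in /sys/class/mdev_bus/0000:01:04.3/mdev_supported_types/*; do
  echo "$(basename "$t"): $(cat "$t/name") - $(cat "$t/description")"
done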

The only issue I haven’t solved yet is whether it’s possible to have multiple different profiles on a single GPU.
In my case, after creating one 2048M profile, I can create up to 11 more profiles of the same type, but the other profile types become locked after the first one is created.


I also want to add to the post that you can view the list of mdev devices using the command:

mdevctl list

Example output:

811ba00f-b95c-5d0d-a870-99b48162add9 0000:01:00.6 nvidia-919 manual

To avoid manually recreating the mdev device after a reboot, run the following commands:

mdevctl define --uuid 811ba00f-b95c-5d0d-a870-99b48162add9
mdevctl modify --uuid 811ba00f-b95c-5d0d-a870-99b48162add9 --auto
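You can confirm the definition and the auto flag with:

mdevctl list --defined    # the entry should now show "auto" instead of "manual"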

Hi, @Mayhem.

It is possible to have different vGPU profiles on a single GPU by enabling mixed-size mode for the GPU, according to NVIDIA’s documentation for vGPU v17. This feature does not seem to be available in v16, according to the docs, so depending on the version you’re using, you should be able to use several profiles with some limitations.

Please take a look at the NVIDIA documentation on how to configure mixed-size mode in KVM and the valid configurations.
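For reference, in NVIDIA’s vGPU 17 documentation the switch is done per GPU with nvidia-smi; a minimal sketch (verify the exact flags against the docs for your driver branch):

nvidia-smi vgpu -i 0 -shm 1    # -i selects the GPU index; -shm 1 enables mixed-size (heterogeneous) mode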


Thank you so much! I’ll try it soon as well.
Could you please tell me whether it is possible to deploy a VM from a template, and how to specify a PCI device in the template so that the mdev is created by OpenNebula (oneadmin)? Right now I have to manually create the mdev device on the host node first, and only then can I create a VM without getting a “mediated device” error.

Yes, you can edit the VM template to deploy a VM using the vGPU. You can add the PCI device to the template following the Usage section of the PCI Passthrough documentation: check the device parameters with onehost show, then add a PCI section at the end of your VM template (onetemplate update will open a text editor), as shown in the docs:

PCI = [
  VENDOR = "8086",
  DEVICE = "0a0c",
  CLASS = "0403" ]
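For the L4 discussed earlier in this thread, you can instead pin the exact virtual function by its SHORT_ADDRESS from the probe output (value taken from the example above; adjust to your host):

PCI = [
  SHORT_ADDRESS = "01:04.3" ]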


Thank you so much!


@Mayhem I am also trying to test this. I have the NVIDIA driver working on the KVM host; however, when I try to create a VM and add a PCI device, nothing is displayed in the drop-down. I also followed your steps above, but one thing that didn’t return an output was listing all the UUIDs: the command displayed nothing.

Hello, @a2k

Once the VM is created and the Guest Driver is installed in it, you should check that nvidia-smi works on the VM.

While a VM with the driver is running, you can see it with nvidia-smi vgpu on the host node.
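For reference, the two checks look like this:

nvidia-smi vgpu    # on the host: lists active vGPUs and the VMs using them
nvidia-smi         # inside the guest, after the guest driver is installed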

Have you already added all the mdevs? Have the Guest Drivers been installed?

The fact that nothing is displayed in the drop-down menu is a bug that will be fixed in upcoming releases of OpenNebula.


No, when I ran this command it returned no output:

/var/tmp/one/im/kvm-probes.d/host/system/pci.rb