Modifying vGPU params before VM starts

Hello,

My system shares an NVIDIA L40S and, after installing the GRID driver, I now get vGPU types for that GPU. However, all the types I see are A, B or Q profiles, but I need the C profiles (for CUDA purposes). I have read in the NVIDIA documentation that I need to run an “echo” into a file (vgpu_params) located in the $mdev/UUID/nvidia folder. However, that folder is only created AFTER the VM starts, and if I then try to run the “echo”, the system returns “Operation Not Permitted” (I suppose the file is locked by the operating system or by the GPU device).
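For reference, this is the kind of write the NVIDIA documentation describes, wrapped in a small function (a sketch only: the UUID is a hypothetical placeholder, the parameter names are the ones discussed here and should be checked against your driver’s release notes, and SYSFS is made overridable so the function can be exercised without root):

```shell
# Sketch: write vGPU plugin parameters for a given mdev device.
# The UUID used below in the usage comment is a hypothetical placeholder.
set_vgpu_params() {
    uuid="$1"
    params="$2"
    # SYSFS defaults to /sys; overridable for dry runs without root.
    file="${SYSFS:-/sys}/bus/mdev/devices/$uuid/nvidia/vgpu_params"
    # The nvidia/ folder only exists once the mdev device is created,
    # and the write must land before QEMU opens the vGPU - hence the
    # timing problem described above.
    echo "$params" > "$file"
}

# Intended real usage (as root, after the mdev exists, before the VM runs):
#   set_vgpu_params "aa618089-8b16-4d01-a136-25a0f3c73123" \
#       "enable_debugging=1, enable_profiling=1"
```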

How can I run that “echo” to enable “debugging” and “profiling” for CUDA purposes? In other words, what is the last moment BEFORE the VM starts at which I can manually add the “echo” to the script?

Thanks.

Hi,
I have no firsthand experience; these are my generic thoughts, so take them with a grain of salt.

The last script called is vmm/kvm/deploy. On the frontend it is located in /var/lib/one/remotes/, but it is executed on the KVM host from a copy under /var/tmp/one/.

Options without altering the upstream files are as follows:

  • OpenNebula hooks - attach an OpenNebula hook (script) that fires when the VM enters the Active/Running state. The hook can be executed on the KVM host, and the VM metadata/template XML is provided as context
  • Libvirt hooks - on the KVM host, place a script in /etc/libvirt/hooks/qemu.d/; the script is called by libvirtd at different stages of VM deployment. You should check the passed arguments to pick the right stage. Note that the context is a bit limited because the process is not managed by OpenNebula
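To make the libvirt-hook option concrete, here is a minimal sketch of such a script. Everything in it is an assumption: the file name, the fixed MDEV_UUID placeholder, the parameter names, and the choice of the “prepare” phase (whether the mdev sysfs node already exists at that phase, and how to map the domain to its mdev UUID from the domain XML libvirtd pipes to the hook on stdin, would need to be verified on a real host):

```shell
#!/bin/sh
# Sketch of /etc/libvirt/hooks/qemu.d/vgpu-params (hypothetical name).
# libvirtd invokes it as: <script> <domain> <operation> <sub-operation> <extra>

MDEV_UUID="aa618089-8b16-4d01-a136-25a0f3c73123"   # hypothetical placeholder

write_params() {
    # $1 is the operation passed by libvirtd ("prepare", "start", "stopped", ...)
    # SYSFS defaults to /sys; overridable so the logic can be tested without root.
    file="${SYSFS:-/sys}/bus/mdev/devices/$MDEV_UUID/nvidia/vgpu_params"
    case "$1" in
        prepare)
            # "prepare" runs before the domain starts, which should be the
            # last chance to write vgpu_params before QEMU opens the device.
            echo "enable_debugging=1, enable_profiling=1" > "$file"
            ;;
    esac
}

write_params "$2"
```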

Regarding the ‘operation not permitted’ error, keep in mind that the scripts will most likely run as the oneadmin user, so some sudo magic will be needed to escalate privileges.
Also, please note that for better process isolation the libvirt daemon starts the domain in a separate namespace, so you may need to exec commands in the mount namespace of the qemu-kvm PID, with an nsenter -m -t $PID command...
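A sketch of that namespace workaround, as a helper function (the domain name “one-123” in the usage comment and the parameter names are hypothetical; running nsenter requires root):

```shell
# write_in_ns: perform the vgpu_params write inside the mount namespace
# of the given PID (the qemu-kvm process of the VM). Requires root.
write_in_ns() {
    pid="$1"
    uuid="$2"
    nsenter -m -t "$pid" sh -c \
        "echo 'enable_debugging=1, enable_profiling=1' \
            > /sys/bus/mdev/devices/$uuid/nvidia/vgpu_params"
}

# Intended usage (as root; "one-123" is a hypothetical OpenNebula domain name):
#   write_in_ns "$(pgrep -f 'qemu.*guest=one-123' | head -1)" "$MDEV_UUID"
```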

Hope this helps.


Hello @Daniel_Ruiz_Molina,

This looks more like a limitation of NVIDIA vGPU rather than something specific to OpenNebula. According to NVIDIA’s documentation, this configuration is not supported:
Nvidia Docs

So it seems that modifying these vGPU parameters before VM startup is not possible due to NVIDIA driver restrictions.

Cheers,

Hi,

After searching the NVIDIA website, yesterday I found a link to request a free 90-day Enterprise license. Now, with that new driver, the system detects “C” vGPU types on the server (the one running OpenNebula) and also in the VM (running Rocky).

Your link is VERY useful, but I only saw it today. I will save your answer anyway, because it will remind me that the “GRID” driver is not enough; I need the “AI Enterprise” driver.

Thanks for your help!!!