GPU passthrough was working fine for me, but now it is broken. It looks like OpenNebula now expects NVIDIA drivers to be installed on the host, vGPU to be enabled, and so on.
Is there a way to disable this new functionality? Previously, GPU passthrough didn’t work if NVIDIA drivers were installed on the host, because the GPUs couldn’t be unbound from the host. Has this now changed?
Directory '/sys/class/mdev_bus/0000:01:00.0' does not exist
Directory '/sys/class/mdev_bus/0000:01:00.1' does not exist
error
@Sean_JC Can you post a link to the PR or explain it here? We’re stuck with the same problem and cannot pass GPUs through to our VMs.
Full problem:
Fri Sep 30 17:45:23 2022: DEPLOY: Directory '/sys/class/mdev_bus/0000:04:00.0' does not exist
error: Failed to create domain from /var/lib/one/datastores/110/1814/deployment.0
error: device not found: mediated device 'f7cdd2bc-e0bc-51f5-bdf3-62261edc310c' not found
Could not create domain from /var/lib/one/datastores/110/1814/deployment.0
ExitCode: 255
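If you want to confirm that a failed deployment matches this pattern, something like the following rough sketch could pull the relevant pieces out of the VM log. The function name and the regular expressions are my own, written against the messages quoted above; they are not part of OpenNebula.

```python
import re

# Patterns modelled on the error text quoted in this thread (assumption:
# other OpenNebula/libvirt versions may word these messages differently).
MDEV_BUS_RE = re.compile(
    r"Directory '/sys/class/mdev_bus/([0-9a-fA-F:.]+)' does not exist"
)
MDEV_UUID_RE = re.compile(r"mediated device '([0-9a-fA-F-]+)' not found")


def summarize_deploy_error(log_text):
    """Collect the PCI addresses and mdev UUIDs named in a DEPLOY error."""
    return {
        "missing_mdev_parents": MDEV_BUS_RE.findall(log_text),
        "missing_mdev_uuids": MDEV_UUID_RE.findall(log_text),
    }
```

If both lists come back non-empty for a GPU that you intended to pass through as a plain PCI device, you are most likely hitting the same problem.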
For anybody still wondering, we solved this problem in issue #5968.
The general problem is that OpenNebula 6.4 assumes by default that every GPU device is a vGPU. ON does this by adding a UUID to the PCI device, which is visible with the onehost show <id> command.
When this happens, ON looks for a mediated device when the VM starts, which of course does not exist.
To solve this error, you need to do a soft redeploy of the node by forcing ON to forget about the PCI device on that particular host. The details on how to execute that are explained in the issue.
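As a quick sanity check on the host, you can look at whether the kernel lists the GPU as a mediated-device parent at all. Below is a minimal sketch that tests for the same /sys/class/mdev_bus/<address> directory the error message complains about. The helper name and the sysfs_root parameter are hypothetical, added only so the check can be tried against a fake directory tree.

```python
from pathlib import Path


def is_mdev_parent(pci_address, sysfs_root="/sys"):
    """Return True if <sysfs_root>/class/mdev_bus/<pci_address> exists.

    That directory is the one the DEPLOY error reports as missing. If it is
    absent, the device is not registered as a mediated-device (vGPU) parent.
    """
    return (Path(sysfs_root) / "class" / "mdev_bus" / pci_address).is_dir()


if __name__ == "__main__":
    # Addresses taken from the error messages earlier in this thread.
    for addr in ("0000:01:00.0", "0000:01:00.1"):
        state = "mdev parent" if is_mdev_parent(addr) else "no mdev_bus entry"
        print(addr, "->", state)
```

If the address has no mdev_bus entry but onehost show <id> still lists a UUID for that PCI device, the host record is in the state described above.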
The general problem is that OpenNebula 6.4 by default assumes that every GPU device is a vGPU.
Indeed, that’s the source of the problem. I pushed a patch that should solve the issue. The handling of GPU and vGPU devices should now work correctly, without the two behaviors colliding with each other.
I’ve included in the issue all the details regarding this problem.