PCI Passthrough VM error

Hello,
I am trying to set up PCI passthrough on a host. The VM fails to deploy with the following log:

Thu Jan 31 16:10:07 2019 [Z0][VM][I]: New state is ACTIVE
Thu Jan 31 16:10:07 2019 [Z0][VM][I]: New LCM state is PROLOG
Thu Jan 31 16:10:23 2019 [Z0][VM][I]: New LCM state is BOOT
Thu Jan 31 16:10:27 2019 [Z0][VMM][I]: Generating deployment file: /var/lib/one/vms/46/deployment.0
Thu Jan 31 16:10:29 2019 [Z0][VMM][I]: Successfully execute transfer manager driver operation: tm_context.
Thu Jan 31 16:10:30 2019 [Z0][VMM][I]: ExitCode: 0
Thu Jan 31 16:10:30 2019 [Z0][VMM][I]: Successfully execute network driver operation: pre.
Thu Jan 31 16:10:32 2019 [Z0][VMM][I]: Command execution fail: cat << EOT | /var/tmp/one/vmm/kvm/deploy '/var/lib/one//datastores/0/46/deployment.0' 'opennebula' 46 opennebula
Thu Jan 31 16:10:32 2019 [Z0][VMM][I]: error: Failed to create domain from /var/lib/one//datastores/0/46/deployment.0
Thu Jan 31 16:10:32 2019 [Z0][VMM][I]: error: internal error: process exited while connecting to monitor: 2019-01-31T16:10:32.030824Z qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.80000001H:ECX.svm [bit 2]
Thu Jan 31 16:10:32 2019 [Z0][VMM][I]: 2019-01-31T16:10:32.060795Z qemu-system-x86_64: -device vfio-pci,host=02:00.0,id=hostdev0,bus=pci.1,addr=0x1: vfio error: 0000:02:00.0: group 43 is not viable
Thu Jan 31 16:10:32 2019 [Z0][VMM][I]: Please ensure all devices within the iommu_group are bound to their vfio bus driver.
Thu Jan 31 16:10:32 2019 [Z0][VMM][E]: Could not create domain from /var/lib/one//datastores/0/46/deployment.0
Thu Jan 31 16:10:32 2019 [Z0][VMM][I]: ExitCode: 255
Thu Jan 31 16:10:32 2019 [Z0][VMM][I]: Failed to execute virtualization driver operation: deploy.
Thu Jan 31 16:10:32 2019 [Z0][VMM][E]: Error deploying virtual machine: Could not create domain from /var/lib/one//datastores/0/46/deployment.0
Thu Jan 31 16:10:32 2019 [Z0][VM][I]: New LCM state is BOOT_FAILURE

Has anyone seen this before, or does anyone know what the problem might be?
I hope someone can help me here.

This probably means a limitation on your hypervisors, more details here:

Good morning,
Thanks for the tip, Ruben. The problem was that not all devices in the IOMMU group had been assigned to the guest VM.
However, another problem has now appeared. Inside the VM we can see the graphics card (an NVIDIA GeForce GTX 1080 Ti) and we installed the corresponding driver (nvidia-410), but although the card is visible, the driver cannot access it.
Has anyone had this problem?

$ lspci -v
01:01.0 VGA compatible controller: NVIDIA Corporation GP102 [GeForce GTX 1080 Ti ] (rev a1) (prog-if 00 [VGA controller])
Subsystem: ASUSTeK Computer Inc. GP102 [GeForce GTX 1080 Ti]
Physical Slot: 1
Flags: bus master, fast devsel, latency 0, IRQ 11
Memory at fc000000 (32-bit, non-prefetchable) [size=16M]
Memory at d0000000 (64-bit, prefetchable) [size=256M]
Memory at e0000000 (64-bit, prefetchable) [size=32M]
I/O ports at c000 [size=128]
Expansion ROM at fd000000 [disabled] [size=128K]
Capabilities: <access denied>
Kernel driver in use: nvidia
Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia

01:02.0 Audio device: NVIDIA Corporation GP102 HDMI Audio Controller (rev a1)
Subsystem: ASUSTeK Computer Inc. GP102 HDMI Audio Controller
Physical Slot: 2
Flags: bus master, fast devsel, latency 0, IRQ 10
Memory at fd020000 (32-bit, non-prefetchable) [size=16K]
Capabilities: <access denied>
Kernel driver in use: snd_hda_intel
Kernel modules: snd_hda_intel

$ nvidia-smi
Unable to determine the device handle for GPU 0000:01:01.0: Unknown Error

@plopes, I was able to pass a GPU through to a KVM VM successfully, but only at the libvirt/QEMU level.

Steps from my notes are posted below.

Passing through PCI-devices (GPU)

References:


https://www.server-world.info/en/note?os=CentOS_7&p=kvm&f=10
https://wiki.archlinux.org/index.php/PCI_passthrough_via_OVMF

VT-d and IOMMU

Ensure that AMD-Vi/Intel VT-d is supported by the CPU and enabled in the BIOS settings.
Enable IOMMU:
grubby --update-kernel=ALL --args='intel_iommu=on iommu=pt'

Reboot, then check that the IOMMU is enabled:

dmesg | grep -e DMAR -e IOMMU
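
If the dmesg output is inconclusive, it can help to confirm that the arguments set with grubby actually reached the running kernel. A minimal sketch, assuming the arguments are read from /proc/cmdline; the helper name has_kernel_arg is made up for this example:

```shell
# has_kernel_arg ARG [CMDLINE_FILE]
# Succeeds if ARG appears as a whole word on the kernel command line.
# CMDLINE_FILE defaults to /proc/cmdline (the optional argument exists only
# so the helper can be tried against a saved copy).
has_kernel_arg() {
    local arg=$1 file=${2:-/proc/cmdline}
    # Pad with spaces so that "iommu=on" does not match "intel_iommu=on".
    case " $(cat "$file") " in
        *" $arg "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example:
#   has_kernel_arg intel_iommu=on || echo "intel_iommu=on is not active"
```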

Ensuring that the groups are valid

cat list_iommu_groups.sh
#!/bin/bash
shopt -s nullglob
for d in /sys/kernel/iommu_groups/*/devices/*; do
    n=${d#*/iommu_groups/*}; n=${n%%/*}
    printf 'IOMMU Group %s ' "$n"
    lspci -nns "${d##*/}"
done;

bash list_iommu_groups.sh|grep -i nvidia
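
The "group 43 is not viable" error from the first post means that some device in the same IOMMU group is still held by a non-vfio driver. A minimal sketch for listing such devices, assuming the usual sysfs layout under /sys/kernel/iommu_groups; the function name non_vfio_devices is made up for this example:

```shell
# non_vfio_devices GROUP [IOMMU_ROOT]
# Prints "<pci-address> <driver>" for every device in the IOMMU group that
# is not bound to vfio-pci ("none" means no driver is bound at all).
# IOMMU_ROOT defaults to /sys/kernel/iommu_groups.
non_vfio_devices() {
    local group=$1 root=${2:-/sys/kernel/iommu_groups}
    local dev drv
    for dev in "$root/$group"/devices/*; do
        [ -e "$dev" ] || continue
        drv=none
        # The "driver" entry is a symlink to the bound driver's directory.
        if [ -L "$dev/driver" ]; then
            drv=$(basename "$(readlink "$dev/driver")")
        fi
        [ "$drv" = vfio-pci ] || printf '%s %s\n' "${dev##*/}" "$drv"
    done
}

# Example:
#   non_vfio_devices 43
```

Treat the output as a list to review rather than a list of errors: as far as I know, devices without a driver and PCI bridges do not usually block the group.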

VFIO

Extract PCI vendor-device ID pair:

lspci -nn |grep -i nvidia|grep -e "\[.*\]"
(should be something like [10de:15f8])

cat /etc/modprobe.d/vfio.conf
options vfio-pci ids=10de:15f8
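
A card often exposes more than one function (for example the GPU plus its HDMI audio controller), and each has its own ID. A minimal sketch that builds the options line from lspci -nn output, assuming the IDs appear as [vvvv:dddd] in lowercase hex; the function name vfio_ids_line is made up for this example:

```shell
# vfio_ids_line PATTERN
# Reads `lspci -nn` output on stdin, keeps lines matching PATTERN
# (case-insensitive) and prints a modprobe options line with all
# vendor:device IDs found. Returns non-zero if nothing matched.
vfio_ids_line() {
    local pattern=$1 ids
    # Class codes such as [0302] have no colon, so only [vvvv:dddd] matches.
    ids=$(grep -i -- "$pattern" \
        | grep -o '\[[0-9a-f]\{4\}:[0-9a-f]\{4\}\]' \
        | tr -d '[]' | sort -u | paste -sd, -)
    [ -n "$ids" ] && printf 'options vfio-pci ids=%s\n' "$ids"
}

# Example:
#   lspci -nn | vfio_ids_line nvidia
```

Review the printed line before writing it to /etc/modprobe.d/vfio.conf, since it includes every matching device on the host.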

Re-generate initramfs image
dracut -f

Reboot and verify that vfio-pci has loaded properly and that it is now bound to the right devices:
dmesg | grep -i vfio

Not every device listed in vfio.conf (not even the one you expect) necessarily shows up in the dmesg output. Sometimes a device does not appear there at boot but is still visible and operational in the guest VM, so check the bound driver directly:

lspci -nnk -d 10de:15f8
06:00.0 3D controller [0302]: NVIDIA Corporation GP100GL [Tesla P100 PCIe 16GB] [10de:15f8] (rev a1)
        Subsystem: NVIDIA Corporation Device [10de:118f]
        Kernel driver in use: vfio-pci
        Kernel modules: nouveau

Update QEMU

/usr/libexec/qemu-kvm -version
QEMU emulator version 1.5.3 (qemu-kvm-1.5.3-141.el7_4.6), Copyright (c) 2003-2008 Fabrice Bellard

yum install centos-release-qemu-ev

sed -i -e "s/enabled=1/enabled=0/g" /etc/yum.repos.d/CentOS-QEMU-EV.repo

yum --enablerepo=centos-qemu-ev install qemu-kvm-ev

systemctl restart libvirtd

/usr/libexec/qemu-kvm -version
QEMU emulator version 2.10.0(qemu-kvm-ev-2.10.0-21.el7_5.7.1)
Copyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers
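
To compare the installed version against a wanted one in a script, the version string can be parsed from the output above. A minimal sketch, assuming the first output line starts with "QEMU emulator version" and that GNU sort (for -V) is available; both function names are made up, and the minimum version in the example is only a placeholder, not a documented requirement:

```shell
# qemu_version
# Reads `qemu-kvm -version` output on stdin and prints the dotted version.
qemu_version() {
    sed -n 's/^QEMU emulator version \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# version_at_least HAVE WANT
# Succeeds if HAVE >= WANT, using version-aware sorting.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Example (2.0.0 is a placeholder minimum):
#   v=$(/usr/libexec/qemu-kvm -version | qemu_version)
#   version_at_least "$v" 2.0.0 || echo "QEMU $v is older than wanted"
```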

Setting up an OVMF-based guest VM

yum install OVMF

rpm -ql OVMF|grep -i fd
/usr/share/OVMF/OVMF_CODE.secboot.fd
/usr/share/OVMF/OVMF_VARS.fd

Update /etc/libvirt/qemu.conf

diff /etc/libvirt/qemu.conf{,.orig}
731,733d730
< nvram = [
<       "/usr/share/OVMF/OVMF_CODE.secboot.fd:/usr/share/OVMF/OVMF_VARS.fd"
< ]

systemctl restart libvirtd.service

Creating GPU-enabled KVM VM

virt-install \
--name centos7 \
--ram 8192 \
--disk path=/var/kvm/images/centos7.img,size=30 \
--vcpus 4 \
--os-type linux \
--os-variant rhel7 \
--network bridge=br0 \
--graphics none \
--console pty,target_type=serial \
--location 'http://ftp.iij.ad.jp/pub/linux/centos/7/os/x86_64/' \
--extra-args 'console=ttyS0,115200n8 serial' \
--host-device 03:00.0 \
--features kvm_hidden=on \
--machine q35

I am not sure whether kvm_hidden=on is mandatory. If it is, and it is missing from an existing VM, it can be added as follows:

virsh shutdown <vm-name>

virsh edit <vm-name>

Add the following inside the <features> element of the domain XML:

<kvm>
  <hidden state='on'/>
</kvm>
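
To check whether an existing VM already has the setting, the domain XML can be searched for the hidden element. A minimal sketch, assuming the XML comes from virsh dumpxml; the function name has_kvm_hidden is made up for this example:

```shell
# has_kvm_hidden
# Reads libvirt domain XML on stdin and succeeds if it contains
# <hidden state='on'/> (single or double quotes are accepted).
has_kvm_hidden() {
    grep -Eq "<hidden[[:space:]]+state=['\"]on['\"][[:space:]]*/>"
}

# Example:
#   virsh dumpxml <vm-name> | has_kvm_hidden || echo "kvm hidden is not set"
```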

CUDA and NVidia driver installation

Reference:
https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#pre-installation-actions

Disable nouveau driver:

grubby --info=ALL
grubby --update-kernel=ALL --args="nouveau.modeset=0 rd.driver.blacklist=nouveau"
dracut -f

or (and?)

vi /etc/modprobe.d/blacklist-nouveau.conf
blacklist nouveau
options nouveau modeset=0

Install required packages:

yum install gcc kernel-devel-$(uname -r) kernel-headers-$(uname -r)
yum install epel-release

yum install http://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-repo-rhel7-10.0.130-1.x86_64.rpm

yum clean all

yum install cuda

No need to reboot.

lsmod|grep -i nvidia

nvidia-smi
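
Because grep -i nvidia also matches names like nvidia_drm, an exact check can be handy for confirming that nvidia is loaded and nouveau is not. A minimal sketch, assuming module names are read from the first column of /proc/modules; the function name module_loaded is made up for this example:

```shell
# module_loaded NAME [MODULES_FILE]
# Succeeds if a kernel module called exactly NAME is listed.
# MODULES_FILE defaults to /proc/modules.
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

# Example:
#   module_loaded nouveau && echo "nouveau is still loaded"
#   module_loaded nvidia  || echo "nvidia is not loaded"
```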

Good morning,
Thanks for the information.
At the moment only one problem remains: the host (Ubuntu 4.18) has two graphics cards, the NVIDIA card and an onboard one (ASPEED). Regardless of the settings we apply, the host always detects both but uses the NVIDIA card instead of the onboard one, so assigning it to the VM later fails because the card is in use. We have tried many configurations. Is there some detail that we are missing?

I hope someone can help me here.