Cannot boot VM with GPU passthrough and >1TB RAM

I am unable to boot an Ubuntu 20.04 VM with two RTX 3090s passed through and more than 1TB of RAM assigned to the VM. The hypervisor has 2TB of RAM installed and available. With less than 1TB the VM boots without issue, but any amount over 1TB results in the following error:
Driver Error
Fri Mar 24 13:49:55 2023: DEPLOY: error: Failed to create domain from /var/lib/one//datastores/0/629/deployment.37
error: internal error: process exited while connecting to monitor:
2023-03-24T20:49:54.541929Z qemu-kvm-one: -device vfio-pci,host=0000:01:00.0,id=hostdev0,bus=pci.1,addr=0x1: VFIO_MAP_DMA: -22
2023-03-24T20:49:54.552269Z qemu-kvm-one: -device vfio-pci,host=0000:01:00.0,id=hostdev0,bus=pci.1,addr=0x1: vfio 0000:01:00.0: failed to setup container for group 0: memory listener initialization failed: Region pc.ram: vfio_dma_map(0x562067afcc70, 0x100000000, 0x17640000000, 0x7e322be00000) = -22 (Invalid argument)
Could not create domain from /var/lib/one//datastores/0/629/deployment.37
ExitCode: 255


Versions of the related components and OS (frontend, hypervisors, VMs):
Ubuntu 20.04.5
OpenNebula 6.2.0.1
QEMU emulator version 4.2.1 (Debian 1:4.2-3ubuntu6.24)
Steps to reproduce:
Create a VM on a host set up for dual-GPU passthrough and assign the VM more than 1TB of RAM.
Current results:
The same driver error as quoted above (VFIO_MAP_DMA: -22, Invalid argument).
Expected results:
VM boots with more than 1TB of RAM assigned.

It looks like this is an issue with QEMU/KVM when multiple PCI devices share an IOMMU group.

Not sure if this still helps after so long, but we see the same issue on our OpenStack cluster, though only on AMD-based hosts. It is almost certainly related to this discussion: "Fix creation of >= 1Tb guests on AMD systems with IOMMU"

Based on the Red Hat bug report, it should hopefully be fixed in a newer version of qemu-kvm.
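
To make the connection to the AMD discussion concrete, here is a rough Python sketch that decodes the failing vfio_dma_map call from the log and checks whether the mapped range crosses the address window just below 1TB. It assumes the second and third arguments in the log are the guest-physical start address (IOVA) and the size, and that AMD hosts reserve 0xFD00000000 through 0xFFFFFFFFFF for HyperTransport. Both are my reading of the log format and the linked discussion, not something confirmed in this thread, and the helper names are made up for illustration.

```python
import re

GIB = 1 << 30

# Assumption: HyperTransport window reserved on AMD hosts (inclusive bounds).
AMD_HT_START = 0xFD_0000_0000  # 1012 GiB
AMD_HT_END = 0xFF_FFFF_FFFF    # last byte below 1 TiB


def parse_vfio_dma_map(line):
    """Return (iova, size) from a 'vfio_dma_map(container, iova, size, vaddr)' fragment.

    Assumption: the argument order matches what QEMU prints in this error.
    """
    m = re.search(
        r"vfio_dma_map\((0x[0-9a-f]+), (0x[0-9a-f]+), (0x[0-9a-f]+), (0x[0-9a-f]+)\)",
        line,
    )
    if m is None:
        raise ValueError("no vfio_dma_map call found in line")
    return int(m.group(2), 16), int(m.group(3), 16)


def overlaps(start, size, lo, hi):
    """True if [start, start + size) intersects the inclusive range [lo, hi]."""
    end = start + size - 1
    return start <= hi and end >= lo


# Fragment copied from the error in the original post.
log = ("Region pc.ram: vfio_dma_map(0x562067afcc70, 0x100000000, "
       "0x17640000000, 0x7e322be00000) = -22 (Invalid argument)")

iova, size = parse_vfio_dma_map(log)
print(f"mapping starts at {iova // GIB} GiB, size {size // GIB} GiB, "
      f"ends at {(iova + size) // GIB} GiB")
print("crosses assumed AMD HT window:",
      overlaps(iova, size, AMD_HT_START, AMD_HT_END))
```

For the log above, the mapping starts at 4 GiB and runs to 1501 GiB, so it spans the assumed reserved window. A mapping that ends below 1012 GiB would not, which would be consistent with guests under 1TB booting fine.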

Thanks for the reply. I actually fixed this by using the HPB machine type available in OpenNebula.
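
For anyone landing here later, a small Python sketch of why the machine type might matter. I am assuming "HPB" refers to the host-phys-bits machine type variants, which give the guest the host's physical address width. The 40-bit fallback width used below (40 bits = 1TB) is also an assumption, not something verified in this thread. The sample cpuinfo line is illustrative, not taken from the affected host.

```python
import re

GIB = 1 << 30

# Assumption: guest physical address width when host-phys-bits is not enabled.
ASSUMED_DEFAULT_PHYS_BITS = 40


def parse_phys_bits(cpuinfo_text):
    """Extract the physical address width from /proc/cpuinfo-style text."""
    m = re.search(r"address sizes\s*:\s*(\d+) bits physical", cpuinfo_text)
    if m is None:
        raise ValueError("no 'address sizes' line found")
    return int(m.group(1))


def fits(top_of_ram, phys_bits):
    """True if guest RAM ending at top_of_ram is addressable with phys_bits bits."""
    return top_of_ram <= (1 << phys_bits)


# Illustrative line; on a real host, read /proc/cpuinfo instead.
sample = "address sizes\t: 43 bits physical, 48 bits virtual"
host_bits = parse_phys_bits(sample)

# End of the pc.ram mapping from the error: 0x100000000 + 0x17640000000.
top_of_ram = 0x100000000 + 0x17640000000

print("fits in assumed default width:", fits(top_of_ram, ASSUMED_DEFAULT_PHYS_BITS))
print("fits in host width:", fits(top_of_ram, host_bits))
```

If the assumptions hold, guest RAM ending at about 1501 GiB does not fit in a 40-bit address space but does fit in the host's width, which would match the 1TB threshold seen in the original report.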