IOMMU VGA Passthrough on Ubuntu 16.04, or My Ticket to Paradise

IOMMU GPU Passthrough setup instructions based on Ubuntu 16.04.

Attention: The content of this page has been translated via Google translate. It is currently just a placeholder until I find some time to translate it properly. My English isn’t that bad. 😉


I’ve switched! After years of hesitation and some half-hearted attempts, I have turned my back on Windows, at least on my private desktop PC. I’m sure this step is not for everyone. And even if search queries like “ubuntu black screen” were not exactly rare in the first days, my system now actually runs very reliably and I am starting to appreciate the advantages of a free system. And well, often the journey is the reward. 🙂


For me, the only remaining problem with Linux is the lack of support for recent gaming titles.

A relatively simple remedy is a dual-boot setup. That means two systems are installed on the same PC: for example, Linux for working and Windows for gaming.
At boot time, the user must decide which of the two systems to start. If you want to know more about this topic, read on here.

The disadvantages of this setup are easy to see:

  • Changing the system requires a reboot.
  • Access to the same files is restricted.
  • While one system is running, the other is always switched off.

I used this setup for a year before I became aware of the possibility of virtualization, and I have to say: the reboot forced by every switch alone is worse than it sounds.

Virtual machines (VMs) offer a further way to use operating systems in parallel. With the aid of VMs, computer systems can be emulated in software. Thanks to virtualization support in processors, processor and memory resources of the host system can be allocated to the emulated system. The host can start multiple VMs at the same time and thus simulate several computers at once. This works as long as the hardware resources assigned to the VMs do not exceed the hardware resources of the host system.

However, since all hardware access by a VM is managed via the host system, you must expect a loss in performance. This applies in particular to access to PCIe resources such as the graphics card.

This problem can be solved by means of the “Virtual Function I/O” (VFIO) framework for virtual machines.
It allows a direct and exclusive assignment of PCI devices to a VM, without going through the host. As a result, virtually native performance is achieved despite the virtualized system (about 3% less than with bare-metal use of the hardware).

To pass PCIe resources through to a VM, the hardware used must support IOMMU and Access Control Services (ACS). The IOMMU is needed to allow access to the PCI resources from outside the host (i.e. by the guest), while ACS is needed to address individual PCI devices directly.

This means that the processor, the motherboard, and the passed-through graphics card must all support IOMMU. The processor must additionally support ACS.

Whether this is the case is best checked in the hardware specifications, via lists such as the one here, or by asking the manufacturer directly: Intel, AMD.

The names of the virtualization features differ depending on the manufacturer.
For AMD:

  • AMD-V
  • Pacifica
  • AMD-Vi

For Intel:

  • VT-d
  • Vanderpool
  • Intel VT-x

A first quick test:

egrep -c '(vmx|svm)' /proc/cpuinfo

If the output is 0, the processor does not support virtualization. For values greater than or equal to 1, we can continue.
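
Note that vmx/svm only confirms the basic CPU virtualization extensions (VT-x / AMD-V); whether the IOMMU (VT-d / AMD-Vi) really works is checked later via dmesg. As a second quick check (assuming a default Ubuntu install, where lscpu from util-linux is available), the virtualization technology can also be read directly:

lscpu | grep Virtualization

This should print a line like “Virtualization: VT-x” (or “AMD-V”).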


Anyone who follows this guide to the end will have two systems that can run at the same time, with the virtual system having direct access to PCIe hardware.

The whole setup will work as follows:

  • During Linux system startup, defined hardware resources are isolated, i.e. ignored by the host system (the host).
  • A virtual machine is created as the guest, usually with a Windows installation.
  • The guest is configured with the hardware ignored by the host (PCI passthrough). In addition, processor cores and RAM of the host are handed over to the guest.
  • As soon as the guest starts, it uses this hardware exclusively. While it runs, the host accordingly has less RAM and fewer processor cores available.

Note that the PCI resources assigned to the guest are not available to the host even while the guest is not running. This is in contrast to processor and RAM resources, which can be distributed dynamically between host and guest thanks to the virtualization features. It also means that in order to pass a graphics card to the guest, the host must have at least two graphics cards, since it would otherwise be left without one.
Two graphics cards also means two video signals to display at the same time. So you either use two monitors, or a second input on one monitor. In principle it is as if two computers were installed in one case.
Important: If the hardware does not support this procedure, this guide cannot succeed.
In addition, for Intel processors, not all CPUs that support VT-d are also suitable for passing through PCIe resources. The necessary ACS support is often missing on regular consumer CPUs.


Let’s get jiggy with it

Hardware Setup

  • CPU: Intel Xeon E3-1230 v2
  • Mainboard: Gigabyte GA-Z77-DS3H (BIOS F11a)
  • RAM: 16 GB DDR3 (4 × 4 GB)
  • Graphics card host: GeForce GT 730 (Gainward)
  • Graphics card guest: GeForce GTX 970 (Gainward)
  • Optional*: 2 monitors.
  • Optional*: USB KVM switch to toggle mouse and keyboard between host and guest.

Sketch of wiring

still todo.


Software Setup

  • Host OS: Ubuntu 16.04 x64 (Kernel 4.1 or later)
  • Guest OS: Windows 10 x64

At this point I would like to point out that I spent two days trying in vain to get this setup running on Ubuntu 14.04. It is much easier under Ubuntu 16.04, which ships a kernel with improved virtualization support (kernel version >= 4.1).

KVM installation

sudo apt-get install qemu-kvm libvirt-bin bridge-utils virt-manager

This will install the KVM environment, Bridged Networking, and the Virtual Machine Manager.
After installation, the user name must be added to the libvirtd user group.

sudo adduser *name libvirtd

*name must be replaced with your user name.
Reboot after changing the user groups so that the new group membership takes effect.
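
Once the group change is active, a minimal sanity check (these commands only assume the packages installed above) is to confirm the group membership and ask libvirt for the list of defined VMs:

# confirm the current user is in the libvirtd group
id -nG | grep -w libvirtd

# ask the system libvirt instance for all defined VMs (the list will be empty for now)
virsh -c qemu:///system list --all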


Step 1 – Activate IOMMU & virtualization support in BIOS.

  • VT-d: enabled
  • Virtualization: enabled
  • PEG (PCIE): UEFI*
  • SATA: AHCI – UEFI*

* I’m not sure if the UEFI settings are actually necessary.


Step 2 – Activate IOMMU in Ubuntu.

Add the following entries to /etc/modules:

Intel

#added for vfio/kvm support
vfio
vfio_iommu_type1
vfio_pci
kvm
kvm_intel

AMD

#added for vfio/kvm support
vfio
vfio_iommu_type1
vfio_pci
kvm
kvm_amd

In addition, IOMMU support must be activated in /etc/default/grub.

Intel

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash intel_iommu=on"

AMD

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash amd_iommu=on"

After each change the GRUB config needs to be updated via:
sudo update-grub

After a restart, check whether the IOMMU is enabled via:
dmesg | grep -e DMAR -e IOMMU
The resulting output should look like this:

[    0.000000] ACPI: DMAR 0x00000000CE0A9638 000080 (v01 INTEL  SNB      00000001 INTL 00000001)
[    0.000000] DMAR: IOMMU enabled
...
...
...
[   37.984518] vboxpci: IOMMU found
[ 3247.405413] DMAR: Setting identity map for device 0000:00:14.0 [0xce0aa000 - 0xce0b6fff]

So far, so good!
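
Two more quick checks I like to do at this point (nothing beyond what was configured above): verify that the IOMMU parameter actually ended up on the kernel command line, and that the vfio modules from /etc/modules are loaded:

cat /proc/cmdline
lsmod | grep vfio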


Step 3 – Identifying the GPU and isolating it.

In this step we identify the hardware resources that are to be ignored by the host and later forwarded to the guest.

lspci -nn

Displays a list of all PCI devices in the system.
Each line represents one device and begins with its system ID, followed by a generic description, the device name, and the hardware ID of the device.
Both the system ID (i.e. XX:XX.X) and the hardware ID ([XXXX:XXXX]) must be noted.
The output should look something like this:

00:00.0 Host bridge [0600]: Intel Corporation Xeon E3-1200 v2/Ivy Bridge DRAM Controller [8086:0158] (rev 09)
00:01.0 PCI bridge [0604]: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor PCI Express Root Port [8086:0151] (rev 09)
00:14.0 USB controller [0c03]: Intel Corporation 7 Series/C210 Series Chipset Family USB xHCI Host Controller [8086:1e31] (rev 04)
00:16.0 Communication controller [0780]: Intel Corporation 7 Series/C210 Series Chipset Family MEI Controller #1 [8086:1e3a] (rev 04)
00:1a.0 USB controller [0c03]: Intel Corporation 7 Series/C210 Series Chipset Family USB Enhanced Host Controller #2 [8086:1e2d] (rev 04)
00:1b.0 Audio device [0403]: Intel Corporation 7 Series/C210 Series Chipset Family High Definition Audio Controller [8086:1e20] (rev 04)
00:1c.0 PCI bridge [0604]: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 1 [8086:1e10] (rev c4)
00:1c.2 PCI bridge [0604]: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 3 [8086:1e14] (rev c4)
00:1c.3 PCI bridge [0604]: Intel Corporation 82801 PCI Bridge [8086:244e] (rev c4)
00:1c.4 PCI bridge [0604]: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 5 [8086:1e18] (rev c4)
00:1d.0 USB controller [0c03]: Intel Corporation 7 Series/C210 Series Chipset Family USB Enhanced Host Controller #1 [8086:1e26] (rev 04)
00:1f.0 ISA bridge [0601]: Intel Corporation Z77 Express Chipset LPC Controller [8086:1e44] (rev 04)
00:1f.2 SATA controller [0106]: Intel Corporation 7 Series/C210 Series Chipset Family 6-port SATA Controller [AHCI mode] [8086:1e02] (rev 04)
00:1f.3 SMBus [0c05]: Intel Corporation 7 Series/C210 Series Chipset Family SMBus Controller [8086:1e22] (rev 04)
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GK208 [GeForce GT 730] [10de:1287] (rev a1)
01:00.1 Audio device [0403]: NVIDIA Corporation GK208 HDMI/DP Audio Controller [10de:0e0f] (rev a1)
03:00.0 Ethernet controller [0200]: Qualcomm Atheros AR8151 v2.0 Gigabit Ethernet [1969:1083] (rev c0)
04:00.0 PCI bridge [0604]: Intel Corporation 82801 PCI Bridge [8086:244e] (rev 41)
06:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204 [GeForce GTX 970] [10de:13c2] (rev a1)
06:00.1 Audio device [0403]: NVIDIA Corporation GM204 High Definition Audio Controller [10de:0fbb] (rev a1)

The two highlighted entries, 06:00.0 and 06:00.1 (the GTX 970 and its HDMI audio function), are the devices that are isolated first and later handed over to the guest.
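
If the list is long, it can help to filter it. For my two NVIDIA cards, for example:

lspci -nn | grep -i nvidia

This prints only the four lines of the two graphics cards and their HDMI audio functions.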

Only complete IOMMU groups can be passed through to the guest.

We therefore use

find /sys/kernel/iommu_groups/ -type l

to check which group the identified devices belong to and whether other devices share that group.

/sys/kernel/iommu_groups/0/devices/0000:00:00.0
/sys/kernel/iommu_groups/1/devices/0000:00:01.0
/sys/kernel/iommu_groups/1/devices/0000:01:00.0
/sys/kernel/iommu_groups/1/devices/0000:01:00.1
/sys/kernel/iommu_groups/2/devices/0000:00:14.0
/sys/kernel/iommu_groups/3/devices/0000:00:16.0
/sys/kernel/iommu_groups/4/devices/0000:00:1a.0
/sys/kernel/iommu_groups/5/devices/0000:00:1b.0
/sys/kernel/iommu_groups/6/devices/0000:00:1c.0
/sys/kernel/iommu_groups/7/devices/0000:00:1c.2
/sys/kernel/iommu_groups/8/devices/0000:00:1c.3
/sys/kernel/iommu_groups/8/devices/0000:04:00.0
/sys/kernel/iommu_groups/9/devices/0000:00:1c.4
/sys/kernel/iommu_groups/10/devices/0000:00:1d.0
/sys/kernel/iommu_groups/11/devices/0000:00:1f.0
/sys/kernel/iommu_groups/11/devices/0000:00:1f.2
/sys/kernel/iommu_groups/11/devices/0000:00:1f.3
/sys/kernel/iommu_groups/12/devices/0000:03:00.0
/sys/kernel/iommu_groups/13/devices/0000:06:00.0
/sys/kernel/iommu_groups/13/devices/0000:06:00.1

If the group contains additional devices, you can try a different PCIe slot, or you must pass through all devices belonging to that group as well.
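
For a more readable overview, the group listing can also be combined with lspci. The following small shell loop is just a convenience sketch (not required for the setup) that prints every group together with the device names:

for dev in /sys/kernel/iommu_groups/*/devices/*; do
    group=${dev#/sys/kernel/iommu_groups/}; group=${group%%/*}   # extract the group number from the path
    printf 'IOMMU group %s: ' "$group"
    lspci -nns "${dev##*/}"                                      # look up the device behind the PCI address
done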

CAUTION: After the next step, the hardware will be ignored by the host. A second graphics card must be present and configured in the system before continuing – otherwise, there will be no display after the next restart.

In /etc/initramfs-tools/modules the following line must be inserted:
vfio_pci ids=10de:13c2,10de:0fbb

Of course, the hardware IDs must be adapted to the hardware used.
This causes the devices to be claimed by the vfio_pci driver right at boot, before their actual drivers can bind to them.

Now update the initramfs to apply the changes:

sudo update-initramfs -u

Now reboot. Afterwards, check via:
lspci -nnk
whether the correct driver assignment has happened (vfio-pci).

00:00.0 Host bridge [0600]: Intel Corporation Xeon E3-1200 v2/Ivy Bridge DRAM Controller [8086:0158] (rev 09)
	Subsystem: Gigabyte Technology Co., Ltd Xeon E3-1200 v2/Ivy Bridge DRAM Controller [1458:5000]
	Kernel modules: ie31200_edac
00:01.0 PCI bridge [0604]: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor PCI Express Root Port [8086:0151] (rev 09)
	Kernel driver in use: pcieport
	Kernel modules: shpchp
[...]
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GK208 [GeForce GT 730] [10de:1287] (rev a1)
	Subsystem: NVIDIA Corporation GK208 [GeForce GT 730] [10de:1287]
	Kernel driver in use: nvidia
	Kernel modules: nvidiafb, nouveau, nvidia_340
01:00.1 Audio device [0403]: NVIDIA Corporation GK208 HDMI/DP Audio Controller [10de:0e0f] (rev a1)
	Subsystem: NVIDIA Corporation GK208 HDMI/DP Audio Controller [10de:1287]
	Kernel driver in use: snd_hda_intel
	Kernel modules: snd_hda_intel
...
04:00.0 PCI bridge [0604]: Intel Corporation 82801 PCI Bridge [8086:244e] (rev 41)
06:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204 [GeForce GTX 970] [10de:13c2] (rev a1)
	Subsystem: CardExpert Technology GM204 [GeForce GTX 970] [10b0:13c2]
	Kernel driver in use: vfio-pci
	Kernel modules: nvidiafb, nouveau, nvidia_340
06:00.1 Audio device [0403]: NVIDIA Corporation GM204 High Definition Audio Controller [10de:0fbb] (rev a1)
	Subsystem: CardExpert Technology GM204 High Definition Audio Controller [10b0:13c2]
	Kernel driver in use: vfio-pci
	Kernel modules: snd_hda_intel

If this has worked, the worst is over. 🙂
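
If you do not want to scroll through the entire list, lspci can also be restricted to a single vendor:device ID; with the IDs used above (adapt them to your hardware) that would be:

lspci -nnk -d 10de:13c2
lspci -nnk -d 10de:0fbb

Both devices should report “Kernel driver in use: vfio-pci”.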

Addendum:
After I had played around with the “Additional Drivers” settings and NVIDIA updates, my guest was unable to start.
lspci -nnk showed that the card previously bound to vfio-pci now had “Kernel driver in use: nvidia”. I was able to fix this by explicitly switching the passed-through card to the open-source “Nouveau” driver in “Additional Drivers”, and the host card back to the previous NVIDIA driver.


Step 4 – Create Guest System.

todo.


Step 5 – Passthrough the Hardware.

todo


Step 6 – Configure Guest System.

Performance tips:

hugepages
Pro: the VM can access its RAM faster.
Con: the RAM reserved for hugepages is never available to the host (even while the guest is not running).
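
For reference, a minimal sketch of how hugepages can be set up. The numbers below are only an example for an 8 GiB guest with 2 MiB pages, so adjust them to your VM; depending on the distribution, the hugetlbfs mount may need additional configuration:

# /etc/default/grub – reserve 4096 × 2 MiB = 8 GiB of hugepages at boot (run sudo update-grub afterwards)
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash intel_iommu=on hugepages=4096"

and in the guest’s libvirt XML:

<memoryBacking>
  <hugepages/>
</memoryBacking>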

Hyper-V enlightenments
Unfortunately, with these enabled I get Error 43 from the NVIDIA driver.
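
For reference, the enlightenments are switched on in the <features> section of the libvirt domain XML. A minimal sketch (the exact set can vary) looks roughly like this:

<features>
  <hyperv>
    <relaxed state='on'/>
    <vapic state='on'/>
    <spinlocks state='on' retries='8191'/>
  </hyperv>
</features>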

Error 43 fix for libvirt >= version 1.3.3

<features>
  <hyperv>
    ...
    <vendor_id state='on' value='whatever'/>
    ...
  </hyperv>
  ...
  <kvm>
    <hidden state='on'/>
  </kvm>
</features>

Sources:
PCI-passthrough docs, Archlinux Wiki
Multiheaded Gaming using Ubuntu 14.04
VFIO Blog
KVM docs, Ubuntu Community
Reddit Post 1
