IOMMU Groups – What You Need to Consider

Summary

In this post I present some of the challenges you might face with IOMMU and provide tools to identify and perhaps solve the issues. Your best friend is the pciutils package and the lspci command (see here for examples).

What is IOMMU and why do I need it?

In my tutorial on how to run Windows 10 on Linux using KVM with VGA passthrough, the first and most important hardware requirement is support for IOMMU: VT-d in Intel jargon, AMD-Vi in AMD talk (not to be confused with AMD-V or SVM, which is AMD’s CPU virtualization extension). But what does IOMMU support mean?

The IOMMU (input-output memory management unit) is a memory management unit (MMU) that connects a DMA-capable (direct-memory-access-capable) I/O bus to main memory. It translates a device-visible virtual address (I/O virtual address, or IOVA) into a real physical memory address.

In an ideal world, every device has its own IOVA address space and no two devices share the same IOVA. But in practice this is often not the case. Moreover, the PCI Express (PCIe) specification allows PCIe devices to communicate with each other directly. These peer-to-peer transactions bypass the IOMMU.

That is where PCI Access Control Services (ACS) are called to the rescue. ACS is able to tell whether or not these peer-to-peer transactions are possible between any two or more devices, and can disable them. ACS features are implemented within the CPU and the chipset.

Unfortunately the implementation of ACS varies greatly between CPU and chipset models. Some CPUs have good ACS support; in others it is outright unusable (see for example Xeon E3-1200, page 61). Note that Intel Xeon processors are usually an excellent choice for PCI/VGA passthrough, except perhaps this specific model.

Spaceinvader One has produced a comprehensive, easy-to-follow video on IOMMU, showing configuration examples using unRAID (a commercial solution).

For an in-depth look at IOMMU groups, see this excellent blog post: https://vfio.blogspot.com/2014/08/iommu-groups-inside-and-out.html.

How to determine IOMMU capabilities

If you already own a PC that you want to use for VGA passthrough, and IOMMU is supported and enabled (see my tutorial), you can check the ACS capabilities as follows:
sudo lspci -vv > lspci-vv.txt

Then open the file in a text editor and search for “Access Control Services”. The output redirection is performed by your own shell rather than by sudo, so the file belongs to you and no root privileges are needed to open it:
xed lspci-vv.txt

Here is an example from my system:
00:02.0 PCI bridge: Intel Corporation Xeon E5/Core i7 IIO PCI Express Root Port 2a (rev 07) (prog-if 00 [Normal decode])
.
.
Capabilities: [110 v1] Access Control Services
ACSCap: SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-
ACSCtl: SrcValid+ TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-

ACSCap lists the ACS capabilities of the device: every option ending with a + (such as TransBlk+) is supported, while a - (as in EgressCtrl-) indicates that the capability is not supported.

ACSCtl shows which of the ACS capabilities are enabled. A capability can be enabled manually, but that shouldn’t be necessary.
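
To compare the two lines without reading them flag by flag, a small helper can print every capability that is supported but not enabled. This is only a sketch: the function name acs_supported_not_enabled is my own, and it assumes the two lines are passed in exactly as lspci -vv prints them.

```shell
# Print every ACS flag that ACSCap marks as supported (+) but that
# ACSCtl marks as not enabled (-). Both arguments are lines copied
# from the lspci -vv output. The function name is hypothetical.
acs_supported_not_enabled() {
    local cap_line="$1" ctl_line="$2" flag ctl name
    # Word splitting on the unquoted expansions is intentional:
    # it yields one flag per word, whether separated by spaces or tabs.
    for flag in ${cap_line#*:}; do
        case $flag in
            *+) name=${flag%+} ;;   # supported capability
            *)  continue ;;         # not supported, nothing to compare
        esac
        for ctl in ${ctl_line#*:}; do
            if [ "$ctl" = "${name}-" ]; then
                printf '%s\n' "$name"
            fi
        done
    done
}

# Example with the two lines from my system shown above:
#   acs_supported_not_enabled \
#     "ACSCap: SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-" \
#     "ACSCtl: SrcValid+ TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-"
# prints: TransBlk
```

A flag showing up here is not necessarily a problem; it only tells you where ACSCap and ACSCtl differ.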

The ACS capabilities listed above are for one device only, in this case the PCI bridge at 00:02.0. You will hopefully find multiple “Access Control Services” entries in your lspci-vv.txt file. Mine has 6 entries, including:

  • 3 PCI Express bridge root ports from the Intel 3930K CPU (see above);
  • 1 PCI Express virtual root port from the X79 chipset;
  • 2 PCI bridge ports from a PLX Technology, Inc. 8603 chip that resides on a SATA controller/USB3 board I added.

My X79 board and the Intel i7 3930K CPU provide good ACS capabilities.
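
Instead of searching the file by hand, the ACS entries can be pulled out with awk. The following is a rough sketch, not a polished tool: list_acs_devices is a name I made up, and the filter assumes the usual lspci -vv layout in which device header lines start in the first column and capability lines are indented.

```shell
# Read lspci -vv output on stdin and print each device that advertises
# Access Control Services, followed by its ACSCap and ACSCtl lines.
# Assumes device headers start in column 0 and details are indented.
list_acs_devices() {
    awk '
        /^[^[:space:]]/           { dev = $0 }    # remember the current device header
        /Access Control Services/ { print dev }   # this device has an ACS capability
        /ACSCap:|ACSCtl:/         { sub(/^[[:space:]]+/, "    "); print }
    '
}

# Usage (root is needed to read the extended capabilities):
#   sudo lspci -vv | list_acs_devices
#   list_acs_devices < lspci-vv.txt
```

Counting the unindented lines of the result gives the number of devices with ACS, which on my system should match the 6 entries mentioned above.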

The ultimate test, however, is the PCI device separation into independent IOMMU groups. To get a sorted list of IOMMU groups and their devices, enter:
for a in /sys/kernel/iommu_groups/*; do find "$a" -type l; done | sort --version-sort

The complete output on my i7-3930K X79 system is:
/sys/kernel/iommu_groups/0/devices/0000:00:00.0
/sys/kernel/iommu_groups/1/devices/0000:00:01.0
/sys/kernel/iommu_groups/2/devices/0000:00:02.0
/sys/kernel/iommu_groups/3/devices/0000:00:03.0
/sys/kernel/iommu_groups/4/devices/0000:00:05.0
/sys/kernel/iommu_groups/4/devices/0000:00:05.2
/sys/kernel/iommu_groups/4/devices/0000:00:05.4
/sys/kernel/iommu_groups/5/devices/0000:00:11.0
/sys/kernel/iommu_groups/6/devices/0000:00:16.0
/sys/kernel/iommu_groups/7/devices/0000:00:19.0
/sys/kernel/iommu_groups/8/devices/0000:00:1a.0
/sys/kernel/iommu_groups/9/devices/0000:00:1b.0
/sys/kernel/iommu_groups/10/devices/0000:00:1c.0
/sys/kernel/iommu_groups/11/devices/0000:00:1c.1
/sys/kernel/iommu_groups/12/devices/0000:00:1c.2
/sys/kernel/iommu_groups/13/devices/0000:00:1c.3
/sys/kernel/iommu_groups/14/devices/0000:00:1c.4
/sys/kernel/iommu_groups/15/devices/0000:00:1c.7
/sys/kernel/iommu_groups/16/devices/0000:00:1d.0
/sys/kernel/iommu_groups/17/devices/0000:00:1e.0
/sys/kernel/iommu_groups/18/devices/0000:00:1f.0
/sys/kernel/iommu_groups/18/devices/0000:00:1f.2
/sys/kernel/iommu_groups/18/devices/0000:00:1f.3
/sys/kernel/iommu_groups/19/devices/0000:01:00.0
/sys/kernel/iommu_groups/19/devices/0000:01:00.1
/sys/kernel/iommu_groups/20/devices/0000:02:00.0
/sys/kernel/iommu_groups/20/devices/0000:02:00.1
/sys/kernel/iommu_groups/21/devices/0000:05:00.0
/sys/kernel/iommu_groups/21/devices/0000:06:04.0
/sys/kernel/iommu_groups/22/devices/0000:07:00.0
/sys/kernel/iommu_groups/23/devices/0000:08:00.0
/sys/kernel/iommu_groups/24/devices/0000:09:00.0
/sys/kernel/iommu_groups/25/devices/0000:0a:00.0
/sys/kernel/iommu_groups/26/devices/0000:0b:00.0
/sys/kernel/iommu_groups/27/devices/0000:0c:01.0
/sys/kernel/iommu_groups/28/devices/0000:0c:02.0
/sys/kernel/iommu_groups/29/devices/0000:0d:00.0
/sys/kernel/iommu_groups/30/devices/0000:0e:00.0
/sys/kernel/iommu_groups/31/devices/0000:ff:08.0
/sys/kernel/iommu_groups/31/devices/0000:ff:08.3
/sys/kernel/iommu_groups/31/devices/0000:ff:08.4
/sys/kernel/iommu_groups/32/devices/0000:ff:09.0
/sys/kernel/iommu_groups/32/devices/0000:ff:09.3
/sys/kernel/iommu_groups/32/devices/0000:ff:09.4
/sys/kernel/iommu_groups/33/devices/0000:ff:0a.0
/sys/kernel/iommu_groups/33/devices/0000:ff:0a.1
/sys/kernel/iommu_groups/33/devices/0000:ff:0a.2
/sys/kernel/iommu_groups/33/devices/0000:ff:0a.3
/sys/kernel/iommu_groups/34/devices/0000:ff:0b.0
/sys/kernel/iommu_groups/34/devices/0000:ff:0b.3
/sys/kernel/iommu_groups/35/devices/0000:ff:0c.0
/sys/kernel/iommu_groups/35/devices/0000:ff:0c.1
/sys/kernel/iommu_groups/35/devices/0000:ff:0c.2
/sys/kernel/iommu_groups/35/devices/0000:ff:0c.6
/sys/kernel/iommu_groups/35/devices/0000:ff:0c.7
/sys/kernel/iommu_groups/36/devices/0000:ff:0d.0
/sys/kernel/iommu_groups/36/devices/0000:ff:0d.1
/sys/kernel/iommu_groups/36/devices/0000:ff:0d.2
/sys/kernel/iommu_groups/36/devices/0000:ff:0d.6
/sys/kernel/iommu_groups/37/devices/0000:ff:0e.0
/sys/kernel/iommu_groups/37/devices/0000:ff:0e.1
/sys/kernel/iommu_groups/38/devices/0000:ff:0f.0
/sys/kernel/iommu_groups/38/devices/0000:ff:0f.1
/sys/kernel/iommu_groups/38/devices/0000:ff:0f.2
/sys/kernel/iommu_groups/38/devices/0000:ff:0f.3
/sys/kernel/iommu_groups/38/devices/0000:ff:0f.4
/sys/kernel/iommu_groups/38/devices/0000:ff:0f.5
/sys/kernel/iommu_groups/38/devices/0000:ff:0f.6
/sys/kernel/iommu_groups/39/devices/0000:ff:10.0
/sys/kernel/iommu_groups/39/devices/0000:ff:10.1
/sys/kernel/iommu_groups/39/devices/0000:ff:10.2
/sys/kernel/iommu_groups/39/devices/0000:ff:10.3
/sys/kernel/iommu_groups/39/devices/0000:ff:10.4
/sys/kernel/iommu_groups/39/devices/0000:ff:10.5
/sys/kernel/iommu_groups/39/devices/0000:ff:10.6
/sys/kernel/iommu_groups/39/devices/0000:ff:10.7
/sys/kernel/iommu_groups/40/devices/0000:ff:11.0
/sys/kernel/iommu_groups/41/devices/0000:ff:13.0
/sys/kernel/iommu_groups/41/devices/0000:ff:13.1
/sys/kernel/iommu_groups/41/devices/0000:ff:13.4
/sys/kernel/iommu_groups/41/devices/0000:ff:13.5
/sys/kernel/iommu_groups/41/devices/0000:ff:13.6
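
With a listing this long it helps to see at a glance how many devices each group holds, since groups with more than one device are the ones that need a closer look. Here is a minimal sketch; the function name iommu_group_sizes and its optional directory argument are my own additions for illustration.

```shell
# Print "group device_count" for every IOMMU group, sorted by group number.
# The optional argument overrides the sysfs directory (handy for testing).
iommu_group_sizes() {
    local root="${1:-/sys/kernel/iommu_groups}" d g
    for d in "$root"/*/devices/*; do
        # Skip the unexpanded pattern when there are no groups at all.
        if [ -e "$d" ] || [ -L "$d" ]; then
            g=${d%/devices/*}          # strip "/devices/<BDF>"
            printf '%s\n' "${g##*/}"   # keep only the group number
        fi
    done | sort -n | uniq -c | awk '{ print $2, $1 }'
}

# Usage:
#   iommu_group_sizes                          # all groups
#   iommu_group_sizes | awk '$2 > 1'           # only groups with several devices
```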

You can get more information on the devices inside the IOMMU groups using this command line script:
for d in /sys/kernel/iommu_groups/*/devices/*; do n=${d#*/iommu_groups/*}; n=${n%%/*}; printf 'IOMMU Group %s ' "$n"; lspci -nns "${d##*/}"; done;

Here is an extract of what I get:
IOMMU Group 0 00:00.0 Host bridge [0600]: Intel Corporation Xeon E5/Core i7 DMI2 [8086:3c00] (rev 07)
IOMMU Group 10 00:1c.0 PCI bridge [0604]: Intel Corporation C600/X79 series chipset PCI Express Root Port 1 [8086:1d10] (rev b5)
IOMMU Group 11 00:1c.1 PCI bridge [0604]: Intel Corporation C600/X79 series chipset PCI Express Root Port 2 [8086:1d12] (rev b5)
IOMMU Group 12 00:1c.2 PCI bridge [0604]: Intel Corporation C600/X79 series chipset PCI Express Root Port 3 [8086:1d14] (rev b5)
IOMMU Group 13 00:1c.3 PCI bridge [0604]: Intel Corporation C600/X79 series chipset PCI Express Root Port 4 [8086:1d16] (rev b5)
IOMMU Group 14 00:1c.4 PCI bridge [0604]: Intel Corporation C600/X79 series chipset PCI Express Root Port 5 [8086:1d18] (rev b5)
IOMMU Group 15 00:1c.7 PCI bridge [0604]: Intel Corporation C600/X79 series chipset PCI Express Root Port 8 [8086:1d1e] (rev b5)
IOMMU Group 16 00:1d.0 USB controller [0c03]: Intel Corporation C600/X79 series chipset USB2 Enhanced Host Controller #1 [8086:1d26] (rev 05)
IOMMU Group 17 00:1e.0 PCI bridge [0604]: Intel Corporation 82801 PCI Bridge [8086:244e] (rev a5)

For a tree view of the PCIe bus and devices, use:
lspci -t

This is what I get:

[Image: PCI tree output with lspci -t]

Notice my 2 graphics cards:
+-02.0-[01]--+-00.0
|            \-00.1
+-03.0-[02]--+-00.0
|            \-00.1

Passing through a PCI device

Passing through PCI or VGA devices requires you to pass through all devices within an IOMMU group. The one exception to this rule is PCI root devices that reside in the same IOMMU group as the device(s) you want to pass through. These root devices cannot be passed through, as they often perform important tasks for the host. A number of (Intel) CPUs, usually consumer-grade CPUs with integrated graphics (IGD), share a root device in the same IOMMU group as the first PCIe 16x slot.
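
The rule can be approximated with a filter on the class code that lspci -nn prints in square brackets: [0604] is a PCI-to-PCI bridge (root ports fall into this class) and [0600] is a host bridge. The sketch below drops those lines and keeps the rest. The function name passthrough_candidates is hypothetical, and the filter assumes that every bridge in the group is a root device that stays with the host, so treat the result as a starting point rather than a verdict.

```shell
# Read lspci -nn lines on stdin and drop bridges:
#   [0604] = PCI-to-PCI bridge (root ports included), [0600] = host bridge.
# What remains are the functions that would go to the guest together,
# assuming every bridge in the group stays with the host.
passthrough_candidates() {
    grep -v -E '\[(0600|0604)\]:'
}

# Usage, feeding in the lspci -nn lines of one IOMMU group:
#   lspci -nn | passthrough_candidates
```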

Let’s have a look at some of the IOMMU groups from the list above:

/sys/kernel/iommu_groups/19/devices/0000:01:00.0 – this is the Nvidia Quadro 2000 GPU in the first PCIe 16x port
/sys/kernel/iommu_groups/19/devices/0000:01:00.1 – this is the audio part of the Nvidia Quadro 2000 GPU

/sys/kernel/iommu_groups/20/devices/0000:02:00.0 – this is the Nvidia GTX 970 GPU in the second PCIe 16x port
/sys/kernel/iommu_groups/20/devices/0000:02:00.1 – this is the audio part of the Nvidia GTX 970 GPU

I’m passing through the GTX 970. Since this card and its audio function are the only devices in the IOMMU group, passthrough is a piece of cake.
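
To check this for any device without scanning the full list, you can ask sysfs which devices share a group with a given PCI address. A minimal sketch follows; iommu_group_members is a name I chose, the address must include the domain prefix (0000:), and the second argument exists only so the function can be pointed at a test directory.

```shell
# List all devices that share an IOMMU group with the given PCI address
# (full form including the domain, for example 0000:02:00.0).
# The optional second argument overrides the sysfs directory.
iommu_group_members() {
    local bdf="$1" root="${2:-/sys/kernel/iommu_groups}" d
    for d in "$root"/*/devices/"$bdf"; do
        if [ -e "$d" ] || [ -L "$d" ]; then
            ls "${d%/*}"    # the devices directory of the matching group
        fi
    done
}

# Usage:
#   iommu_group_members 0000:02:00.0
```

Going by the group list above, the call for my GTX 970 should return only 0000:02:00.0 and 0000:02:00.1.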

What if there are other devices in my IOMMU group?

If you want to pass through a graphics card or PCIe device and there are one or more other devices in that same IOMMU group, passthrough can become challenging. Below I’m referring to the graphics device, but the same goes for any PCI device.

  1. The graphics card and one or more PCI root ports share the same IOMMU group. Pass through the graphics card and the audio part and leave the root port to the host; that should work. Here is an example of root ports on an Intel Skylake system:
    /sys/kernel/iommu_groups/7/devices/0000:00:1c.0
    /sys/kernel/iommu_groups/7/devices/0000:00:1c.4
    /sys/kernel/iommu_groups/7/devices/0000:04:00.0
    /sys/kernel/iommu_groups/7/devices/0000:04:00.1

    To see what kind of device is associated with PCI slot 00:1c.0, use the following command: lspci -s 00:1c.0
    Both 00:1c.0 and 00:1c.4 are root ports; they cannot and need not be passed to the guest!
    You can retrieve more information using the lspci -nnk command:
    00:1c.0 PCI bridge [0604]: Intel Corporation Sunrise Point-H PCI Express Root Port #3 [8086:a112] (rev f1)
    Kernel driver in use: pcieport
    Kernel modules: shpchp
    00:1c.4 PCI bridge [0604]: Intel Corporation Sunrise Point-H PCI Express Root Port #5 [8086:a114] (rev f1)
    Kernel driver in use: pcieport
    Kernel modules: shpchp

    Besides these two root ports, 04:00.0 and 04:00.1 designate a graphics card residing in PCIe x16 slot 1 on the board. Passing through this graphics card should not pose a problem. But what if it does, or what if those other devices aren’t root ports? See 2 below.
  2. Some vendors had bugs in their motherboard BIOS that have been fixed in newer releases, for example the 3.30 to 4.40 BIOS update on Ryzen/AMD X390 boards. Get the latest BIOS for your motherboard and update (be careful, this procedure can potentially brick your motherboard).
  3. You have other devices in your IOMMU group and cannot pass through the graphics card. Upgrade your kernel. As of this writing, Linux Mint 19 (Ubuntu 18.04) ships with default kernel 4.15, but you can easily install a newer kernel. Newer kernels bring support for more chip-sets, especially when you use the latest hardware. Some chip-sets don’t offer ACS, but they may still honor device separation, which allows kernel developers to add quirks to support ACS functionality.
    Important: The kernel version mentioned above quickly becomes outdated, and I can’t and won’t keep updating this tutorial with every incremental kernel release. Use the Update Manager to see if there is a newer kernel available.
    Check your kernel version with:
    inxi -S
    System: Host: heiko Kernel: 4.10.0-27-generic x86_64 (64 bit)
    Desktop: MATE 1.18.0 Distro: Linux Mint 18.2 Sonya

    If you find that you are running an older kernel, here is how to upgrade in Linux Mint:

    1. Open the Update Manager and select View -> Linux kernels.
      [Screenshot: View kernel options in Update Manager]
    2. Select the most recent kernel and install it.
      [Screenshot: Install latest kernel via Update Manager]

      Then list the IOMMU groups and see if it made a difference. If yes, try again to pass through the GPU.

  4. Even with the latest kernel, there are still other PCIe devices in the same IOMMU group as your graphics card. Move the graphics card to a different PCIe (16x) slot.
    Turn off your PC, unplug the power cable, and open the case. See if you can move the GPU to a different slot. Most modern motherboards have at least 2, if not 3, long PCIe 16x and/or 8x slots for graphics cards. Here is the explanation:
    Each PCIe slot on your motherboard corresponds to a different PCI ID (BDF = Bus:Device.Function notation). In the Skylake example above, the GPU is located at 04:00.0 and the sound part at 04:00.1. When you move your graphics card to a different slot, it should get a different PCI ID and thus show up in a different IOMMU group. The same is true for every PCIe card. Sometimes the easiest and best solution is to move the cards around on the motherboard.
    Once you have “reshuffled” the cards, close the case and reconnect the power. Boot and list the PCI devices and IOMMU groups. Did it work?
  5. No matter what you tried, your system doesn’t have good device isolation. As a last resort, apply the ACS override patch (see instructions here), or, much more conveniently, use the latest kernel builds with the ACS override patch provided by Max Ehrlich. The builds are supplied as .deb files based on Ubuntu and can be installed via the package manager.
    After you installed the patched kernel, you must activate the ACS override by inserting “pcie_acs_override=downstream” after the …iommu=on option in /etc/default/grub, then run update-grub.
    A word of caution: The ACS override patch introduces a security hole that may perhaps be exploited. Also be aware of the security risks involved in using software sources outside the official repositories such as the kernel builds mentioned above.
    Before patching the kernel, or installing the patched kernel via .deb file, do the following:
    I. Make a complete backup, including operating system and user data.
    II. Make sure you have at least 1 working kernel image other than the kernel you are patching. The simplest way to install another kernel is via Update Manager -> View -> Linux kernels and install a recent kernel (see screen shots under 3. above).
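
For the GRUB edit in step 5, the following sketch prints a modified copy of the configuration instead of changing the file in place, so you can review the result first. It rests on assumptions that are mine, not part of the patch instructions: that the …iommu=on option sits in the GRUB_CMDLINE_LINUX_DEFAULT line, and that this line is double-quoted with the closing quote at the very end. The function name add_kernel_param is hypothetical as well.

```shell
# Print the contents of a GRUB defaults file with PARAM appended to
# GRUB_CMDLINE_LINUX_DEFAULT, unless the parameter is already present.
# Nothing is modified in place. Assumes the value is double-quoted and
# the closing quote is the last character of the line.
add_kernel_param() {
    local file="$1" param="$2"
    if grep -q "^GRUB_CMDLINE_LINUX_DEFAULT=.*${param}" "$file"; then
        cat "$file"    # already there, pass the file through unchanged
    else
        sed "/^GRUB_CMDLINE_LINUX_DEFAULT=/ s/\"\$/ ${param}\"/" "$file"
    fi
}

# Usage:
#   add_kernel_param /etc/default/grub pcie_acs_override=downstream > grub.new
#   diff /etc/default/grub grub.new      # review the change
#   sudo cp grub.new /etc/default/grub && sudo update-grub
```

Appending at the end of the line places the override after the existing options, and running the function a second time leaves the file unchanged.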

Buying computer hardware

If you’re buying a new computer or planning to build one, take a careful look at the specifications. Here is a non-exhaustive checklist:

  1. IOMMU support in the CPU: Intel VT-d or AMD-Vi.
  2. IOMMU support in the motherboard / BIOS: check the specifications and manual on how to enable IOMMU! See also https://passthroughpo.st/vfio-increments/ for a hardware parts list used with VGA passthrough.
  3. Discrete graphics card with UEFI support.
  4. CPU ACS support – how well is it implemented? See link above and search the Internet.
  5. Once you have made a hardware shortlist, check the Internet / forums for success stories.

 

 

 
