IOMMU Groups – What You Need to Consider

Summary

In this post I present some of the challenges you might face with IOMMU and provide tools to identify and perhaps solve the issues.

What is IOMMU and why do I need it?

In my tutorial on how to run Windows 10 on Linux using KVM with VGA passthrough, the first and most important hardware requirement is support for IOMMU – VT-d in Intel jargon, AMD-Vi in AMD talk. But what does IOMMU support mean?

IOMMU – or input–output memory management unit – is a memory management unit (MMU) that connects a direct-memory-access–capable (DMA-capable) I/O bus to the main memory. The IOMMU maps a device-visible virtual address (I/O virtual address or IOVA) to a physical memory address. In other words, it translates the IOVA into a real physical address.

In an ideal world, every device has its own IOVA address space and no two devices share the same IOVA. But in practice this is often not the case. Moreover, the PCI-Express (PCIe) specifications allow PCIe devices to communicate with each other directly, called peer-to-peer transactions, thereby escaping the IOMMU.

That is where PCI Access Control Services (ACS) are called to the rescue. ACS is able to tell whether or not these peer-to-peer transactions are possible between any two or more devices, and can disable them. ACS features are implemented within the CPU and the chipset.

Unfortunately, the implementation of ACS varies greatly between different CPU and chipset models. Some CPUs have good ACS support; in others it’s outright unusable – see for example Xeon E3-1200, page 61. Note that Intel Xeon processors are usually an excellent choice for PCI/VGA passthrough, except perhaps this specific model.

For an in-depth look at IOMMU groups, see this excellent blog post: https://vfio.blogspot.com/2014/08/iommu-groups-inside-and-out.html.

How to determine IOMMU capabilities

If you already own a PC that you want to use for VGA passthrough, and IOMMU is supported and enabled (see my tutorial), you can check the ACS capabilities as follows:
sudo lspci -vv > lspci-vv.txt

Then open the file and search for “Access Control Services”:
xed lspci-vv.txt

Here is an example from my system:
00:02.0 PCI bridge: Intel Corporation Xeon E5/Core i7 IIO PCI Express Root Port 2a (rev 07) (prog-if 00 [Normal decode])
.
.
Capabilities: [110 v1] Access Control Services
ACSCap: SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-
ACSCtl: SrcValid+ TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-

ACSCap specifies the ACS capabilities. Every option ending with a + such as TransBlk+ is supported; a trailing - such as in EgressCtrl- indicates that the capability is not supported.

ACSCtl shows the ACS capabilities that are enabled. We can manually enable a capability, but that shouldn’t be necessary.
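
If you don’t want to search the file by hand, here is a minimal sketch that pulls out only the devices advertising ACS, together with their ACSCap and ACSCtl lines. The function name acs_summary is my own invention, and the sketch assumes the usual lspci -vv layout where each device header starts at the beginning of a line:

```shell
# Print each device that advertises ACS, followed by its ACSCap and ACSCtl lines.
# Reads `lspci -vv` text from stdin, so it works on live output or a saved file.
acs_summary() {
    awk '
        /^[0-9a-f]/ { dev = $0 }                  # device header starts at column 0
        /Access Control Services/ { print dev }   # this device has an ACS capability
        /ACSCap:|ACSCtl:/ { sub(/^[ \t]+/, ""); print "    " $0 }
    '
}
```

Run it as sudo lspci -vv | acs_summary, or feed it the saved file with acs_summary < lspci-vv.txt.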

The ACS capabilities listed above are for one device only, in this case the PCI bridge at 00:02.0. You will hopefully find multiple “Access Control Services” entries in your lspci-vv.txt file. Mine has 6 entries, including:

  • 3 PCI Express bridge root ports from the Intel 3930K CPU (see above);
  • 1 PCI Express virtual root port from the X79 chipset;
  • 2 PCI bridge ports from a PLX Technology, Inc. 8603 chip that resides on a SATA controller/USB3 board I added.

My X79 board and the Intel i7 3930K CPU provide good ACS capabilities.

The ultimate test, however, is the PCI device separation into independent IOMMU groups. To get a sorted list of IOMMU groups and their devices, enter:
for a in /sys/kernel/iommu_groups/*; do find "$a" -type l; done | sort --version-sort

The complete output on my i7-3930K X79 system is:
/sys/kernel/iommu_groups/0/devices/0000:00:00.0
/sys/kernel/iommu_groups/1/devices/0000:00:01.0
/sys/kernel/iommu_groups/2/devices/0000:00:02.0
/sys/kernel/iommu_groups/3/devices/0000:00:03.0
/sys/kernel/iommu_groups/4/devices/0000:00:05.0
/sys/kernel/iommu_groups/4/devices/0000:00:05.2
/sys/kernel/iommu_groups/4/devices/0000:00:05.4
/sys/kernel/iommu_groups/5/devices/0000:00:11.0
/sys/kernel/iommu_groups/6/devices/0000:00:16.0
/sys/kernel/iommu_groups/7/devices/0000:00:19.0
/sys/kernel/iommu_groups/8/devices/0000:00:1a.0
/sys/kernel/iommu_groups/9/devices/0000:00:1b.0
/sys/kernel/iommu_groups/10/devices/0000:00:1c.0
/sys/kernel/iommu_groups/11/devices/0000:00:1c.1
/sys/kernel/iommu_groups/12/devices/0000:00:1c.2
/sys/kernel/iommu_groups/13/devices/0000:00:1c.3
/sys/kernel/iommu_groups/14/devices/0000:00:1c.4
/sys/kernel/iommu_groups/15/devices/0000:00:1c.7
/sys/kernel/iommu_groups/16/devices/0000:00:1d.0
/sys/kernel/iommu_groups/17/devices/0000:00:1e.0
/sys/kernel/iommu_groups/18/devices/0000:00:1f.0
/sys/kernel/iommu_groups/18/devices/0000:00:1f.2
/sys/kernel/iommu_groups/18/devices/0000:00:1f.3
/sys/kernel/iommu_groups/19/devices/0000:01:00.0
/sys/kernel/iommu_groups/19/devices/0000:01:00.1
/sys/kernel/iommu_groups/20/devices/0000:02:00.0
/sys/kernel/iommu_groups/20/devices/0000:02:00.1
/sys/kernel/iommu_groups/21/devices/0000:05:00.0
/sys/kernel/iommu_groups/21/devices/0000:06:04.0
/sys/kernel/iommu_groups/22/devices/0000:07:00.0
/sys/kernel/iommu_groups/23/devices/0000:08:00.0
/sys/kernel/iommu_groups/24/devices/0000:09:00.0
/sys/kernel/iommu_groups/25/devices/0000:0a:00.0
/sys/kernel/iommu_groups/26/devices/0000:0b:00.0
/sys/kernel/iommu_groups/27/devices/0000:0c:01.0
/sys/kernel/iommu_groups/28/devices/0000:0c:02.0
/sys/kernel/iommu_groups/29/devices/0000:0d:00.0
/sys/kernel/iommu_groups/30/devices/0000:0e:00.0
/sys/kernel/iommu_groups/31/devices/0000:ff:08.0
/sys/kernel/iommu_groups/31/devices/0000:ff:08.3
/sys/kernel/iommu_groups/31/devices/0000:ff:08.4
/sys/kernel/iommu_groups/32/devices/0000:ff:09.0
/sys/kernel/iommu_groups/32/devices/0000:ff:09.3
/sys/kernel/iommu_groups/32/devices/0000:ff:09.4
/sys/kernel/iommu_groups/33/devices/0000:ff:0a.0
/sys/kernel/iommu_groups/33/devices/0000:ff:0a.1
/sys/kernel/iommu_groups/33/devices/0000:ff:0a.2
/sys/kernel/iommu_groups/33/devices/0000:ff:0a.3
/sys/kernel/iommu_groups/34/devices/0000:ff:0b.0
/sys/kernel/iommu_groups/34/devices/0000:ff:0b.3
/sys/kernel/iommu_groups/35/devices/0000:ff:0c.0
/sys/kernel/iommu_groups/35/devices/0000:ff:0c.1
/sys/kernel/iommu_groups/35/devices/0000:ff:0c.2
/sys/kernel/iommu_groups/35/devices/0000:ff:0c.6
/sys/kernel/iommu_groups/35/devices/0000:ff:0c.7
/sys/kernel/iommu_groups/36/devices/0000:ff:0d.0
/sys/kernel/iommu_groups/36/devices/0000:ff:0d.1
/sys/kernel/iommu_groups/36/devices/0000:ff:0d.2
/sys/kernel/iommu_groups/36/devices/0000:ff:0d.6
/sys/kernel/iommu_groups/37/devices/0000:ff:0e.0
/sys/kernel/iommu_groups/37/devices/0000:ff:0e.1
/sys/kernel/iommu_groups/38/devices/0000:ff:0f.0
/sys/kernel/iommu_groups/38/devices/0000:ff:0f.1
/sys/kernel/iommu_groups/38/devices/0000:ff:0f.2
/sys/kernel/iommu_groups/38/devices/0000:ff:0f.3
/sys/kernel/iommu_groups/38/devices/0000:ff:0f.4
/sys/kernel/iommu_groups/38/devices/0000:ff:0f.5
/sys/kernel/iommu_groups/38/devices/0000:ff:0f.6
/sys/kernel/iommu_groups/39/devices/0000:ff:10.0
/sys/kernel/iommu_groups/39/devices/0000:ff:10.1
/sys/kernel/iommu_groups/39/devices/0000:ff:10.2
/sys/kernel/iommu_groups/39/devices/0000:ff:10.3
/sys/kernel/iommu_groups/39/devices/0000:ff:10.4
/sys/kernel/iommu_groups/39/devices/0000:ff:10.5
/sys/kernel/iommu_groups/39/devices/0000:ff:10.6
/sys/kernel/iommu_groups/39/devices/0000:ff:10.7
/sys/kernel/iommu_groups/40/devices/0000:ff:11.0
/sys/kernel/iommu_groups/41/devices/0000:ff:13.0
/sys/kernel/iommu_groups/41/devices/0000:ff:13.1
/sys/kernel/iommu_groups/41/devices/0000:ff:13.4
/sys/kernel/iommu_groups/41/devices/0000:ff:13.5
/sys/kernel/iommu_groups/41/devices/0000:ff:13.6
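
With a list this long it can help to see at a glance how many devices each group holds. The following is only a sketch: iommu_group_sizes is a name I made up, and the optional directory argument exists purely so the function can be tried on a copy of the tree. If it prints nothing, that usually means the IOMMU is not enabled in the BIOS or on the kernel command line.

```shell
# Print "group <n>: <count> device(s)" for every IOMMU group, in numeric order.
# $1: directory holding the groups (defaults to /sys/kernel/iommu_groups).
# The body runs in a subshell, so its variables do not leak into your session.
iommu_group_sizes() (
    groups_dir=${1:-/sys/kernel/iommu_groups}
    for g in "$groups_dir"/*/; do
        [ -d "${g}devices" ] || continue      # also skips the unexpanded glob when empty
        count=$(ls "${g}devices" | wc -l)
        g=${g%/}
        printf 'group %s: %s device(s)\n' "${g##*/}" "$count"
    done | sort -k2 -n
)
```

Groups with a single device are the easy ones; anything with more than one entry deserves a closer look before you attempt passthrough.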

You can get more information on the devices inside the IOMMU groups using this command line script:
for d in /sys/kernel/iommu_groups/*/devices/*; do n=${d#*/iommu_groups/*}; n=${n%%/*}; printf 'IOMMU Group %s ' "$n"; lspci -nns "${d##*/}"; done;

Here is an extract of what I get:
IOMMU Group 0 00:00.0 Host bridge [0600]: Intel Corporation Xeon E5/Core i7 DMI2 [8086:3c00] (rev 07)
IOMMU Group 10 00:1c.0 PCI bridge [0604]: Intel Corporation C600/X79 series chipset PCI Express Root Port 1 [8086:1d10] (rev b5)
IOMMU Group 11 00:1c.1 PCI bridge [0604]: Intel Corporation C600/X79 series chipset PCI Express Root Port 2 [8086:1d12] (rev b5)
IOMMU Group 12 00:1c.2 PCI bridge [0604]: Intel Corporation C600/X79 series chipset PCI Express Root Port 3 [8086:1d14] (rev b5)
IOMMU Group 13 00:1c.3 PCI bridge [0604]: Intel Corporation C600/X79 series chipset PCI Express Root Port 4 [8086:1d16] (rev b5)
IOMMU Group 14 00:1c.4 PCI bridge [0604]: Intel Corporation C600/X79 series chipset PCI Express Root Port 5 [8086:1d18] (rev b5)
IOMMU Group 15 00:1c.7 PCI bridge [0604]: Intel Corporation C600/X79 series chipset PCI Express Root Port 8 [8086:1d1e] (rev b5)
IOMMU Group 16 00:1d.0 USB controller [0c03]: Intel Corporation C600/X79 series chipset USB2 Enhanced Host Controller #1 [8086:1d26] (rev 05)
IOMMU Group 17 00:1e.0 PCI bridge [0604]: Intel Corporation 82801 PCI Bridge [8086:244e] (rev a5)

For a tree view of the PCIe bus and devices, use:
lspci -t

This is what I get:

[Image: PCI tree output with lspci -t]

Notice my 2 graphics cards:
+-02.0-[01]--+-00.0
|            \-00.1
+-03.0-[02]--+-00.0
|            \-00.1

Passing through a PCI device

Passing through PCI or VGA devices requires you to pass through all devices within an IOMMU group. The exceptions to this rule are PCI root devices that reside in the same IOMMU group as the device(s) we want to pass through. These root devices cannot be passed through as they often perform important tasks for the host. A number of (Intel) CPUs, usually consumer-grade CPUs with integrated graphics (IGD), share a root device in the same IOMMU group as the first PCIe 16x slot.

Let’s have a look at some of the IOMMU groups from the list above:

/sys/kernel/iommu_groups/19/devices/0000:01:00.0 – this is the Nvidia Quadro 2000 GPU in the first PCIe 16x port
/sys/kernel/iommu_groups/19/devices/0000:01:00.1 – this is the audio part of the Nvidia Quadro 2000 GPU

/sys/kernel/iommu_groups/20/devices/0000:02:00.0 – this is the Nvidia GTX 970 GPU in the second PCIe 16x port
/sys/kernel/iommu_groups/20/devices/0000:02:00.1 – this is the audio part of the Nvidia GTX 970 GPU

I’m passing through the GTX 970. Since this card and its audio function are the only devices in the IOMMU group, passthrough is a piece of cake.
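
To run the same check for a single card without scanning the whole list, you can follow the iommu_group link that sysfs keeps for every PCI device. This is a sketch under a few assumptions: iommu_group_members is a hypothetical helper name, the address must include the domain (0000:), and readlink -f is the GNU variant found on Linux.

```shell
# List every device that shares an IOMMU group with the given PCI address.
# $1: full address including the domain, e.g. 0000:02:00.0
# $2: sysfs mount point (defaults to /sys; a parameter only for testing)
iommu_group_members() (
    sysfs=${2:-/sys}
    group=$(readlink -f "$sysfs/bus/pci/devices/$1/iommu_group") || exit 1
    [ -d "$group/devices" ] || exit 1     # device has no IOMMU group
    ls "$group/devices"
)
```

For a card like my GTX 970, iommu_group_members 0000:02:00.0 should print only the card’s own functions if the group is clean.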

What if there are other devices in my IOMMU group?

If you want to pass through a graphics card or PCIe device and there are one or more other devices in that same IOMMU group, passthrough can become challenging. Below I’m referring to the graphics device, but the same holds true for a PCI device or card.

  1. The graphics card and one or more PCI root ports share the same IOMMU group. Pass through the graphics card and the audio part and leave the root port to the host – that should work. Here is an example of root ports on an Intel Skylake system:
    /sys/kernel/iommu_groups/7/devices/0000:00:1c.0
    /sys/kernel/iommu_groups/7/devices/0000:00:1c.4
    /sys/kernel/iommu_groups/7/devices/0000:04:00.0
    /sys/kernel/iommu_groups/7/devices/0000:04:00.1

    Running lspci -nnk on this system reveals this information:
    00:1c.0 PCI bridge [0604]: Intel Corporation Sunrise Point-H PCI Express Root Port #3 [8086:a112] (rev f1)
    Kernel driver in use: pcieport
    Kernel modules: shpchp
    00:1c.4 PCI bridge [0604]: Intel Corporation Sunrise Point-H PCI Express Root Port #5 [8086:a114] (rev f1)
    Kernel driver in use: pcieport
    Kernel modules: shpchp 

    Besides these two root ports, 04:00.0 and 04:00.1 designate a graphics card residing in PCIe x16 slot 1 on the board.
    Passing through this graphics card should not pose a problem. But what if it does? See 2 below.

  2. You have other devices in your IOMMU group and cannot pass through the graphics card. Upgrade your kernel. As of this writing, Linux Mint 18.2 (Ubuntu 16.04) ships with a default kernel 4.4, but you can easily install a 4.8 or even 4.10 kernel. Newer kernels bring support for more chipsets. Some chipsets don’t offer ACS, but they may still honor device separation, which allows kernel developers to add quirks to support ACS functionality.

    Important: The kernel versions mentioned above quickly become outdated, and I can’t and won’t keep updating this tutorial with every incremental kernel release. Use the Update Manager to see if there is a newer kernel available.

    Check your kernel version with:
    inxi -S

    System: Host: heiko Kernel: 4.10.0-27-generic x86_64 (64 bit)
    Desktop: MATE 1.18.0 Distro: Linux Mint 18.2 Sonya

    If you find you are running an older kernel, here is the way to upgrade in Linux Mint:

    1. Open the Update Manager and select View -> Linux kernels:
      [Image: View kernel options in Update Manager]

    2. Select the most recent kernel and install:
      [Image: Install latest kernel via Update Manager]

      Then list the IOMMU groups and see if it made a difference. If yes, try again to pass through the GPU.

  3. Even with the latest kernel, there are still other PCIe devices in the group besides your graphics card. Move the graphics card to a different PCIe (16x) slot. Turn off your PC, unplug the power cable, and open the case. See if you can move the GPU to a different slot. Most modern motherboards have at least 2, if not 3 PCIe 16x and/or 8x long slots for graphics cards. Here is the explanation:

    Each PCIe slot on your motherboard corresponds to a different PCI address (BDF = Bus:Device.Function notation). In the Skylake example above, the GPU is located at 04:00.0 and the sound part at 04:00.1. By moving your graphics card to a different slot it should get a different PCI address, thus showing up within a different IOMMU group. The same is true for every PCIe card. Sometimes the easiest and best solution is to move around the cards on the motherboard.

    Once you have “reshuffled” the cards, close the case and reconnect to power. Boot and list the PCI devices and IOMMU groups. Did it work?

  4. No matter what you tried, your system doesn’t have good device isolation. As a last option, apply the ACS override patch. For instructions on how to do that, see here.
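
To see which of the four cases above applies to your card, it helps to list only the group members that actually stand in the way: devices that are neither functions of the card itself nor PCI bridges (class 0604 in the lspci -nn output). The helper below is a rough sketch, not a definitive test. The name iommu_group_blockers is made up, it relies on the class file that sysfs provides for each PCI device, and it assumes GNU readlink.

```shell
# Print devices that share an IOMMU group with a card but are neither functions
# of that card nor PCI bridges (class 0x0604). Empty output suggests case 1 or a
# clean group; anything printed would have to be passed through as well.
# $1: PCI address of the card, including the domain, e.g. 0000:04:00.0
# $2: sysfs mount point (defaults to /sys; a parameter only for testing)
iommu_group_blockers() (
    sysfs=${2:-/sys}
    slot=${1%.*}                                  # 0000:04:00.0 -> 0000:04:00
    group=$(readlink -f "$sysfs/bus/pci/devices/$1/iommu_group") || exit 1
    for dev in "$group"/devices/*; do
        [ -e "$dev" ] || continue
        addr=${dev##*/}
        [ "${addr%.*}" = "$slot" ] && continue    # another function of the same card
        read -r class < "$dev/class"              # e.g. 0x060400 for a PCI bridge
        case $class in 0x0604*) continue ;; esac
        echo "$addr"
    done
)
```

On the Skylake example from case 1, iommu_group_blockers 0000:04:00.0 should print nothing, because the only other group members are the two root ports.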

Buying computer hardware

If you’re buying a new computer or planning to build one, take a careful look at the specifications. Here is a non-exhaustive checklist:

  1. IOMMU support in the CPU: Intel VT-d or AMD-Vi.
  2. IOMMU support in the motherboard / BIOS: check the specifications and manual on how to enable IOMMU!
  3. Discrete graphics card with UEFI support.
  4. CPU ACS support – how well is it implemented? Scan the Internet and the forums.
  5. Once you have made a hardware shortlist, check the Internet / forums for success stories.

 

 

 
