Neophyte questions about PCIe

Bjorn Helgaas helgaas at kernel.org
Wed Mar 8 07:17:24 PST 2017


On Tue, Mar 07, 2017 at 11:45:27PM +0100, Mason wrote:
> Hello,
> 
> I've been working with the Linux PCIe framework for a few weeks,
> and there are still a few things that remain unclear to me.
> I thought I'd group them in a single message.
> 
> 1) If I understand correctly, PCI defines 3 types of (address?) "spaces"
> 	- configuration
> 	- memory
> 	- I/O
> 
> I think PCI has its roots in x86, where there are separate
> instructions for I/O accesses and memory accesses (with MMIO
> sitting somewhere in the middle). I'm on ARMv7 which doesn't
> have I/O instructions AFAIK. I'm not sure what the I/O address
> space is used for in PCIe, especially since I was told that
> one may map I/O-type registers (in my understanding, registers
> for which accesses cause side effects) within mem space.

You're right about the three PCI address spaces.  Obviously, these
only apply to the *PCI* hierarchy.  The PCI host bridge, which is the
interface between the PCI hierarchy and the rest of the system (CPUs,
system RAM, etc.), generates these PCI config, memory, or I/O
transactions.

The host bridge may use a variety of mechanisms to translate a CPU
access into the appropriate PCI transaction.

  - PCI memory transactions: Generally the host bridge translates CPU
    memory accesses directly into PCI memory accesses, although it may
    translate the physical address from the CPU to a different PCI bus
    address, e.g., by truncating high-order address bits or adding a
    constant offset.

    As you mentioned, drivers use some flavor of ioremap() to set up
    mappings for PCI memory space, then they perform simple memory
    accesses to it.  There's no required PCI core wrapper and no
    locking in this path.
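
    A minimal sketch of that driver-side pattern (untested; the BAR
    number and register offsets below are made up, and error handling
    is trimmed):

      #include <linux/pci.h>
      #include <linux/io.h>

      static int my_probe(struct pci_dev *pdev,
                          const struct pci_device_id *id)
      {
          void __iomem *regs;
          u32 val;
          int ret;

          ret = pci_enable_device(pdev);
          if (ret)
              return ret;

          /* Map BAR 0 (PCI memory space) into kernel virtual space. */
          regs = pci_iomap(pdev, 0, 0);
          if (!regs)
              return -ENOMEM;

          /* Plain MMIO accesses from here on; no PCI core wrapper,
           * no locking. */
          val = readl(regs + 0x10);
          writel(val | 0x1, regs + 0x10);

          pci_iounmap(pdev, regs);
          return 0;
      }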

  - PCI I/O transactions: On x86, where the ISA supports "I/O"
    instructions, a host bridge generally forwards I/O accesses from
    the CPU directly to PCI.  Bridges for use on other arches may
    provide a bridge-specific way to convert a CPU memory access into
    a PCI I/O transaction, e.g., a CPU memory store inside a bridge
    window may be translated to a PCI I/O write transaction, with the
    PCI I/O address determined by the offset into the bridge window.

    Drivers use inb()/outb() to access PCI I/O space.  These are
    arch-specific wrappers that can use the appropriate mechanism for
    the arch and bridge.

    PCIe deprecates I/O space, and many bridges don't support it at
    all, so it's relatively unimportant.  Many PCI devices do make
    registers available in both I/O and memory space, but there's no
    spec requirement to do so.  Drivers for such devices would have to
    know about this as a device-specific detail.
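
    For completeness, the I/O-space pattern looks roughly like this
    (untested; which BAR is an I/O BAR is device-specific, and the
    register offset is made up):

      #include <linux/pci.h>
      #include <linux/io.h>

      /* Assumes the device is already enabled and BAR 1 is an I/O BAR. */
      static void poke_io_reg(struct pci_dev *pdev)
      {
          unsigned long io_base = pci_resource_start(pdev, 1);
          u8 status;

          /* inb()/outb() expand to whatever the arch/bridge needs:
           * real I/O instructions on x86, accesses into a memory-
           * mapped bridge window elsewhere. */
          status = inb(io_base + 0x04);
          outb(status | 0x80, io_base + 0x04);
      }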

  - PCI config transactions: The simplest mechanism is called ECAM
    ("Enhanced Configuration Access Method") and is required by the
    PCIe spec and also supported by some conventional PCI bridges.  A
    CPU memory access inside a bridge window is converted into a PCI
    configuration transaction.  The PCI bus/device/function
    information is encoded into the CPU physical memory address.

    Another common mechanism is for the host bridge to have an
    "address" register, where the CPU writes the PCI bus/device/
    function information, and a "data" register where the CPU reads or
    writes the configuration data.  This obviously requires locking
    around the address/data accesses.

    The PCI core and drivers use the pci_read_config_*() and
    pci_write_config_*() wrappers to access config space.  These use
    the appropriate bridge-specific
    mechanism and do any required locking.
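
    To make the ECAM encoding concrete: each function gets a 4K config
    window, so the bus/device/function select bits land directly in the
    CPU physical address (the base address below is just a placeholder).
    Drivers never compute this themselves; they go through the wrappers:

      #include <linux/pci.h>

      #define ECAM_BASE  0x50000000UL  /* hypothetical bridge window */

      /* bus in bits 27:20, device/function (devfn) in bits 19:12,
       * register offset in bits 11:0 */
      static unsigned long ecam_addr(unsigned int bus, unsigned int devfn,
                                     unsigned int reg)
      {
          return ECAM_BASE + (bus << 20) + (devfn << 12) + reg;
      }

      /* What a driver actually does: */
      static u16 read_vendor_id(struct pci_dev *pdev)
      {
          u16 vendor;

          pci_read_config_word(pdev, PCI_VENDOR_ID, &vendor);
          return vendor;
      }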

> 2) On my platform, there are two revisions of the PCIe controller.
> Rev1 muxes config and mem inside a 256 MB window, and doesn't support
> I/O space.
> Rev2 muxes all 3 spaces inside a 256 MB window.
> 
> Ard has stated that this model is not supported by Linux.
> AFAIU, the reason is that accesses may occur concurrently
> (especially on SMP systems). Thus tweaking a bit before
> the actual access necessarily creates a race condition.

Yes.

> I wondered if there might be (reasonable) software
> work-arounds, in your experience?

Muxing config and I/O space isn't a huge issue because they both use
wrappers that could do locking.  Muxing config and memory space is a
pretty big problem because memory accesses do not use a wrapper.

There's no pretty way of making sure no driver is doing memory
accesses during a config access.  Somebody already pointed out that
you'd have to make sure no other CPU could be executing a driver while
you're doing a config access.  I can't think of any better solution.
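
Purely to illustrate how heavy-handed that would be: you could wrap
every config access in stop_machine(), which parks all other CPUs with
interrupts off so no driver can be in the middle of an MMIO access.
The mux hooks below are hypothetical, and the cost per config access
would be enormous:

  #include <linux/stop_machine.h>
  #include <linux/io.h>

  /* Hypothetical hooks that flip the shared window between config and
   * memory mode by poking a controller register. */
  static void mux_select_config(void) { /* write mux register */ }
  static void mux_select_mem(void)    { /* write mux register */ }

  struct cfg_access {
      void __iomem *win;     /* the shared window, already ioremapped */
      unsigned int offset;
      u32 val;
  };

  /* Runs while every other CPU spins in stop_machine() with interrupts
   * off, so no driver can be mid-MMIO while the window is switched. */
  static int do_muxed_cfg_read(void *data)
  {
      struct cfg_access *a = data;

      mux_select_config();
      a->val = readl(a->win + a->offset);
      mux_select_mem();
      return 0;
  }

  static u32 muxed_cfg_read(struct cfg_access *a)
  {
      stop_machine(do_muxed_cfg_read, a, NULL);
      return a->val;
  }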

> 3) What happens if a device requires more than 256 MB of
> mem space? (Is that common? What kind of device? GPUs?)

It is fairly common to have PCI BARs larger than 256MB.

> Our controller supports a remapping "facility" to add an
> offset to the bus address. Is such a feature supported
> by Linux at all?  The problem is that this creates
> another race condition, as setting the offset register
> before an access may occur concurrently on two cores.
> Perhaps 256 MB is plenty on a 32-bit embedded device?

Linux certainly supports a constant offset between the CPU physical
address and the PCI bus address -- this is the offset described by
pci_add_resource_offset().

But it sounds like you're envisioning some sort of dynamic remapping,
and I don't see how that could work.  The PCI core needs to know the
entire host bridge window size up front, because that's how it assigns
BARs.  Since there's no wrapper for memory accesses, there's no
opportunity to change the remapping at the time of access.
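
For the constant-offset case, the host bridge driver just describes its
window and the CPU-to-bus offset when it builds its resource list,
along these lines (the addresses below are made up):

  #include <linux/pci.h>
  #include <linux/ioport.h>

  /* Say the bridge forwards CPU physical 0x5800_0000-0x5fff_ffff to
   * PCI memory space, but strips the high bits so devices see bus
   * addresses starting at 0x0800_0000. */
  static struct resource mem_res = {
      .name  = "PCIe MEM",
      .start = 0x58000000,
      .end   = 0x5fffffff,
      .flags = IORESOURCE_MEM,
  };

  static void add_mem_window(struct list_head *resources)
  {
      /* offset = CPU physical address - PCI bus address */
      pci_add_resource_offset(resources, &mem_res,
                              0x58000000 - 0x08000000);
  }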

> 4) The HW dev is considering the following fix.
> Instead of muxing the address spaces, provide smaller
> exclusive spaces. For example
> [0x5000_0000, 0x5400_0000] for config (64MB)
> [0x5400_0000, 0x5800_0000] for I/O (64MB)
> [0x5800_0000, 0x6000_0000] for mem (128MB)
> 
> That way, bits 26:27 implicitly select the address space
> 	00 = config
> 	01 = I/O
> 	1x = mem
> 
> This would be more in line with what Linux expects, right?
> Are these sizes acceptable? 64 MB config is probably overkill
> (we'll never have 64 devices on this board). 64 MB for I/O
> is probably plenty. The issue might be mem space?

Having exclusive spaces like that would be a typical approach.  The
I/O space seems like way more than you probably need, if you need it
at all.  There might be a few ancient devices that require I/O space,
but only you can tell whether you need to support those.

Same with memory space: if you restrict the set of devices you want to
support, you can restrict the amount of address space you need.  The
Sky Lake GPU on my laptop has a 256MB BAR, so even a single device
like that can require more than the 128MB you'd have with this map.

Bjorn


