Giving special alignment/size constraints to the Linux PCI core?

Wed Feb 13 16:10:54 EST 2013

On Wednesday 13 February 2013, Jason Gunthorpe wrote:
> On Wed, Feb 13, 2013 at 06:53:14PM +0000, Arnd Bergmann wrote:
> 
> > > The standard answer is to leave appropriate gaps. My *guess* on this
> > > matter is that on x86 the gaps are left, as appropriate, by the boot
> > > firmware. Eg an ExpressCard slot will always have a window assigned to
> > > its bridge and Linux would typically not reassign it (or similar).
> > > 
> > > PCI core support for firmware-less embedded will someday need to do
> > > something similar, eg via a special DT attribute on hot plug capable
> > > ports.
> > 
> > I saw that the PCI core reserves 2MB memory space and 256 bytes of
> > I/O space per hotplug capable bridge by default, and you can
> > override
> 
> Haven't looked at how it determines what is hot plug
> capable.. Technically every PCI-E port is hot plug capable, it really
> depends on the specific board if a port can actually be hot plugged or
> not - so maybe that is what gets set in DT?

The "is_hotplug_bridge" flag that determines this gets set for PCIe
bridges with the PCI_EXP_SLTCAP_HPC (hot plug capable) bit set in the
PCI_EXP_SLTCAP register.
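
As a rough illustration of that test, here is a standalone sketch. The
bit value mirrors include/uapi/linux/pci_regs.h; the helper name
sltcap_is_hotplug_capable() is made up for this example and is not a
kernel function. In the kernel the register value would come from a
config space read of the bridge, here it is just passed in.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors include/uapi/linux/pci_regs.h: Hot-Plug Capable bit in the
 * 32-bit Slot Capabilities register. */
#define PCI_EXP_SLTCAP_HPC 0x00000040

/* Hypothetical helper: true if the Slot Capabilities value advertises
 * hot plug support, which is the condition for is_hotplug_bridge. */
static bool sltcap_is_hotplug_capable(uint32_t sltcap)
{
    return (sltcap & PCI_EXP_SLTCAP_HPC) != 0;
}
```

Whether a given board sets that bit for a port is up to the hardware or
whatever emulates the bridge config space, so this only shows the check
itself.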

> > these at boot time if you need more. I wonder if this means that
> > we end up using two of the precious address space windows for each
> > unused root port to already map these at boot time, and it certainly
> > works for most adapters, but this does not seem better than assigning
> > static windows of the same size at boot time for each port.
> 
> If the PCI core programs the decoder on the bridge, then it will
> consume a window - however if there is nothing behind the bridge then
> leaving the bridge window disabled, but reserving the memory region is
> a sensible thing to do.
>
> I'm not sure what the state of the PCI core is today on this point,
> but it could be altered..

The problem I see with the current implementation is that it reserves
a fixed-size window and does not reassign the window of the bridge
itself, only the resources of the devices below it, at least if I am
reading the code correctly. I have not tried this myself.

> Also the host driver can check the link status before consuming a
> window, no link = no window.

Right, that works. Even if the link is up, the port might need only an
I/O window or only a memory window, rather than always using both.
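
A minimal sketch of that decision, assuming the host driver can tell
whether the link is up and which kinds of BARs sit behind the port. The
struct and function names are invented for illustration and do not
correspond to any existing driver code.

```c
#include <stdbool.h>

/* Hypothetical summary of what sits behind one root port. */
struct port_state {
    bool link_up;   /* PCIe link has trained */
    bool needs_io;  /* a downstream BAR decodes I/O space */
    bool needs_mem; /* a downstream BAR decodes memory space */
};

/* Number of address decoding windows the port should consume. */
static int windows_needed(const struct port_state *p)
{
    if (!p->link_up)
        return 0; /* no link = no window */

    /* Link is up: claim only the window types actually required. */
    return (p->needs_io ? 1 : 0) + (p->needs_mem ? 1 : 0);
}
```

With this rule an unused port costs nothing, and a memory-only device
costs one window instead of two.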

> > > Just to circle back on this whole thread - Thomas's solution is pretty
> > > good, it covers pretty much all the use cases. I think it is a good
> > > place to start, and as the firmware-less 'drivers/pci/host' concept
> > > develops the right support will eventually come, as everyone is now
> > > aware of the need to control the host bridge aperture from the core
> > > PCI code.
> > 
> > I agree the solution is not all that bad, I just want to be convinced
> > that it actually has advantages over the simpler approaches.
> 
> Unfortunately my Marvell systems do not have oversubscribed mbus
> windows, so I can't really comment on this :( However I do use the
> hotplug capability in the current driver, so at least for me, it is
> important to not lose that when trying to solve the oversubscription.

One thing worth trying is probably to hack the driver to only use
a couple of the available windows and see what happens when you hotplug
one card into all the slots one at a time.
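
The bookkeeping for such an experiment can be modelled in user space
first. This is only a sketch under assumed names (MAX_WINDOWS,
window_pool, pool_claim, pool_release are all invented here), not the
driver's actual window handling: the pool is capped artificially low,
a plug event claims windows and an unplug event returns them.

```c
/* Artificially small cap, standing in for "only use a couple of the
 * available windows". */
#define MAX_WINDOWS 2

/* Hypothetical pool of address decoding windows. */
struct window_pool {
    int in_use;
};

/* Claim 'count' windows on hotplug; returns 0 on success, -1 if the
 * pool would be oversubscribed. */
static int pool_claim(struct window_pool *pool, int count)
{
    if (count < 0 || pool->in_use + count > MAX_WINDOWS)
        return -1;
    pool->in_use += count;
    return 0;
}

/* Return 'count' windows on unplug, never dropping below zero. */
static void pool_release(struct window_pool *pool, int count)
{
    if (count > pool->in_use)
        count = pool->in_use;
    pool->in_use -= count;
}
```

If a card needing both windows can be moved through every slot in turn
without pool_claim() failing, the windows are being handed back
correctly on removal.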

	Arnd