[RFC PATCH v3 3/3] PCI/ACPI: hisi: Add ACPI support for HiSilicon SoCs Host Controllers

Gabriele Paoloni gabriele.paoloni at huawei.com
Sat Feb 27 01:00:31 PST 2016


Hi Bjorn, Lorenzo

Many Thanks for your replies and suggestions

> -----Original Message-----
> From: linux-pci-owner at vger.kernel.org [mailto:linux-pci-
> owner at vger.kernel.org] On Behalf Of Bjorn Helgaas
> Sent: 25 February 2016 19:59
> To: Lorenzo Pieralisi
> Cc: Gabriele Paoloni; 'Mark Rutland'; Guohanjun (Hanjun Guo); Wangzhou
> (B); liudongdong (C); Linuxarm; qiujiang; 'bhelgaas at google.com';
> 'arnd at arndb.de'; 'tn at semihalf.com'; 'linux-pci at vger.kernel.org';
> 'linux-kernel at vger.kernel.org'; xuwei (O); 'linux-
> acpi at vger.kernel.org'; 'jcm at redhat.com'; zhangjukuo; Liguozhu
> (Kenneth); 'linux-arm-kernel at lists.infradead.org'
> Subject: Re: [RFC PATCH v3 3/3] PCI/ACPI: hisi: Add ACPI support for
> HiSilicon SoCs Host Controllers
> 
> On Thu, Feb 25, 2016 at 12:07:50PM +0000, Lorenzo Pieralisi wrote:
> > On Thu, Feb 25, 2016 at 03:01:19AM +0000, Gabriele Paoloni wrote:
> >
> > [...]
> >
> > > > I think the relevant spec is the PCI Firmware Spec, r3.0, sec
> 4.1.2.
> > > > Note 2 in that section says the address range of an MMCFG region
> > > > must be reserved by declaring a motherboard resource, i.e.,
> included
> > > > in the _CRS of a PNP0C02 or similar device.
> > >
> > > I had a look a this. So yes the specs says that we should use the
> > > PNP0C02 device if MCFG is not supported.
> >
> > AFAIK, PNP0C02 is a resource reservation mechanism, the spec says
> that
> > MCFG regions must be reserved using PNP0C02, even though its
> > current usage on x86 is a bit unfathomable to me, in particular
> > in relation to MCFG resources retrieved for hotpluggable bridges (ie
> > through _CBA, which I think consider the MCFG region as reserved
> > by default, regardless of PNP0c02):
> >
> > see pci_mmcfg_check_reserved() arch/x86/pci/mmconfig-shared.c

Yes I checked this and it seems to check if an area of memory from
MCFG is overlapping with any area of memory specified by PNP0C02 
_CRS...

However (maybe I am wrong) it looks to me that this part works
independently of the PNP0c02 driver. It seems that goes directly
to walk the ACPI namespace and look for PNP0C02 HID; as it finds it,
it checks the range of memory specified in the _CRS method and see
if it overlaps with the MCFG resource...am I missing something? 

If my interpretation is correct, couldn't we just modify 
pci_mmconfig_map_resource() in the latest Nowicki patchset and add
a similar check before insert_resource_conflict() is called?

On the other side HiSilicon host bridge quirks could use the address
retrieved by the _CRS method of PNP0C02 for our root complex config
rd/wr...?
 
> 
> I don't know how _CBA-related resources would be reserved.  I haven't
> personally worked with any host bridges that supply _CBA, so I don't
> know whether or how they handled it.
> 
> I think the spec intent was that the Consumer/Producer bit (Extended
> Address Space Descriptor, General Flags Bit[0], see ACPI spec sec
> 6.4.3.5.4) would be used.  Resources such as ECAM areas would be
> marked "Consumer", meaning the bridge consumed that space itself, and
> windows passed down to the PCI bus would be marked "Producer".
> 
> But BIOSes didn't use that bit consistently, so we couldn't rely on
> it.  I verified experimentally that Windows didn't pay attention to
> that bit either, at least for DWord descriptors:
> https://bugzilla.kernel.org/show_bug.cgi?id=15701
> 
> It's conceivable that we could still use that bit in Extended Address
> Space descriptors, or maybe some hack like pay attention if the bridge
> has _CBA, or some such.  Or maybe a BIOS could add a PNP0C02 device
> along with the PNP0A03 device that uses _CBA, with the PNP0C02 _CRS
> describing the ECAM area referenced by _CBA.  Seeems hacky no matter
> how we slice it.

Well about this I don't know much but, having looked at the bugzilla
and considering the current mechanism used by pci_mmcfg_check_reserved()
I have the feeling that this last one is easier to implement and it seems
the one currently used (in mmconfig-shared.c )

Cheers

Gab

> 
> > Have a look at drivers/pnp/system.c for PNP0c02
> >
> > > So probably I can use acpi_get_devices("PNP0C02",...) to retrieve
> it
> > > from the quirk match function, I will look into this...
> > >
> > > >
> > > > > On the other side, since this is an exception only for the
> config
> > > > > space address of our host controller (as said before all the
> buses
> > > > > below the root one support ECAM), I think that it is right to
> put
> > > > > this address as a device specific data (in fact the rest of the
> > > > > config space addresses will be parsed from MCFG).
> > > >
> > > > A kernel with no support for your host controller (and thus no
> > > > knowledge of its _DSD) should still be able to operate the rest
> of the
> > > > system correctly.  That means we must have a generic way to learn
> what
> > > > address space is consumed by the host controller so we don't try
> to
> > > > assign it to other devices.
> > >
> > > This is something I don't understand much...
> > > Are you talking about a scenario where we have a Kernel image
> compiled
> > > without our host controller support and running on our platform?
> >
> > I *think* the point here is that your host controller config space
> should be
> > reserved through PNP0c02 so that the kernel will reserve it through
> the
> > generic PNP0c02 driver even if your host controller driver (and
> related
> > _DSD) is not supported in the kernel.
> 
> Right.  Assume you have two top-level devices:
> 
>   PNP0A03  PCI host bridge
>     _CRS     describes windows
>     ????     describes ECAM space consumed
>   PNPxxxx  another ACPI device, currently disabled
>     _PRS     specifies possible resource settings, may specify no
> restrictions
>     _SRS     assign resources and enable device
>     _CRS     empty until device enabled
> 
> When the OS enables PNPxxxx, it must first assign resources to it
> using _PRS and _SRS.  We evaluate _PRS to find out what the addresses
> PNPxxxx can support.  This tells us things like how wide the address
> decoder is, the size of the region required, and any alignment
> restrictions -- basically the same information we get by sizing a PCI
> BAR.
> 
> Now, how do we assign space for PNPxxxx?  In a few cases, _PRS has
> only a few specific possibilities, e.g., an x86 legacy serial port
> that can be at 0x3f8 or 0x2f8.  But in general, _PRS need not impose
> any restrictions.
> 
> So in general the OS can use any space that can be routed to PNPxxxx.
> If there's an upstream bridge, it may define windows that restrict the
> possibilities.  But in this case, there *is* no upstream bridge, so
> the possible choices are the entire physical address space of the
> platform, except for other things that are already allocated: RAM, the
> _CRS settings for other ACPI devices, things reserved by the E820
> table (at least on x86), etc.
> 
> If PNP0A03 consumes address space for ECAM, that space must be
> reported *somewhere* so the OS knows not to place PNPxxxx there.  This
> reporting must be generic (not device-specific like _DSD).  The ACPI
> core (not drivers) is responsible for managing this address space
> because:
> 
>   a) the OS is not guaranteed to have drivers for all devices, and
> 
>   b) even it *did* have drivers for all devices, the PNPxxxx space may
>   be assigned before drivers are initialized.
> 
> > I do not understand how PNP0c02 works, currently, by the way.
> >
> > If I read x86 code correctly, the unassigned PCI bus resources are
> > assigned in arch/x86/pci/i386.c (?)
> fs_initcall(pcibios_assign_resources),
> > with a comment:
> >
> > /**
> >  * called in fs_initcall (one below subsys_initcall),
> >  * give a chance for motherboard reserve resources
> >  */
> >
> > Problem is, motherboard resources are requested through (?):
> >
> > drivers/pnp/system.c
> >
> > which is also initialized at fs_initcall, so it might be called after
> > core x86 code reassign resources, defeating the purpose PNP0c02 was
> > designed for, namely, request motherboard regions before resources
> > are assigned, am I wrong ?
> 
> I think you're right.  This is a long-standing screwup in Linux.
> IMHO, ACPI resources should be parsed and reserved by the ACPI core,
> before any PCI resource management (since PCI host bridges are
> represented in ACPI).  But historically PCI devices have enumerated
> before ACPI got involved.  And the ACPI core doesn't really pay
> attention to _CRS for most devices (with the exception of PNP0C02).
> 
> IMO the PNP0C02 code in drivers/pnp/system.c should really be done in
> the ACPI core for all ACPI devices, similar to the way the PCI core
> reserves BAR space for all PCI devices, even if we don't have drivers
> for them.  I've tried to fix this in the past, but it is really a
> nightmare to unravel everything.
> 
> Because the ACPI core doesn't reserve resources for the _CRS of all
> ACPI devices, we're already vulnerable to the problem of placing a
> device on top of another ACPI device.  We don't see problems because
> on x86, at least, most ACPI devices are already configured by the BIOS
> to be enabled and non-overlapping.  But x86 has the advantage of
> having extensive test coverage courtesy of Windows, and as long as
> _CRS has the right stuff in it, we at least have the potential of
> fixing problems in Linux.
> 
> If the platform doesn't report resource usage correctly on ARM, we may
> not find problems (because we don't have the Windows test suite) and
> if we have resource assignment problems because _CRS is lacking, we'll
> have no way to fix them.
> 
> > As per last Tomasz's patchset, we claim and assign unassigned PCI
> > resources upon ACPI PCI host bridge probing (which happens at
> > subsys_initcall time, courtesy of ACPI current code); at that time
> the
> > kernel did not even register the PNP0c02 driver
> (drivers/pnp/system.c)
> > (it does that at fs_initcall). On the other hand, we insert MCFG
> > regions into the resource tree upon MCFG parsing, so I do not
> > see why we need to rely on PNP0c02 to do that for us (granted, the
> > mechanism is part of the PCI fw specs, which are x86 centric anyway
> > ie we can't certainly rely on Int15 e820 to detect reserved memory
> > on ARM :D)
> >
> > There is lots of legacy x86 here and Bjorn definitely has more
> > visibility into that than I have, the ARM world must understand
> > how this works to make sure we have an agreement.
> 
> As you say, there is lots of unpleasant x86 legacy here.  Possibly ARM
> has a chance to clean this up and do it more sanely; I'm not sure
> whether it's feasible to reverse the ACPI/PCI init order there or not.
> 
> Rafael, any thoughts on this whole thing?
> 
> Bjorn
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo at vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



More information about the linux-arm-kernel mailing list