[PATCH v3 2/2] ACPI: APEI: Filter the PCI MCFG address with an arch-agnostic method

Bjorn Helgaas helgaas at kernel.org
Tue Oct 26 13:47:22 PDT 2021


On Tue, Oct 26, 2021 at 05:16:47PM +0800, Xuesong Chen wrote:
> On 26/10/2021 07:37, Bjorn Helgaas wrote:

> > My point was that when ECAM is implemented correctly, a CPU does a
> > single MMIO load to do a PCI config read and a single MMIO store to do
> > a PCI config write.  In that case there no need for any locking, so
> > there's no need for APEI to reserve those resources.
> 
> Ah, got it. That means the PCI ECAM has a implicit mutual exclusion with EINJ
> if the hardware implemention is correct, so we can remove the MCFG from
> the APEI's safely.

Well, not quite.  ECAM doesn't *need* mutual exclusion.  Single loads
and stores are atomic by definition.

> > I think apei_resources_request() should continue to reserve MCFG areas
> > on tegra194 and xgene, but it does not need to reserve them on other
> > ARM64 platforms.
> 
> As a summary: we need to reserve the MCFG areas on those platforms with a
> quirk ECAM implementation since there's no lockless method to access the
> configuration space, on other platforms we don't need to reserve the MCFG
> resources (so can remove it safely).
> 
> So we need to add another patch to handle the case of tegra194 and xgene...
> I will try to figure it out. 

I looked through these again and found another problem case (thunder).
Here are my notes from my research.

Normal ECAM users require no device-specific support.  The platform
supplies an MCFG table, the generic code works, no mutual exclusion is
required, and APEI doesn't need to reserve the MCFG areas.

The problem cases are platforms that supply an MCFG table but require
some device-specific workarounds.  We can identify these because they
have quirks in pci-mcfg.c.  Here are the existing quirks and the
pci_ecam_ops structs they supply:

  AL_ECAM             al_pcie_ops                 # OK
  QCOM_ECAM32         pci_32b_ops                 # OK
  HISI_QUAD_DOM       hisi_pcie_ops               # OK
  THUNDER_PEM_QUIRK   thunder_pem_ecam_ops        # problem
  THUNDER_PEM_QUIRK   thunder_pem_ecam_ops        # problem
  THUNDER_ECAM_QUIRK  pci_thunder_ecam_ops        # OK
  tegra               tegra194_pcie_ops           # problem
  XGENE_V1_ECAM_MCFG  xgene_v1_pcie_ecam_ops      # problem
  XGENE_V2_ECAM_MCFG  xgene_v2_pcie_ecam_ops      # problem
  ALTRA_ECAM_QUIRK    pci_32b_read_ops            # OK

The ones marked "OK" have .map_bus(), .read(), and .write() methods
that need no mutual exclusion because they boil down to just a single
MMIO load or store.  These are fine and there shouldn't be a problem
if an EINJ action accesses the ECAM space.

The others do require mutual exclusion:

  - thunder_pem_ecam_ops: thunder_pem_config_read() calls
    thunder_pem_bridge_read(), which does a writeq() to PEM_CFG_RD
    followed by a readq().  The writeq() and readq() must be atomic to
    avoid corruption.

  - tegra194_pcie_ops: tegra194_map_bus() programs the ATU.  This and
    the subsequent ECAM read/write must be atomic.

  - xgene_v1_pcie_ecam_ops and xgene_v2_pcie_ecam_ops:
    xgene_pcie_map_bus() sets the RTID.  This and the subsequent ECAM
    read/write must be atomic.

I had to look at all these ops individually to find them, so I don't
see an easy way to identify these problem cases at run-time.

I personally would not have an issue with having APEI try to reserve
the MCFG regions for any platform that has an MCFG quirk.  That would
prevent the al, qcom, hisi, thunder-ecam, and altra drivers from using
EINJ even though it would probably be safe for them.  But we already
know those platforms are not really ACPI-compliant, so ...

Bjorn



More information about the linux-arm-kernel mailing list