[PATCH v3 2/2] ACPI: APEI: Filter the PCI MCFG address with an arch-agnostic method

Xuesong Chen xuesong.chen at linux.alibaba.com
Tue Oct 26 22:29:49 PDT 2021



On 27/10/2021 04:47, Bjorn Helgaas wrote:
> On Tue, Oct 26, 2021 at 05:16:47PM +0800, Xuesong Chen wrote:
>> On 26/10/2021 07:37, Bjorn Helgaas wrote:
> 
>>> My point was that when ECAM is implemented correctly, a CPU does a
>>> single MMIO load to do a PCI config read and a single MMIO store to do
>>> a PCI config write.  In that case there no need for any locking, so
>>> there's no need for APEI to reserve those resources.
>>
>> Ah, got it. That means the PCI ECAM has a implicit mutual exclusion with EINJ
>> if the hardware implemention is correct, so we can remove the MCFG from
>> the APEI's safely.
> 
> Well, not quite.  ECAM doesn't *need* mutual exclusion.  Single loads
> and stores are atomic by definition.
> 

OK, because ECAM config access has intrinsic atomic primitive, so no need to
reserve those ECAM config space resources for APEI, that's why commit d91525eb8ee6 
("ACPI, EINJ: Enhance error injection tolerance level") fix make sense... 

>>> I think apei_resources_request() should continue to reserve MCFG areas
>>> on tegra194 and xgene, but it does not need to reserve them on other
>>> ARM64 platforms.
>>
>> As a summary: we need to reserve the MCFG areas on those platforms with a
>> quirk ECAM implementation since there's no lockless method to access the
>> configuration space, on other platforms we don't need to reserve the MCFG
>> resources (so can remove it safely).
>>
>> So we need to add another patch to handle the case of tegra194 and xgene...
>> I will try to figure it out. 
> 
> I looked through these again and found another problem case (thunder).
> Here are my notes from my research.
> 
> Normal ECAM users require no device-specific support.  The platform
> supplies an MCFG table, the generic code works, no mutual exclusion is
> required, and APEI doesn't need to reserve the MCFG areas.
> 
> The problem cases are platforms that supply an MCFG table but require
> some device-specific workarounds.  We can identify these because they
> have quirks in pci-mcfg.c.  Here are the existing quirks and the
> pci_ecam_ops structs they supply:
> 
>   AL_ECAM             al_pcie_ops                 # OK
>   QCOM_ECAM32         pci_32b_ops                 # OK
>   HISI_QUAD_DOM       hisi_pcie_ops               # OK
>   THUNDER_PEM_QUIRK   thunder_pem_ecam_ops        # problem
>   THUNDER_PEM_QUIRK   thunder_pem_ecam_ops        # problem
>   THUNDER_ECAM_QUIRK  pci_thunder_ecam_ops        # OK
>   tegra               tegra194_pcie_ops           # problem
>   XGENE_V1_ECAM_MCFG  xgene_v1_pcie_ecam_ops      # problem
>   XGENE_V2_ECAM_MCFG  xgene_v2_pcie_ecam_ops      # problem
>   ALTRA_ECAM_QUIRK    pci_32b_read_ops            # OK
> 
> The ones marked "OK" have .map_bus(), .read(), and .write() methods
> that need no mutual exclusion because they boil down to just a single
> MMIO load or store.  These are fine and there shouldn't be a problem
> if an EINJ action accesses the ECAM space.
> 
> The others do require mutual exclusion:
> 
>   - thunder_pem_ecam_ops: thunder_pem_config_read() calls
>     thunder_pem_bridge_read(), which does a writeq() to PEM_CFG_RD
>     followed by a readq().  The writeq() and readq() must be atomic to
>     avoid corruption.
> 
>   - tegra194_pcie_ops: tegra194_map_bus() programs the ATU.  This and
>     the subsequent ECAM read/write must be atomic.
> 
>   - xgene_v1_pcie_ecam_ops and xgene_v2_pcie_ecam_ops:
>     xgene_pcie_map_bus() sets the RTID.  This and the subsequent ECAM
>     read/write must be atomic.
> 
> I had to look at all these ops individually to find them, so I don't
> see an easy way to identify these problem cases at run-time.
> 
> I personally would not have an issue with having APEI try to reserve
> the MCFG regions for any platform that has an MCFG quirk.  That would
> prevent the al, qcom, hisi, thunder-ecam, and altra drivers from using
> EINJ even though it would probably be safe for them.  But we already
> know those platforms are not really ACPI-compliant, so ...

OK, understood. Since those platforms are not really ACPI-compliant, so
we can unify all the quirks together. Let me send an inital solution about
this for your review and see if there's room for further improvement...

Thanks,
Xuesong

> 
> Bjorn
> 



More information about the linux-arm-kernel mailing list