[EXT] Re: [PATCH v2] PCI: aardvark: Implement workaround for PCIe Completion Timeout

Pali Rohár pali at kernel.org
Mon Oct 3 14:14:12 PDT 2022


Lorenzo, is anything more needed for this patch? Since it works around
crashes, it really needs to get into mainline and the backports.

On Wednesday 28 September 2022 14:05:10 Elad Nachman wrote:
> Reviewed-by: Elad Nachman <enachman at marvell.com>
> 
> Thanks,
> 
> Elad.
> 
> -----Original Message-----
> From: Pali Rohár <pali at kernel.org> 
> Sent: Monday, September 26, 2022 3:35 PM
> To: Elad Nachman <enachman at marvell.com>
> Cc: Thomas Petazzoni <thomas.petazzoni at bootlin.com>; Lorenzo Pieralisi <lpieralisi at kernel.org>; Bjorn Helgaas <bhelgaas at google.com>; Krzysztof Wilczyński <kw at linux.com>; Rob Herring <robh at kernel.org>; linux-pci at vger.kernel.org; linux-arm-kernel at lists.infradead.org; linux-kernel at vger.kernel.org; Gregory Clement <gregory.clement at bootlin.com>; Marek Behún <kabel at kernel.org>; Remi Pommarel <repk at triplefau.lt>; Xogium <contact at xogium.me>; Tomasz Maciej Nowak <tmn505 at gmail.com>
> Subject: [EXT] Re: [PATCH v2] PCI: aardvark: Implement workaround for PCIe Completion Timeout
> 
> Hello Elad, could you please review this patch? I have implemented it according to your instructions, including the full memory barrier you described.
> 
> On Tuesday 02 August 2022 14:38:16 Pali Rohár wrote:
> > The Marvell Armada 3700 Functional Errata, Guidelines, and Restrictions
> > document describes, in erratum 3.12 PCIe Completion Timeout (Ref #:
> > 251), that the PCIe IP does not support a strong-ordered model for
> > inbound posted vs. outbound completion.
> > 
> > As a workaround for this erratum, the DIS_ORD_CHK flag in the Debug Mux
> > Control register must be set. It disables the ordering check in the
> > core between Completions and Posted requests received from the link.
> > 
> > Marvell also suggests issuing a full memory barrier at the beginning of
> > the aardvark summary interrupt handler, before calling the interrupt
> > handlers of endpoint drivers, in order to minimize the risk of the race
> > condition documented in the erratum between reading the DMA done status
> > and the completion of the write to host memory.
> > 
> > More details about this issue and suggested workarounds are in discussion:
> > https://lore.kernel.org/linux-pci/BN9PR18MB425154FE5019DCAF2028A1D5DB8D9@BN9PR18MB4251.namprd18.prod.outlook.com/t/#u
> > 
> > It was reported that enabling this workaround fixes instability issues
> > and "Unhandled fault" errors when using a 60 GHz WiFi 802.11ad card with
> > a Qualcomm QCA6335 chip under significant load. These were caused by
> > interrupt status stuck in the outbound CMPLT queue, traced back to this
> > erratum.
> > 
> > This workaround also fixes a kernel panic triggered after some minutes
> > of using a 5 GHz WiFi 802.11ax card with a Mediatek MT7915 chip:
> > 
> >     Internal error: synchronous external abort: 96000210 [#1] SMP
> >     Kernel panic - not syncing: Fatal exception in interrupt
> > 
> > Signed-off-by: Thomas Petazzoni <thomas.petazzoni at bootlin.com>
> > Signed-off-by: Pali Rohár <pali at kernel.org>
> > Fixes: 8c39d710363c ("PCI: aardvark: Add Aardvark PCI host controller driver")
> > Cc: stable at vger.kernel.org
> > ---
> >  drivers/pci/controller/pci-aardvark.c | 10 ++++++++++
> >  1 file changed, 10 insertions(+)
> > 
> > diff --git a/drivers/pci/controller/pci-aardvark.c b/drivers/pci/controller/pci-aardvark.c
> > index 060936ef01fe..3ae8a85ec72e 100644
> > --- a/drivers/pci/controller/pci-aardvark.c
> > +++ b/drivers/pci/controller/pci-aardvark.c
> > @@ -210,6 +210,8 @@ enum {
> >  };
> >  
> >  #define VENDOR_ID_REG				(LMI_BASE_ADDR + 0x44)
> > +#define DEBUG_MUX_CTRL_REG			(LMI_BASE_ADDR + 0x208)
> > +#define     DIS_ORD_CHK				BIT(30)
> >  
> >  /* PCIe core controller registers */
> >  #define CTRL_CORE_BASE_ADDR			0x18000
> > @@ -558,6 +560,11 @@ static void advk_pcie_setup_hw(struct advk_pcie *pcie)
> >  		PCIE_CORE_CTRL2_TD_ENABLE;
> >  	advk_writel(pcie, reg, PCIE_CORE_CTRL2_REG);
> >  
> > +	/* Disable ordering checks, workaround for erratum 3.12 "PCIe completion timeout" */
> > +	reg = advk_readl(pcie, DEBUG_MUX_CTRL_REG);
> > +	reg |= DIS_ORD_CHK;
> > +	advk_writel(pcie, reg, DEBUG_MUX_CTRL_REG);
> > +
> >  	/* Set lane X1 */
> >  	reg = advk_readl(pcie, PCIE_CORE_CTRL0_REG);
> >  	reg &= ~LANE_CNT_MSK;
> > @@ -1581,6 +1588,9 @@ static irqreturn_t advk_pcie_irq_handler(int irq, void *arg)
> >  	struct advk_pcie *pcie = arg;
> >  	u32 status;
> >  
> > +	/* Full memory barrier (ARM dsb sy), workaround for erratum 3.12 "PCIe completion timeout" */
> > +	mb();
> > +
> >  	status = advk_readl(pcie, HOST_CTRL_INT_STATUS_REG);
> >  	if (!(status & PCIE_IRQ_CORE_INT))
> >  		return IRQ_NONE;
> > --
> > 2.20.1
> > 
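
For readers who want to try the register part of the workaround outside the kernel, the read-modify-write in advk_pcie_setup_hw() can be sketched in plain userspace C as below. This is only an illustration under stated assumptions: the mock_pcie struct, the mock_readl()/mock_writel() helpers, MOCK_DEBUG_MUX_CTRL and MOCK_NUM_REGS are hypothetical names invented for the sketch and are not part of the driver. Only DIS_ORD_CHK as BIT(30) is taken from the patch. The sketch does not model the mb() in the interrupt handler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* DIS_ORD_CHK as BIT(30) follows the patch; everything else is hypothetical. */
#define BIT(n)              (UINT32_C(1) << (n))
#define DIS_ORD_CHK         BIT(30)

/*
 * Hypothetical: index of the Debug Mux Control register in the mock
 * register array. This is NOT the real LMI_BASE_ADDR + 0x208 offset.
 */
#define MOCK_DEBUG_MUX_CTRL 0u
#define MOCK_NUM_REGS       4u

/* Hypothetical stand-in for struct advk_pcie: a tiny fake register file. */
struct mock_pcie {
    uint32_t regs[MOCK_NUM_REGS];
};

/* Stand-in for advk_readl(): read one 32-bit mock register. */
static uint32_t mock_readl(const struct mock_pcie *pcie, size_t idx)
{
    assert(idx < MOCK_NUM_REGS);
    return pcie->regs[idx];
}

/* Stand-in for advk_writel(): write one 32-bit mock register. */
static void mock_writel(struct mock_pcie *pcie, uint32_t val, size_t idx)
{
    assert(idx < MOCK_NUM_REGS);
    pcie->regs[idx] = val;
}

/*
 * Same shape as the hunk in advk_pcie_setup_hw(): read the register,
 * OR in DIS_ORD_CHK, write it back. All other bits are preserved.
 */
static void mock_disable_ordering_check(struct mock_pcie *pcie)
{
    uint32_t reg = mock_readl(pcie, MOCK_DEBUG_MUX_CTRL);

    reg |= DIS_ORD_CHK;
    mock_writel(pcie, reg, MOCK_DEBUG_MUX_CTRL);
}
```

Calling mock_disable_ordering_check() on a mock whose register holds some unrelated bits should leave those bits untouched and set only bit 30; calling it a second time changes nothing, which is why the hunk is safe to run on every hardware setup.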


