CONFIG_PCIEASPM breaks PCIe on Marvell Armada 385 machine

Bjorn Helgaas helgaas at kernel.org
Tue Jan 17 14:22:29 PST 2017


[+cc David]

On Tue, Jan 17, 2017 at 09:02:58PM +0000, Russell King - ARM Linux wrote:
> On Tue, Jan 17, 2017 at 07:34:14PM +0000, Russell King - ARM Linux wrote:
> > Uwe, can you try:
> > 
> > setpci -s <whatever-the-id-of-the-root-is-it's-blanked-out-in-the-above> \
> > 	0x50.w=0x60
> > 
> > and see whether it remains alive (you can check by reading the root
> > register 0x52.w - bit 12 should be set once bit 11 clears again).
> 
> For reference, this I got wrong...
> 
> 0xf1041a04 bit 0 indicates link status (0 = link up, 1 = link down).
> 
> > If that's successful, maybe setting the common clock bit on the PCIe
> > device is what's causing the problem, in which case:
> > 
> > setpci -s 02:00.0 0x80.w=0x40
> > setpci -s <whatever-the-id-of-the-root-is-it's-blanked-out-in-the-above> \
> > 	0x50.w=0x60
> 
> Having worked with Uwe over IRC, it seems that any request to retrain
> causes the link to go down, either with or without the common clock bit
> set:
> 
> # setpci -s 2.0 0x50.w=0x60
> # setpci -s 2.0 0x52.w
> 0011
> # memtool md 0xf1041a04+4
> f1041a04: 00010201
> ... reboot ...
> # setpci -s 2.0 0x50.w=0x20
> # memtool md 0xf1041a04+4
> f1041a04: 00010201
> 
> which doesn't point towards ASPM itself, but the problem is caused by
> a side effect of ASPM's setup code which always triggers a retrain.
> 
> Bit 5 in that register is documented (at least in the Armada 370 docs
> and Armada XP docs I have) as:
> 
> 5  RetrnLnk  RW    Retrain Link
>              0x0   This bit forces the device to initiate link retraining.
>                    Always returns 0 when read.
>                    NOTE: If configured as an Endpoint, this field is
>                    reserved and has no effect.
> 
> Bjorn, are you aware of similar situations where a request for the PCIe
> link to be retrained causes it to fail?
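
For anyone following the setpci values quoted above, here is a minimal
Python sketch that decodes them. It assumes the standard PCIe Link
Control / Link Status layout (Link Control bit 5 = Retrain Link, bit 6 =
Common Clock Configuration; Link Status bits 3:0 = speed, bits 9:4 =
width, bit 11 = Link Training), and it takes the quoted statement that
bit 0 of 0xf1041a04 means "link down" at face value. The function names
are made up for illustration only.

```python
# Standard PCIe Link Control bits (offset 0x10 in the PCIe capability).
LNKCTL_RETRAIN = 1 << 5        # Retrain Link (always reads back as 0)
LNKCTL_COMMON_CLOCK = 1 << 6   # Common Clock Configuration

# Standard PCIe Link Status fields (offset 0x12 in the PCIe capability).
LNKSTA_SPEED_MASK = 0x000F     # Current Link Speed, bits 3:0
LNKSTA_WIDTH_MASK = 0x03F0     # Negotiated Link Width, bits 9:4
LNKSTA_TRAINING = 1 << 11      # Link Training in progress


def decode_lnkctl(value):
    """Report which of the two bits discussed in the thread are set."""
    return {
        "retrain": bool(value & LNKCTL_RETRAIN),
        "common_clock": bool(value & LNKCTL_COMMON_CLOCK),
    }


def decode_lnksta(value):
    """Split a Link Status word into speed, width and training flag."""
    return {
        "speed": value & LNKSTA_SPEED_MASK,
        "width": (value & LNKSTA_WIDTH_MASK) >> 4,
        "training": bool(value & LNKSTA_TRAINING),
    }


def armada_link_down(status):
    """Assumption from the thread: bit 0 of 0xf1041a04 is 1 when the link is down."""
    return bool(status & 1)
```

With these helpers, 0x60 decodes as retrain plus common clock, 0x20 as
retrain only, the 0011 read back from 0x52.w as speed 1 / width 1 with
no training in progress, and 00010201 as link down.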

The only one that comes to mind is this patch from David (CC'd) that
avoids ASPM-related retrains when we know the link doesn't support ASPM:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=e53f9a28bee3

Side note: it looks like we don't follow the retrain algorithm
recommended in the implementation note on avoiding race conditions
(PCIe r3.0, sec 7.8.7).
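
A rough sketch of the sequence that note is generally understood to
recommend: set the desired Link Control parameters without writing
Retrain Link, poll Link Training until it reads 0, then write 1 to
Retrain Link without changing any other field. This is Python for
illustration, not kernel code; read_lnkctl/write_lnkctl/read_lnksta and
FakePort are hypothetical stand-ins for config-space access, and the
final wait is an addition here so the caller knows when the link has
settled.

```python
import time

LNKCTL_RETRAIN = 1 << 5    # Link Control: Retrain Link
LNKSTA_TRAINING = 1 << 11  # Link Status: Link Training


def wait_training_done(port, timeout_s=1.0, poll_s=0.001):
    """Poll Link Training until it clears; False on timeout."""
    deadline = time.monotonic() + timeout_s
    while port.read_lnksta() & LNKSTA_TRAINING:
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
    return True


def retrain_link(port, new_lnkctl):
    # Step 1: program the new parameters, with Retrain Link left clear.
    port.write_lnkctl(new_lnkctl & ~LNKCTL_RETRAIN)
    # Step 2: make sure no training is already in flight.
    if not wait_training_done(port):
        return False
    # Step 3: set Retrain Link without touching any other field.
    port.write_lnkctl(port.read_lnkctl() | LNKCTL_RETRAIN)
    # Extra (not part of the three steps): wait for the retrain to finish.
    return wait_training_done(port)


class FakePort:
    """Hypothetical port model: training 'completes' after a few polls."""

    def __init__(self):
        self.lnkctl = 0
        self.writes = []
        self._busy_polls = 0

    def read_lnkctl(self):
        return self.lnkctl

    def write_lnkctl(self, value):
        self.writes.append(value)
        self.lnkctl = value & ~LNKCTL_RETRAIN  # Retrain Link reads as 0
        if value & LNKCTL_RETRAIN:
            self._busy_polls = 3

    def read_lnksta(self):
        if self._busy_polls > 0:
            self._busy_polls -= 1
            return 0x0011 | LNKSTA_TRAINING
        return 0x0011
```

Calling retrain_link(FakePort(), 0x40) issues two Link Control writes,
0x40 and then 0x60, rather than the single combined 0x60 write used in
the setpci tests earlier in the thread.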



More information about the linux-arm-kernel mailing list