imx8mp pci hang during init

Maciej W. Rozycki macro at orcam.me.uk
Tue Aug 29 03:58:02 PDT 2023


Tim,

> It seems to me that pci quirks require knowing the device so don't
> help until you've established a link and can get to config space, or
> perhaps this means the switch needs to be defined in DT so that a dt
> compatible could be used for the quirk?

 This is why I took a different approach with my commit a89c82249c37 
("PCI: Work around PCIe link training failures"): initially as a regular 
quirk applied to all devices (i.e. matching on PCI_ANY_ID:PCI_ANY_ID) and 
then, following Bjorn's suggestion, invoked directly from `pci_device_add' 
and `pcie_wait_for_link_delay'.

> Does the PCIe specification specify that link training should start
> with the highest possible speed then downgrade? I find that most of
> the other PCI host controller drivers I've looked at all work this
> way. I have only found the force gen2 first behavior in pci-imx6.c and
> pcie-fu740.c. Maybe a dt property to force gen2 first is needed to
> resolve this.

 It works the other way round.  The link is always established at 2.5GT/s 
and once successful the endpoints send each other information, the 
so-called "training sets", about their capabilities, including speeds 
supported.  
Then they switch to the highest speed supported within the Target Link 
Speed (TLS) setting in the Link Control 2 register of both ends.  If there 
are reliability issues at the higher rate, the endpoints are supposed to 
reduce the link speed.  Reducing the speed, both by clamping with TLS and 
in the case of reliability issues, is always done by removing said speed 
from the list reported in the respective device's training set.
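
 As a rough illustration of the selection rule described above, here is a 
minimal userspace sketch in C.  It is only a model: `struct link_end', 
`advertised' and `negotiated_speed' are hypothetical names made up for 
this example, and speeds are represented by the usual speed codes (1 for 
2.5GT/s, 2 for 5GT/s, 3 for 8GT/s, and so on).  Each end advertises the 
speeds it supports up to its TLS clamp, and the link settles on the 
highest speed both ends advertise.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one end of a link (not a kernel structure). */
struct link_end {
	uint8_t supported;	/* bit (n - 1) set: speed code n is supported */
	uint8_t tls;		/* Target Link Speed clamp, as a speed code */
};

/* Speeds an end advertises: supported speeds not above its TLS clamp. */
static uint8_t advertised(const struct link_end *e)
{
	assert(e->tls >= 1 && e->tls <= 8);
	return e->supported & (uint8_t)((1u << e->tls) - 1);
}

/* Highest speed code advertised by both ends, or 0 if none in common. */
static int negotiated_speed(const struct link_end *a,
			    const struct link_end *b)
{
	uint8_t common = advertised(a) & advertised(b);
	int code = 0;

	/* Find the position of the highest set bit. */
	while (common) {
		code++;
		common >>= 1;
	}
	return code;
}
```

 With one end supporting speed codes 1 to 3 and the other codes 1 to 4, 
both unclamped, this model yields code 3 (8GT/s); clamping either end's 
TLS to 1 removes the higher speeds from what it advertises and the result 
drops to code 1 (2.5GT/s).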

 I don't know what's causing some devices to fail to switch to the higher 
speed when unclamped with TLS and yet to switch successfully when first 
clamped with TLS and then the clamping removed.  In principle unclamping 
by hand should just mimic what happens in the unclamped case: the other 
endpoint sees a higher speed advertised, so both endpoints switch to it.  
I suppose the hardware state machine is just tough to get right and doing 
things by hand prevents the broken ones from getting into an odd state due 
to unfortunate timing or whatever.  Unfortunately the device manufacturers 
involved declined to comment.
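
 The clamp-then-unclamp sequence can be sketched as below.  This is not 
the code from a89c82249c37, only a simplified model of the register 
values involved.  The two macros mirror the definitions of the same names 
in include/uapi/linux/pci_regs.h; `lnkctl2_set_tls' and 
`tls_workaround_steps' are hypothetical helpers invented for this 
example, and it is assumed that each value is written to the Link Control 
2 register and followed by a link retrain.

```c
#include <stdint.h>

/* Target Link Speed field of Link Control 2, as in pci_regs.h. */
#define PCI_EXP_LNKCTL2_TLS		0x000f
#define PCI_EXP_LNKCTL2_TLS_2_5GT	0x0001

/* Return lnkctl2 with its TLS field replaced, other bits preserved. */
static uint16_t lnkctl2_set_tls(uint16_t lnkctl2, uint16_t tls)
{
	return (uint16_t)((lnkctl2 & ~PCI_EXP_LNKCTL2_TLS) |
			  (tls & PCI_EXP_LNKCTL2_TLS));
}

/*
 * Compute the two Link Control 2 values for the sequence: first clamp
 * to 2.5GT/s, then restore the original setting to remove the clamp.
 * The caller is assumed to retrain the link after each write.
 */
static void tls_workaround_steps(uint16_t orig, uint16_t steps[2])
{
	steps[0] = lnkctl2_set_tls(orig, PCI_EXP_LNKCTL2_TLS_2_5GT);
	steps[1] = orig;
}
```

 Only the TLS field changes between the two writes, so any other bits set 
in the register are carried through unmodified.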

  Maciej


