imx8mp pci hang during init
Maciej W. Rozycki
macro at orcam.me.uk
Tue Aug 29 03:58:02 PDT 2023
Tim,
> It seems to me that pci quirks require knowing the device so don't
> help until you've established a link and can get to config space, or
> perhaps this means the switch needs to be defined in DT so that a dt
> compatible could be used for the quirk?
This is why I took a different approach with my a89c82249c37 ("PCI: Work
around PCIe link training failures"). It was initially implemented as a
regular quirk applied to all devices (i.e. matching on
PCI_ANY_ID:PCI_ANY_ID) and then, following Bjorn's suggestion, changed to
be invoked directly from `pci_device_add' and `pcie_wait_for_link_delay'.
> Does the PCIe specification specify that link training should start
> with the highest possible speed then downgrade? I find that most of
> the other PCI host controller drivers I've looked at all work this
> way. I have only found the force gen2 first behavior in pci-imx6.c and
> pcie-fu740.c. Maybe a dt property to force gen2 first is needed to
> resolve this.
It works the other way round. The link is always established at 2.5GT/s
first, and once that succeeds the two ends send each other information
about their capabilities, including the speeds supported, in the so-called
"training sets". Then they switch to the highest speed supported by both
ends, subject to the Target Link Speed (TLS) setting in the Link Control 2
register of each end. If there are reliability issues at the higher rate,
the endpoints are supposed to reduce the link speed. Reducing the speed,
whether by clamping with TLS or because of reliability issues, is always
done by removing said speed from the list reported in the respective
device's training set.
I don't know why some devices fail to switch to the higher speed when
left unclamped by TLS, and yet switch successfully when first clamped
with TLS and then unclamped. In principle unclamping by hand should just
mimic what happens in the unclamped case: the other endpoint sees a
higher speed advertised, so both endpoints switch to it. I suppose the
hardware state machine is just tough to get right, and doing things by
hand prevents the broken ones from getting into an odd state due to
unfortunate timing or whatever. Unfortunately the device manufacturers
involved declined to comment.
Maciej