Why do we check for "link-up" in *_pcie_valid_device()?

Lorenzo Pieralisi lorenzo.pieralisi at arm.com
Tue Jan 2 03:37:50 PST 2018


On Fri, Dec 22, 2017 at 11:28:15AM -0600, Bjorn Helgaas wrote:
> On Fri, Dec 22, 2017 at 01:02:28PM +0000, Bharat Kumar Gogada wrote:
> > Bjorn wrote:
> >> In the PCI config access path, the *_pcie_valid_device() functions
> >> in the dwc, altera, rockchip, and xilinx drivers all check whether
> >> the link is up.
> >> 
> >> I think this is racy because the link may go down after we check but
> >> before we perform the config access.
> >> 
> >> What would blow up if we removed the *_pcie_link_up() checks?
> >> 
> >> I'd like to either remove the checks or add comments about why the
> >> race is acceptable.  If we've covered this before, I apologize.
> >> Adding a comment will keep me from pestering you about this again in
> >> the future.
> 
> > In both Xilinx driver cases when link is down, hardware responds by
> > AXI DECERR/SLVERR status which causes an exception, synchronous
> > external abort to CPU.  This causes system to hang, so we need this
> > check for both of our drivers.  We will add comments. 
> 
> This is a problem, and checking whether the link is up is a workaround
> but not a real solution.  That means your system may hang if the link
> happens to go down at the wrong time.
> 
> A real solution would be to handle the synchronous external abort
> so it doesn't cause a system hang.

I still have no idea why checking (in a broken way) that the link
is up for _every_ given config access is a solution.

If, say, the link is down in xilinx_pcie_init_port(), what would bring
it up or, worded differently, why checking that the link is up for every
config access instead of host bridge probe time make this code any safer
?

It is not safe anyway, it would be good to understand though why the code
was written this way so that we can change it appropriately.

Thanks,
Lorenzo



More information about the Linux-rockchip mailing list