Possible regression between 4.9 and 4.13

Tue Aug 29 23:02:37 PDT 2017

On Wed, Aug 30, 2017 at 01:53:10AM +0200, Lukas Wunner wrote:
> On Tue, Aug 29, 2017 at 04:28:53PM +0300, Mathias Nyman wrote:
> > This tight check was originally done to detect pci hotplug removed
> > hosts as soon as possible.
> 
> In Mason's case, the parent of the XHCI controller isn't a hotplug port,
> see this lspci output:
> 
> https://www.spinics.net/lists/linux-usb/msg160010.html
> 
> Please check is_hotplug_bridge in the parent's struct pci_dev before
> assuming that the XHCI controller was unplugged.

How can you guarantee that this is set on some systems?  Will it be set
on cardbus devices?  What about on a "normal" system where I can just go
and yank out a PCI card at will?

I don't think this is a valid thing to check, and again, why are we
arguing this point?  It's been this way since the 1990s, this isn't a
new thing...

To get back to the original issue here, the hardware seems to have died,
the driver stops talking to it, and all is good.  The "regression" here
is that we can now properly determine that the hardware is crap.

So, how do you think we should proceed?  Delay a bit longer before
saying the device is gone?  How long is "long enough"?  How many bus
errors are we allowed to tolerate?  (Hint, the PCI spec says none...)
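
To make the trade-off concrete, here is a rough user-space sketch of
the kind of check being argued about.  It is not the xhci driver's
code: struct fake_dev, dev_check_alive(), the two reader helpers and
MAX_ALL_ONES_READS are all made-up names for illustration.  A read
that comes back as all 1s is counted as a bus error, and the device is
declared gone once the count reaches the limit.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical policy knob: how many consecutive all-ones reads are
 * tolerated before the device is declared gone.  Setting this to 1
 * models the strict "no bus errors allowed" behaviour.
 */
#define MAX_ALL_ONES_READS 3

/* Hypothetical stand-in for a device with one readable register. */
struct fake_dev {
	uint32_t (*read_reg)(void *ctx);	/* models an MMIO read */
	void *ctx;
	unsigned int all_ones_seen;	/* consecutive all-ones reads */
	bool gone;			/* set once we give up on the device */
};

/* Reader that models a healthy device: returns the value behind ctx. */
static uint32_t read_ok(void *ctx)
{
	return *(uint32_t *)ctx;
}

/* Reader that models a dead or removed device: every bit reads as 1. */
static uint32_t read_dead(void *ctx)
{
	(void)ctx;
	return UINT32_MAX;
}

/*
 * Read the register once.  Returns true and stores the value in *val
 * if the read looks valid.  Returns false on an all-ones read, and
 * marks the device gone once MAX_ALL_ONES_READS such reads have been
 * seen in a row.  A valid read resets the counter.
 */
static bool dev_check_alive(struct fake_dev *dev, uint32_t *val)
{
	uint32_t v = dev->read_reg(dev->ctx);

	if (v == UINT32_MAX) {
		if (++dev->all_ones_seen >= MAX_ALL_ONES_READS)
			dev->gone = true;
		return false;
	}

	dev->all_ones_seen = 0;
	*val = v;
	return true;
}
```

Any limit above 1 means the caller keeps working with a device that
has already returned garbage at least once, which is exactly the
tolerance being questioned here.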

Maybe someone wants to get to the root problem here, why is the hardware
suddenly reporting all 1s?

thanks,

greg k-h