Possible regression between 4.9 and 4.13
Greg Kroah-Hartman
gregkh at linuxfoundation.org
Wed Aug 30 02:06:33 PDT 2017
On Wed, Aug 30, 2017 at 10:55:37AM +0200, Mason wrote:
> On 30/08/2017 08:02, Greg Kroah-Hartman wrote:
>
> > To get back to the original issue here, the hardware seems to have died,
> > the driver stops talking to it, and all is good. The "regression" here
> > is that we now properly can determine that the hardware is crap.
>
> Before 4.12, when I unplugged my USB3 Flash drive, Linux would
> detect a few "Uncorrected Non-Fatal errors" via AER, but it was
> still possible to plug the drive back in.
>
> Since 4.12, once I unplug the drive, the whole USB3 card is marked
> as dead (all 4 ports), and I can no longer plug anything in (not even
> the USB2 drive that didn't have any issues, IIRC).
>
> It seems a bit premature to "mark as dead" something that remains
> functional, doesn't it?

I agree, but if the device sends all ones, it's a good indication it is
really dead, right? Or something is wrong with it.

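
For illustration, the "all ones" heuristic discussed here can be sketched
in plain C. This is a minimal userspace sketch, not code taken from the
xhci driver or the PCI core. It assumes a 32-bit register read and a
platform where a read the device never completes comes back as
0xFFFFFFFF. The names read_looks_dead, struct dead_detector and
dead_detector_feed are hypothetical, as is the idea of a configurable
limit on consecutive failed reads.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: treat a 32-bit register value of all ones as a
 * sign that the read was not completed by the device. A healthy device
 * could in principle hold this value in some register, so this is a
 * heuristic, not proof.
 */
static bool read_looks_dead(uint32_t val)
{
	return val == UINT32_MAX;
}

/*
 * Hypothetical policy: report the device as gone only after `limit`
 * consecutive all-ones reads. With limit == 1 the first bad read is
 * enough.
 */
struct dead_detector {
	unsigned int consecutive;	/* all-ones reads seen in a row */
	unsigned int limit;		/* reads in a row needed to give up */
};

/* Feed one register value; returns true once the limit is reached. */
static bool dead_detector_feed(struct dead_detector *d, uint32_t val)
{
	if (read_looks_dead(val))
		d->consecutive++;
	else
		d->consecutive = 0;	/* any sane read resets the count */

	return d->consecutive >= d->limit;
}
```

Setting limit to 1 gives the strict behaviour where a single all-ones
read marks the device as gone. A larger limit tolerates transient
errors, at the cost of continuing to talk to hardware that may already
have disappeared.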
> Disclaimer, there are many variables in this setup, and I've only
> tested a small fraction of the problem space: only one system,
> only one USB3 board, only one USB3 Flash drive.

Did you ever happen to narrow this down to a single git commit using
'git bisect'? I can't remember what happened in the beginning of this
thread...

> > So, how do you think we should proceed, delay a bit longer before saying
> > the device is gone? How long is "long enough"? How many bus errors are
> > we allowed to tolerate (hint, the PCI spec says none...)
> >
> > Maybe someone wants to get to the root problem here, why is the hardware
> > suddenly reporting all 1s?
>
> I'm afraid I won't be able to make any progress on this front,
> unless I can get my hands on a PCIe packet analyzer.

Odds of that happening are pretty slim, right? I've never even seen one
of those...

thanks,

greg k-h