Possible regression between 4.9 and 4.13

Mason slash.tmp at free.fr
Thu Aug 31 02:39:39 PDT 2017


On 30/08/2017 11:06, Greg Kroah-Hartman wrote:

> On Wed, Aug 30, 2017 at 10:55:37AM +0200, Mason wrote:
>
>> On 30/08/2017 08:02, Greg Kroah-Hartman wrote:
>>
>>> To get back to the original issue here, the hardware seems to have died,
>>> the driver stops talking to it, and all is good.  The "regression" here
>>> is that we now properly can determine that the hardware is crap.
>>
>> Before 4.12, when I unplugged my USB3 Flash drive, Linux would
>> detect a few "Uncorrected Non-Fatal errors" via AER, but it was
>> still possible to plug the drive back in.
>>
>> Since 4.12, once I unplug the drive, the whole USB3 card is marked
>> as dead (all 4 ports), and I can no longer plug anything in (not even
>> the USB2 drive that didn't have any issues, IIRC).
>>
>> It seems a bit premature to "mark as dead" something that remains
>> functional, doesn't it?
> 
> I agree, but if the device sends all ones, it's a good indication it is
> really dead, right?  Or something is wrong with it.

I wouldn't call it dead if I can plug the drive back in, and have
it working... But I agree that something fishy is happening...
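
To make the disagreement concrete, here is a rough sketch of the two
policies being discussed. It is NOT the actual xhci-hcd code: the helper
names, the struct and the threshold are all hypothetical, made up for
illustration. The only thing it takes from the thread is the idea that a
register read returning all ones suggests the device is gone, and the open
question of how many such reads to tolerate before giving up.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: a 32-bit register read from a PCI(e) device that
 * no longer responds typically completes with every bit set.
 */
static bool reg_read_looks_dead(uint32_t value)
{
	return value == UINT32_MAX;
}

/*
 * Hypothetical "tolerant" policy: only declare the controller dead after
 * `threshold` consecutive all-ones reads. A threshold of 1 corresponds
 * to the strict behaviour (one bad read and the device is marked dead).
 */
struct dead_detector {
	unsigned int consecutive;	/* all-ones reads seen in a row */
	unsigned int threshold;		/* reads needed to declare death */
};

static bool dead_detector_update(struct dead_detector *d, uint32_t value)
{
	if (reg_read_looks_dead(value))
		d->consecutive++;
	else
		d->consecutive = 0;	/* any sane read resets the count */

	return d->consecutive >= d->threshold;
}
```

With threshold = 1 a single transient all-ones read kills the controller;
with a larger threshold a short glitch is ridden out, at the cost of
reacting more slowly when the hardware really is gone. Whether any value
above 1 is acceptable is exactly the question raised below.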

>> Disclaimer, there are many variables in this setup, and I've only
>> tested a small fraction of the problem space: only one system,
>> only one USB3 board, only one USB3 Flash drive.
> 
> Did you ever happen to narrow this down to a single git commit using
> 'git bisect'?  I can't remember what happened in the beginning of this
> thread...

Mathias pointed to commit d9f11ba9f107aa335091ab8d7ba5eea714e46e8b

>>> So, how do you think we should proceed, delay a bit longer before saying
>>> the device is gone?  How long is "long enough"?  How many bus errors are
>>> we allowed to tolerate (hint, the PCI spec says none...)
>>>
>>> Maybe someone wants to get to the root problem here, why is the hardware
>>> suddenly reporting all 1s?
>>
>> I'm afraid I won't be able to make any progress on this front,
>> unless I can get my hands on a PCIe packet analyzer.
> 
> Odds of that happening are pretty rare, right?  I've never even seen one
> of those...

I had a "Summit T24 Analyzer" on my desk a few months ago, but I was getting
strange results, and the knowledgeable people in my company were not available
at the time.

http://teledynelecroy.com/protocolanalyzer/protocoloverview.aspx?seriesid=445

Regards.


