snps, dwmac interrupt storm (Was: ARC770: "unexpected IRQ trap at vector 00" during boot)
Alexandru Gagniuc
alex.g at adaptrum.com
Wed Aug 2 10:20:56 PDT 2017
On 08/01/2017 11:23 PM, Vineet Gupta wrote:
> On 08/02/2017 03:03 AM, Alex wrote:
>> On 07/25/2017 08:08 PM, Vineet Gupta wrote:
>> I have tried the workarounds I mentioned on top of linux 4.9.34, and it
>> works exactly as expected. However, on top of 4.13-rc3 [1], the story
>> is a lot different. As soon as I release the GMAC from reset, the boot
>> stops. I can single-step through JTAG, and see that the GMAC sends an
>> interrupt storm. The kernel doesn't have time to move on with the
>> dwmac initialization and register the interrupt, and that's that.
>
> I'm a bit confused here. Are you saying that your current patchset for
> ARC is broken on 4.13.x due to "something" while it was working with 4.9?
4.9: GOOD
4.13-rc3: BAD
>> I'd file this under both 'regression' and 'bug' categories.
>
> Sure - the question is where the bug/regression is: in the ARC port,
> driver updates, or something else in the kernel.
Something else.
>> Not sure what changed under the hood from 4.9 to 4.13-rc3 to cause
>> such a drastically different behavior. I can't really do much else as
>> workarounds, since the GMAC registers are not writable while the GMAC
>> is in reset.
>
> We had a fair bit of churn in intc department in 4.10 and 4.11 but most
> of those were related to the IDU intc found only on HS38x cores, not on
> ARC700. To really narrow down the regression, perhaps try a dirty bisect
> trick (which works for me sometimes). Squash all the Adaptrum changes
> into 1 patch - I presume that same patch applies to 4.9 as to 4.13
> (otherwise u need to improvise). git bisect between 4.9 (good) and
> 4.13-rcx (bad) and patch -p1 < ur-patch at each stage.
I found the culprit, as evidenced in [Exhibit A]. I'm not really sure
how that code is designed to work, but I suspect that before the change,
the IRQ would get masked on the first hit, but now it's no longer masked.
I have reverted the patch in question on top of my 4.13 development
branch and I can confirm that the issue is resolved.
Alex
# [Exhibit A]: Git output after two hours of hardcore bisecting:
bf22ff45bed664aefb5c4e43029057a199b7070c is the first bad commit
commit bf22ff45bed664aefb5c4e43029057a199b7070c
Author: Jeffy Chen <jeffy.chen at rock-chips.com>
Date: Mon Jun 26 19:33:34 2017 +0800

    genirq: Avoid unnecessary low level irq function calls

    Check irq state in enable/disable/unmask/mask_irq to avoid unnecessary
    low level irq function calls.

    This has two advantages:
      - Conditionals are faster than hardware access
      - Solves issues with the underlying refcounting of the pinctrl
        infrastructure

    Suggested-by: Thomas Gleixner <tglx at linutronix.de>
    Signed-off-by: Jeffy Chen <jeffy.chen at rock-chips.com>
    Signed-off-by: Thomas Gleixner <tglx at linutronix.de>
    Cc: tfiga at chromium.org
    Cc: briannorris at chromium.org
    Cc: dianders at chromium.org
    Link: http://lkml.kernel.org/r/1498476814-12563-2-git-send-email-jeffy.chen@rock-chips.com

:040000 040000 ec5072725f8be0a3906e949aa0172cb3e00729d6 27847e81e1c424a62938404fd48bea3c439d74c0 M kernel
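
# Illustration of the suspected mechanism (toy model, not kernel code):
The suspicion above, that the IRQ used to get masked on the first hit
and no longer does, can be pictured with a small Python sketch. It is
only a model under stated assumptions: the class and function names are
made up for illustration, and the starting condition (bookkeeping flag
says "masked" while the hardware line is actually unmasked) is an
assumption, not something confirmed in this thread.

```python
from dataclasses import dataclass


@dataclass
class ToyIrqLine:
    """Hypothetical model of one level-triggered interrupt line."""
    hw_masked: bool = False   # what the interrupt controller really does
    sw_masked: bool = True    # what the bookkeeping flag claims (ASSUMED start state)
    chip_mask_calls: int = 0  # number of low-level mask calls issued

    def chip_mask(self) -> None:
        # Stand-in for the low-level "mask this line in hardware" call.
        self.hw_masked = True
        self.chip_mask_calls += 1


def mask_always(line: ToyIrqLine) -> None:
    """Model of the older behaviour: always issue the low-level call."""
    line.chip_mask()
    line.sw_masked = True


def mask_if_needed(line: ToyIrqLine) -> None:
    """Model of the behaviour the commit message describes: check the
    state first and skip the low-level call if it looks redundant."""
    if line.sw_masked:
        return
    line.chip_mask()
    line.sw_masked = True


def count_deliveries(mask_fn, max_events: int = 5) -> int:
    """The device keeps its line asserted; count how often it is delivered
    before the line is masked in hardware (capped at max_events)."""
    line = ToyIrqLine()
    deliveries = 0
    for _ in range(max_events):
        if line.hw_masked:
            break
        deliveries += 1
        mask_fn(line)  # the handler's attempt to quiet the line
    return deliveries
```

With these assumptions, count_deliveries(mask_always) stops after one
delivery, while count_deliveries(mask_if_needed) runs until the cap,
which is the toy equivalent of the storm. Whether the real flag and the
real hardware disagree in this way on ARC770 is exactly what would need
to be checked.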