Possible regression due to "tick: broadcast: Prevent livelock from event handler"
Thomas Gleixner
tglx at linutronix.de
Fri Jul 3 02:23:12 PDT 2015
On Fri, 3 Jul 2015, Geert Uytterhoeven wrote:
> Hi Simon,
>
> On Fri, Jul 3, 2015 at 4:40 AM, Simon Horman <horms at verge.net.au> wrote:
> > I have observed what appears to be a regression while testing next-20150702
> > which seems to be caused by 2951d5c031a3 ("tick: broadcast: Prevent
> > livelock from event handler").
> >
> > The problem manifests on the emev2/kzm9d board as per the boot log below.
> >
> > The problem manifests when booting using the shmobile_defconfig,
> > which uses multiplatform and enables all devices using DT.
> >
> > The problem does not appear to always manifest but anecdotally it
> > seems to manifest more often of late (yes, I know that is vague).
>
> > hctosys: unable to open rtc device (rtc0)
> >
> > The boot hangs here.
> > The next line should be:
> >
> > smsc911x 20000000.ethernet eth0: SMSC911x/921x identified at 0xc8880000, IRQ: 33
>
> As you can reproduce it, can you please try enabling lockdep debugging?

Just looking at the em_sti driver. It calls clk_prepare/unprepare from
interrupt disabled regions ...

But that's not the problem at hand, I think. The above commit moves
the call to the event handler on the local CPU out of the broadcast
lock region to prevent a livelock. The only real change is the
timing.

Before:

	bc_handler()
		lock(bc_lock);
		call_local_handler();
		send_ipis();
		reprogram_bc_device();
		unlock(bc_lock);

After:

	bc_handler()
		lock(bc_lock);
		send_ipis();
		reprogram_bc_device();
		unlock(bc_lock);
		call_local_handler();

As this runs in hard interrupt context with interrupts disabled, I
really cannot figure out how that makes a difference.
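
For illustration only, the two orderings can be modelled in plain
user-space C. This is a rough sketch, not kernel code: struct bc_model,
bc_lock(), bc_unlock() and the trace string are all made up for this
example, and the step functions only record that they were called. It
shows the one thing the commit changes, namely that the local handler
no longer runs with the broadcast lock held.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct bc_model {
	int lock_held;          /* 1 while the modelled broadcast lock is held */
	int handler_ran_locked; /* set if the local handler ran under the lock */
	char trace[128];        /* space-separated record of the steps taken */
};

/* Append one step name to the trace, never overflowing the buffer. */
void step(struct bc_model *m, const char *name)
{
	if (m->trace[0] != '\0')
		strncat(m->trace, " ", sizeof(m->trace) - strlen(m->trace) - 1);
	strncat(m->trace, name, sizeof(m->trace) - strlen(m->trace) - 1);
}

void bc_lock(struct bc_model *m)
{
	assert(!m->lock_held);  /* the modelled lock is not recursive */
	m->lock_held = 1;
	step(m, "lock");
}

void bc_unlock(struct bc_model *m)
{
	assert(m->lock_held);
	m->lock_held = 0;
	step(m, "unlock");
}

void send_ipis(struct bc_model *m)           { step(m, "send_ipis"); }
void reprogram_bc_device(struct bc_model *m) { step(m, "reprogram"); }

void call_local_handler(struct bc_model *m)
{
	if (m->lock_held)
		m->handler_ran_locked = 1;
	step(m, "local_handler");
}

/* Ordering before the commit: local handler runs inside the lock region. */
void bc_handler_before(struct bc_model *m)
{
	bc_lock(m);
	call_local_handler(m);
	send_ipis(m);
	reprogram_bc_device(m);
	bc_unlock(m);
}

/* Ordering after the commit: local handler runs once the lock is dropped. */
void bc_handler_after(struct bc_model *m)
{
	bc_lock(m);
	send_ipis(m);
	reprogram_bc_device(m);
	bc_unlock(m);
	call_local_handler(m);
}
```

Running bc_handler_before() and bc_handler_after() on zero-initialised
models yields the same set of steps in a different order, and only the
"before" variant sets handler_ran_locked. The model says nothing about
interrupt state or timing on real hardware.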

Can you add some debugging to figure out whether the broadcast timer
interrupt still fires?

Thanks,

	tglx