[PATCH v2 10/14] arm64/nmi: Manage masking for superpriority interrupts along with DAIF

Lorenzo Pieralisi lpieralisi at kernel.org
Tue Dec 13 00:37:56 PST 2022


On Mon, Dec 12, 2022 at 02:03:33PM +0000, Mark Brown wrote:
> On Thu, Dec 08, 2022 at 06:19:02PM +0100, Lorenzo Pieralisi wrote:
> 
> > I think I found a nasty spot. We are currently not handling ALLINT in
> > arch_local_irq_enable/disable(). The issue I am facing is that we might
> > end up preempting in IRQ context with ALLINT set in the exception path
> > - arm64_preempt_schedule_irq() - which means we are running with all
> > IRQs masked (that's normal; what's not normal is that local_irq_enable()
> > does not clear ALLINT, see below).
> 
> Right, and handling ALLINT in arch_local_irq_enable/disable() isn't
> exactly ideal since it means that whenever we mask interrupts we also
> mask NMIs which somewhat reduces the value.

Understood, but ALLINT should be cleared before scheduling on the
exception path that leads to preemption; where that should be done
remains to be seen.

> > When we schedule (preempt_schedule_irq()) we do require a
> > local_irq_enable() to enable IRQs; ALLINT is still set, so
> > local_irq_enable() does not do what is expected so we are calling
> > __schedule() with IRQs disabled, which does not seem right.
> 
> > Now we need to debate what the fix for this can be but nonetheless
> > it is something to be addressed.
> 
> A first pass suggests that we should be handling this like we do for
> other preemptions and returning early from arm64_preempt_schedule_irq()
> if ALLINT is masked.  If we are handling a regular IRQ then ALLINT will
> be unmasked and we'll call into preempt_schedule_irq(), if we're
> handling a NMI then ALLINT will still be masked so we don't attempt to
> schedule.  I've pushed out a change which does this but not yet properly
> tested it.

Yes, that's what should happen (actually, if we are handling an NMI we
should not even get to the point where a decision about preemption is
made; el1_interrupt() just returns).
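
To make the two cases concrete, here is a minimal user-space model of the
behaviour discussed above. It is only a sketch, not kernel code: the
model_* names, the struct and the check_allint switch are all made up
for illustration, and the early return is my reading of the change you
describe, not the actual patch.

```c
#include <stdbool.h>

/* Hypothetical user-space model of the two mask bits; not kernel code. */
struct model_pstate {
	bool i_masked;      /* models PSTATE.I (regular IRQ mask in DAIF) */
	bool allint_masked; /* models PSTATE.ALLINT (masks IRQs and NMIs) */
};

/* Models local_irq_enable() as discussed: it leaves ALLINT untouched. */
static void model_local_irq_enable(struct model_pstate *p)
{
	p->i_masked = false;
}

static void model_local_irq_disable(struct model_pstate *p)
{
	p->i_masked = true;
}

/* A regular IRQ can only be taken if neither mask is set. */
static bool model_irqs_deliverable(const struct model_pstate *p)
{
	return !p->i_masked && !p->allint_masked;
}

enum model_result {
	MODEL_SKIPPED,            /* returned early, no scheduling attempted */
	MODEL_SCHEDULED_IRQS_ON,  /* scheduler entered with IRQs deliverable */
	MODEL_SCHEDULED_IRQS_OFF, /* scheduler entered with IRQs still masked */
};

/*
 * Models the preemption path on exception return. With check_allint set,
 * it returns early while ALLINT is masked (the assumed fix); otherwise it
 * enables IRQs and "schedules" regardless of ALLINT.
 */
static enum model_result
model_preempt_schedule_irq(struct model_pstate *p, bool check_allint)
{
	enum model_result res;

	if (check_allint && p->allint_masked)
		return MODEL_SKIPPED;

	model_local_irq_enable(p);
	res = model_irqs_deliverable(p) ? MODEL_SCHEDULED_IRQS_ON
					: MODEL_SCHEDULED_IRQS_OFF;
	model_local_irq_disable(p);

	return res;
}
```

Without the check, entering with ALLINT set ends up "scheduling" with
IRQs still masked, which is the stall I am seeing; with the check the
path bails out, and with ALLINT clear it schedules with IRQs enabled.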

> > Clearing and setting ALLINT in arch_local_irq_enable()/disable()
> > seems to solve the issue (now I moved on to debugging something
> > else, will post the outcome here because this fix does not seem
> > to fix the issue completely or I am hitting another bug).
> 
> Do you have any specifics on how you're seeing problems?  You did
> mention boot stalls offline but I've not been able to reproduce this
> locally in a way that I can identify (based on your mail now I've made
> sure I've got preemption enabled).

defconfig, barebone rootfs, boot stalls (because we are scheduling with
IRQs off and there is nothing clearing ALLINT in the preemption path,
so the system hangs).

I don't know why you can't reproduce it; I don't know if it is the
Kconfig or the file system configuration (or the FVP params: for this to
show up FEAT_NMI must obviously be enabled). I am testing the branch Marc
posted so that I can test the vGIC patches, but this is definitely not a
vGIC bug.

Thanks,
Lorenzo