[PATCH 10/18] arm64: Introduce FIQ support

Sun Feb 7 13:49:53 EST 2021

On Sun, Feb 7, 2021 at 4:40 PM Hector Martin 'marcan' <marcan at marcan.st> wrote:
> On 07/02/2021 21.25, Arnd Bergmann wrote:
> > On Sun, Feb 7, 2021 at 9:36 AM Hector Martin 'marcan' <marcan at marcan.st> wrote:
> >> On 07/02/2021 01.22, Arnd Bergmann wrote:
> >>> * In the fiq handler code, check if normal interrupts were enabled
> >>>     when the fiq hit. Normally they are enabled, so just proceed to
> >>>     handle the timer and ipi directly
> >>>
> >>> * if irq was disabled, defer the handling by doing a self-ipi
> >>>     through the aic's ipi method, and handle it from there
> >>>     when dealing with the next interrupt once interrupts get
> >>>     enabled.
> >>>
> >>> This would be similar to the soft-disable feature on powerpc, which
> >>> never actually turns off interrupts from regular kernel code but
> >>> just checks a flag in local_irq_enable that gets set when a
> >>> hardirq happened.
> >>
> >> Case #2 seems messy. In AIC, we'd have to either:
> >>
> >> * Disable FIQs, and hope that doesn't confuse any save/restore code
> >> going on, then set a flag and check it in *both* the IRQ and FIQ path
> >> since either might trigger depending on what happens next, or
> >> * Mask the relevant timer, which we'd then need to make sure does not
> >> confuse the timer code (Unmask it again when we fire the interrupt? But
> >> what if the timer code intended to mask it in the interim?)
> >
> > I'm not quite following here. The IRQ should be disabled the entire time
> > while handling that self-IPI and the timer top half code, so if we get
> > another FIQ while handling the timer from the IRQ path, it will lead to
> > either yet another self-IPI or it will be ignored in case the previous timer
> > event has not been acked yet. I would expect that both cases are
> > race-free here; the only time that the FIQ needs to be disabled is
> > while actually handling the FIQ. Did I miss something?
>
> FIQs are level-triggered, and there are only two* ways of masking them
> (that we know of): in the timer, or DAIF. That means that if we get a
> FIQ, we *must* do one of two things: either mask it in the timer
> register, or mask FIQs entirely. If we do neither of these, we get a FIQ
> storm.
>
> So if a timer FIQ fires while IRQs are disabled, and we can't call into
> the timer code (because IRQs were disabled, so we need to defer handling
> via the IPI), the only other options are to either poke the timer mask
> bit directly, or to mask FIQs. Neither seems particularly correct.

Ok, I had not realized the timer was level triggered. In case of the
timer, I suppose it could be either masked or acknowledged from the
fiq top-half handler when deferring to irq, but I agree that it means a
layering violation in either case.

What might still work is an approach where FIQ is normally enabled,
and local_irq_disable() leaves it on, while local_irq_enable() turns
it on regardless of the current state.

In this case, the fiq handler could run the timer function if IRQs
are enabled, but just turn off FIQs if IRQs are disabled, waiting
for the next local_irq_enable() to get us back into the state where
the handler can run. Not sure if that would buy us anything though,
or if that still requires platform-specific conditionals in common code.

> * An exception seems to be non-HV timer interrupts firing while we have
> a VM guest running (HCR_EL2.TGE=0). This causes a single FIQ, and no
> more, which suggests there is a mask bit for guest timer FIQs somewhere
> that gets automatically set when the FIQ is delivered to the CPU core.
> I've yet to find where this bit lives; I'll be doing a brute-force sweep
> of system register space soon to see if I can find it, and if there is
> anything else useful near it.

Right. Maybe you can even find a bit that switches between FIQ and
IRQ mode for the timer, as that would solve the problem completely.
I think it's not that rare for irqchips to be configurable to either route
an interrupt one way or the other.

> >> Plus I don't know if the vector entry code and other scaffolding between
> >> the vector and the AIC driver would be happy with, effectively,
> >> recursive interrupts. This could work with a carefully controlled path
> >> to make sure it doesn't break things, but I'm not so sure about the
> >> current "just point FIQ and IRQ to the same place" approach here.
> >
> > If we do what I described above, the FIQ and IRQ entry would have
> > to be separate and only arrive in the same code path when calling
> > into drivers/clocksource/arm_arch_timer.c. It's not recursive there
> > because that part is only called when IRQ is disabled, and no IRQ
> > is being executed while the FIQ hits.
>
> Right, that's what I'm saying; we can't re-use the IRQ handler like Marc
> proposed, because I don't think that expects to be called reentrantly;
> we'd have to have a separate FIQ entry, but since it can be called with
> IRQs enabled and handle the FIQ in-line, it also needs to be able to
> *conditionally* behave like a normal IRQ handler. This level of
> complexity seems somewhat dubious, just to avoid keeping the FIQ mask bit
> in sync. That's not just AIC code any more; it needs a bespoke FIQ vector
> and logic to decide whether IRQs are masked (call AIC to self-IPI
> without doing the usual IRQ processing) or unmasked (go through regular
> IRQ accounting and behave like an IRQ).
>
> Perhaps I'm misunderstanding what you're proposing here or how this
> would work :)

The way I had imagined it was to have a parallel set_handle_irq()
and set_handle_fiq() in the aic driver, which end up using the same
logic in the entry code to call into the driver. The code leading up
to that is all in assembler but isn't all that complex in the end, and
is already abstracted with macros to a large degree. For existing
machines that don't call set_handle_fiq(), an unexpected FIQ could
just end up in either panic() or WARN_ONCE().

The aic_handle_fiq() function itself would be straightforward,
doing not much more than

       static void aic_handle_fiq(struct pt_regs *regs)
       {
              if (interrupts_enabled(regs)) {
                     /* safe to call timer interrupt here, as interrupts are on */
                     handle_domain_irq(aic->domain, AIC_TIMER_IRQ, regs);
              } else {
                     /* need to defer until interrupts get re-enabled */
                     aic_send_ipi(smp_processor_id(), TIMER_SELF_IPI);
              }
       }

Anyway, it's probably not worth pursuing this further if the timer
interrupt is level-triggered, as you explained above.

       Arnd