[PATCH 2/3] arm64: smp: Implement cpus_has_pending_ipi()
Marc Zyngier
maz at kernel.org
Fri Oct 10 02:48:15 PDT 2025
On Fri, 10 Oct 2025 09:30:11 +0100,
Ulf Hansson <ulf.hansson at linaro.org> wrote:
>
> On Mon, 6 Oct 2025 at 17:55, Marc Zyngier <maz at kernel.org> wrote:
> >
> > On Fri, 03 Oct 2025 16:02:44 +0100,
> > Ulf Hansson <ulf.hansson at linaro.org> wrote:
> > >
> > > To add support for keeping track of whether there may be a pending IPI
> > > scheduled for a CPU or a group of CPUs, let's implement
> > > cpus_has_pending_ipi() for arm64.
> > >
> > > Note: the implementation is intentionally lightweight and doesn't use any
> > > additional locking. This is good enough for cpuidle-based decisions.
> > >
> > > Signed-off-by: Ulf Hansson <ulf.hansson at linaro.org>
> > > ---
> > > arch/arm64/kernel/smp.c | 20 ++++++++++++++++++++
> > > 1 file changed, 20 insertions(+)
> > >
> > > diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> > > index 68cea3a4a35c..dd1acfa91d44 100644
> > > --- a/arch/arm64/kernel/smp.c
> > > +++ b/arch/arm64/kernel/smp.c
> > > @@ -55,6 +55,8 @@
> > >
> > > #include <trace/events/ipi.h>
> > >
> > > +static DEFINE_PER_CPU(bool, pending_ipi);
> > > +
> > > /*
> > > * as from 2.5, kernels no longer have an init_tasks structure
> > > * so we need some other way of telling a new secondary core
> > > @@ -1012,6 +1014,8 @@ static void do_handle_IPI(int ipinr)
> > >
> > > if ((unsigned)ipinr < NR_IPI)
> > > trace_ipi_exit(ipi_types[ipinr]);
> > > +
> > > + per_cpu(pending_ipi, cpu) = false;
> > > }
> > >
> > > static irqreturn_t ipi_handler(int irq, void *data)
> > > @@ -1024,10 +1028,26 @@ static irqreturn_t ipi_handler(int irq, void *data)
> > >
> > > static void smp_cross_call(const struct cpumask *target, unsigned int ipinr)
> > > {
> > > + unsigned int cpu;
> > > +
> > > + for_each_cpu(cpu, target)
> > > + per_cpu(pending_ipi, cpu) = true;
> > > +
> >
> > Why isn't all of this part of the core IRQ management? We already
> > track things like timers, I assume for similar reasons. If IPIs have
> > to be singled out, I'd rather this is done in common code, and not on
> > a per architecture basis.
>
> The idea was to start simple and to avoid running code on architectures
> that don't seem to need it, by using this opt-in and lightweight
> approach.
If this stuff is remotely useful, then it is useful to everyone, and I
don't see the point in littering the arch code with it. We have plenty
of opt-in features that can be selected by an architecture and ignored
by others if they see fit.
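
Just to illustrate what I have in mind -- and note this is purely a
sketch, CONFIG_GENERIC_PENDING_IPI, ipi_mark_pending() and
ipi_clear_pending() are names I'm making up, none of them exist today
-- the whole thing could live in kernel/smp.c behind a Kconfig symbol
that interested architectures select:

	#ifdef CONFIG_GENERIC_PENDING_IPI
	static DEFINE_PER_CPU(bool, pending_ipi);

	/* Arch code calls this just before raising the IPI. */
	void ipi_mark_pending(const struct cpumask *target)
	{
		unsigned int cpu;

		for_each_cpu(cpu, target)
			per_cpu(pending_ipi, cpu) = true;
	}

	/* Arch code calls this once the IPI has been handled. */
	void ipi_clear_pending(void)
	{
		this_cpu_write(pending_ipi, false);
	}

	bool cpus_has_pending_ipi(const struct cpumask *mask)
	{
		unsigned int cpu;

		for_each_cpu(cpu, mask) {
			if (per_cpu(pending_ipi, cpu))
				return true;
		}
		return false;
	}
	#endif /* CONFIG_GENERIC_PENDING_IPI */

Architectures that don't select the symbol pay nothing, and the ones
that do only have to place two calls in their cross-call and
IPI-handling paths.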
>
> I guess we could do this in generic IRQ code too, perhaps making it
> conditional on a Kconfig option, if required.
>
> >
> > > trace_ipi_raise(target, ipi_types[ipinr]);
> > > arm64_send_ipi(target, ipinr);
> > > }
> > >
> > > +bool cpus_has_pending_ipi(const struct cpumask *mask)
> > > +{
> > > + unsigned int cpu;
> > > +
> > > + for_each_cpu(cpu, mask) {
> > > + if (per_cpu(pending_ipi, cpu))
> > > + return true;
> > > + }
> > > + return false;
> > > +}
> > > +
> >
> > The lack of memory barriers makes me wonder how reliable this is.
> > Maybe this is relying on the IPIs themselves acting as such, but
> > that's extremely racy no matter how you look at it.
>
> It's deliberately lightweight. I am worried about introducing
> locking/barriers, as those could be costly and introduce latencies in
> these paths.
"I've made this car 10% faster by removing the brakes. It's great! Try
it!"
> Still, this is good enough to significantly improve cpuidle-based
> decisions in this regard. Please have a look at the commit message of
> patch 3.
If I can't see how this thing is *correct*, I really don't care how
fast it is. You might as well remove most locks and barriers from the
kernel -- it will be even faster!
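
For the record, release/acquire on the flag is the bare minimum I'd
expect here -- something along the lines of the sketch below (again,
mark_pending_ipi() is a made-up name). Note that even this only orders
the flag against surrounding accesses; it does nothing about the
window where the flag gets set right after a reader has checked it,
which is exactly the race I'm worried about:

	static DEFINE_PER_CPU(bool, pending_ipi);

	/* Sender side: publish the flag before the IPI is raised. */
	static void mark_pending_ipi(const struct cpumask *target)
	{
		unsigned int cpu;

		for_each_cpu(cpu, target)
			smp_store_release(per_cpu_ptr(&pending_ipi, cpu), true);
	}

	/* Reader side: pairs with the release in the sender. */
	bool cpus_has_pending_ipi(const struct cpumask *mask)
	{
		unsigned int cpu;

		for_each_cpu(cpu, mask) {
			/*
			 * A false result can be stale by the time the
			 * caller acts on it: nothing stops a concurrent
			 * mark_pending_ipi() from setting the flag here.
			 */
			if (smp_load_acquire(per_cpu_ptr(&pending_ipi, cpu)))
				return true;
		}
		return false;
	}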
M.
--
Without deviation from the norm, progress is not possible.