[PATCH v6 23/29] context-tracking: Introduce work deferral infrastructure
Frederic Weisbecker
frederic@kernel.org
Wed Oct 29 07:52:58 PDT 2025
On Wed, Oct 29, 2025 at 11:09:50AM +0100, Valentin Schneider wrote:
> On 28/10/25 15:00, Frederic Weisbecker wrote:
> > On Fri, Oct 10, 2025 at 05:38:33PM +0200, Valentin Schneider wrote:
> >> + old = atomic_read(&ct->state);
> >> +
> >> + /*
> >> + * The work bit must only be set if the target CPU is not executing
> >> + * in kernelspace.
> >> + * CT_RCU_WATCHING is used as a proxy for that - if the bit is set, we
> >> + * know for sure the CPU is executing in the kernel whether that be in
> >> + * NMI, IRQ or process context.
> >> + * Set CT_RCU_WATCHING here and let the cmpxchg do the check for us;
> >> + * the state could change between the atomic_read() and the cmpxchg().
> >> + */
> >> + old |= CT_RCU_WATCHING;
> >
> > Most of the time, the task should be either idle or in userspace. I'm still not
> > sure why you start with a bet that the CPU is in the kernel with RCU watching.
> >
>
> Right, I think I got that the wrong way around when I switched to using
> CT_RCU_WATCHING vs CT_STATE_KERNEL. That wants to be
>
> old &= ~CT_RCU_WATCHING;
>
> i.e. bet the CPU is NOHZ-idle; if it's not, the cmpxchg fails and we don't
> store the work bit.
Right.
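In other words, the corrected sequence would look roughly like the below.
Untested sketch only: the wrapper function and its parameters are made up
for illustration; the atomic_read(), the bit flip and the try_cmpxchg()
loop are the ones from the hunks quoted here.

	/* Illustrative wrapper; name and parameters are not from the patch */
	static bool ct_defer_work(struct context_tracking *ct, unsigned int work)
	{
		int old;
		bool ret;

		old = atomic_read(&ct->state);

		/*
		 * Bet that the target CPU is in userspace or NOHZ-idle, i.e.
		 * that CT_RCU_WATCHING is clear. If the CPU is actually in the
		 * kernel, the cmpxchg below fails, old is reloaded with
		 * CT_RCU_WATCHING set and the loop exits without storing the
		 * work bit.
		 */
		old &= ~CT_RCU_WATCHING;

		/*
		 * Try setting the work until either
		 * - the target CPU has entered kernelspace
		 * - the work has been set
		 */
		do {
			ret = atomic_try_cmpxchg(&ct->state, &old,
						 old | (work << CT_WORK_START));
		} while (!ret && !(old & CT_RCU_WATCHING));

		return ret;
	}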
>
> >> + /*
> >> + * Try setting the work until either
> >> + * - the target CPU has entered kernelspace
> >> + * - the work has been set
> >> + */
> >> + do {
> >> + ret = atomic_try_cmpxchg(&ct->state, &old, old | (work << CT_WORK_START));
> >> + } while (!ret && !(old & CT_RCU_WATCHING));
> >
> > So this applies blindly to idle as well, right? It should work, but note that
> > the idle entry code that runs before RCU starts watching is also fragile.
> >
>
> Yeah, I remember losing some hair trying to grok the idle entry situation;
> we could keep this purely NOHZ_FULL and have the deferral condition be:
>
> (ct->state & CT_STATE_USER) && !(ct->state & CT_RCU_WATCHING)
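Which, as an illustrative sketch only (reusing the names quoted above, with
a throwaway local, and assuming CT_STATE_USER is a testable bit as in this
series), would read something like:

	int state = atomic_read(&ct->state);

	/* Defer only for a NOHZ_FULL CPU in userspace with RCU not watching */
	if (!((state & CT_STATE_USER) && !(state & CT_RCU_WATCHING)))
		return false;	/* no deferral; the caller falls back to an IPI */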
Well, after all, what works for NOHZ_FULL should also work for idle. It's
preceded by entry code as well (or rather __cpuidle).
Thanks.
--
Frederic Weisbecker
SUSE Labs