[PATCH v6 23/29] context-tracking: Introduce work deferral infrastructure

Wed Oct 29 03:09:50 PDT 2025

On 28/10/25 15:00, Frederic Weisbecker wrote:
> Le Fri, Oct 10, 2025 at 05:38:33PM +0200, Valentin Schneider a écrit :
>> +	old = atomic_read(&ct->state);
>> +
>> +	/*
>> +	 * The work bit must only be set if the target CPU is not executing
>> +	 * in kernelspace.
>> +	 * CT_RCU_WATCHING is used as a proxy for that - if the bit is set, we
>> +	 * know for sure the CPU is executing in the kernel whether that be in
>> +	 * NMI, IRQ or process context.
>> +	 * Set CT_RCU_WATCHING here and let the cmpxchg do the check for us;
>> +	 * the state could change between the atomic_read() and the cmpxchg().
>> +	 */
>> +	old |= CT_RCU_WATCHING;
>
> Most of the time, the task should be either idle or in userspace. I'm still not
> sure why you start with a bet that the CPU is in the kernel with RCU watching.
>

Right, I think I got that the wrong way around when I switched from
CT_STATE_KERNEL to CT_RCU_WATCHING. That wants to be

  old &= ~CT_RCU_WATCHING;

i.e. bet the CPU is NOHZ-idle; if it's not, the cmpxchg fails and we don't
store the work bit.
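
For illustration, here is a minimal userspace sketch of the loop with that
bet applied, using C11 atomics in place of the kernel's atomic_t helpers.
The bit positions and the ct_set_cpu_work_model() name are assumptions made
up for this sketch, not the values or names from the patch;
atomic_compare_exchange_strong() stands in for atomic_try_cmpxchg() since
both refresh 'old' on failure.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit layout, for illustration only; not the kernel's values. */
#define CT_RCU_WATCHING	(1u << 8)
#define CT_WORK_START	16

/*
 * Model of queueing deferred work on a remote CPU's context state.
 * Returns true if the work bit was stored, false if the CPU was seen
 * with RCU watching (i.e. executing in the kernel).
 */
static bool ct_set_cpu_work_model(atomic_uint *state, unsigned int work)
{
	unsigned int old = atomic_load(state);
	bool ret;

	/* Bet that RCU is not watching; the cmpxchg verifies the bet. */
	old &= ~CT_RCU_WATCHING;

	/*
	 * Retry until either the work has been set or the refreshed 'old'
	 * shows that the target has entered the kernel.
	 */
	do {
		ret = atomic_compare_exchange_strong(state, &old,
						     old | (work << CT_WORK_START));
	} while (!ret && !(old & CT_RCU_WATCHING));

	return ret;
}
```

If the state has CT_RCU_WATCHING set at the time of the read, the first
compare-exchange fails because the expected value had the bit cleared, 'old'
is refreshed with the bit set, and the loop exits without storing anything.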

>> +	/*
>> +	 * Try setting the work until either
>> +	 * - the target CPU has entered kernelspace
>> +	 * - the work has been set
>> +	 */
>> +	do {
>> +		ret = atomic_try_cmpxchg(&ct->state, &old, old | (work << CT_WORK_START));
>> +	} while (!ret && !(old & CT_RCU_WATCHING));
>
> So this applies blindly to idle as well, right? It should work but note that
> idle entry code before RCU watches is also fragile.
>

Yeah, I remember losing some hair trying to grok the idle entry situation;
we could keep this purely NOHZ_FULL and have the deferral condition be:

  (ct->state & CT_STATE_USER) && !(ct->state & CT_RCU_WATCHING)
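
As a rough sketch of how that condition could be evaluated, the predicate
below tests both bits against a single snapshot of the state rather than
reading it twice. The encodings and the ct_can_defer_model() name are
hypothetical, and the snapshot can go stale right after the load, so in
practice a cmpxchg would still have to validate the decision.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical encodings for illustration; not taken from kernel headers. */
#define CT_STATE_USER	(1u << 1)
#define CT_RCU_WATCHING	(1u << 2)

/*
 * Decide from one snapshot whether work may be deferred: the CPU must be
 * in userspace and RCU must not be watching it.
 */
static bool ct_can_defer_model(atomic_uint *state)
{
	unsigned int snap = atomic_load(state);

	return (snap & CT_STATE_USER) && !(snap & CT_RCU_WATCHING);
}
```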