[PATCH] workqueue: Fix race in schedule and flush work

Wed Feb 16 11:07:00 PST 2022

On Wed, Feb 16, 2022 at 07:49:39PM +0100, Padmanabha Srinivasaiah wrote:
> On Mon, Feb 14, 2022 at 09:43:52AM -1000, Tejun Heo wrote:
> > Hello,
> > 
> > > diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> > > index 33f1106b4f99..a3f53f859e9d 100644
> > > --- a/kernel/workqueue.c
> > > +++ b/kernel/workqueue.c
> > > @@ -3326,28 +3326,38 @@ EXPORT_SYMBOL(cancel_delayed_work_sync);
> > >   */
> > >  int schedule_on_each_cpu(work_func_t func)
> > >  {
> > > -	int cpu;
> > >  	struct work_struct __percpu *works;
> > > +	cpumask_var_t sched_cpumask;
> > > +	int cpu, ret = 0;
> > >  
> > > -	works = alloc_percpu(struct work_struct);
> > > -	if (!works)
> > > +	if (!alloc_cpumask_var(&sched_cpumask, GFP_KERNEL))
> > >  		return -ENOMEM;
> > >  
> > > +	works = alloc_percpu(struct work_struct);
> > > +	if (!works) {
> > > +		ret = -ENOMEM;
> > > +		goto free_cpumask;
> > > +	}
> > > +
> > >  	cpus_read_lock();
> > >  
> > > -	for_each_online_cpu(cpu) {
> > > +	cpumask_copy(sched_cpumask, cpu_online_mask);
> > > +	for_each_cpu_and(cpu, sched_cpumask, cpu_online_mask) {
> > 
> > This definitely would need a comment explaining what's going on cuz it looks
> > weird to be copying the cpumask which is supposed to stay stable due to the
> > cpus_read_lock(). Given that it can only happen during early boot and the
> > online cpus can only be expanding, maybe just add sth like:
> > 
> >         if (early_during_boot) {
> >                 for_each_possible_cpu(cpu)
> >                         INIT_WORK(per_cpu_ptr(works, cpu), func);
> >         }
> > 
> 
> Thanks, Tejun, for the reply and suggestions.
> 
> Yes, unfortunately cpus_read_lock() does not keep the cpumask stable
> during secondary boot. I am not sure, but maybe it only guarantees that
> a CPU does not go down between cpus_read_lock() and cpus_read_unlock().
> 
> As suggested, I will try out something like:
> 	if (system_state != SYSTEM_RUNNING) {
> 		:
> 	}
>
> > BTW, who's calling schedule_on_each_cpu() that early during boot? It makes
> > no sense to do this while the cpumasks can't be stabilized.
>
> It is the implementation of CONFIG_TASKS_RUDE_RCU.

Another option would be to adjust CONFIG_TASKS_RUDE_RCU based on where
things are in the boot process.  For example:

	// Wait for one rude RCU-tasks grace period.
	static void rcu_tasks_rude_wait_gp(struct rcu_tasks *rtp)
	{
		if (num_online_cpus() <= 1)
			return;  // Fastpath for only one CPU.
		rtp->n_ipis += cpumask_weight(cpu_online_mask);
		schedule_on_each_cpu(rcu_tasks_be_rude);
	}
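
For anyone who wants to poke at the window being discussed, here is a
rough user-space model, not kernel code. Everything prefixed with toy_
is made up for illustration, and the "CPU 1 comes online between the
two loops" step is an assumption about the timing, not something taken
from a trace. It compares the original behavior, the init-for-all-possible-CPUs
idea, and the single-CPU fastpath above.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_TOY_CPUS 4	/* hypothetical count of possible CPUs */

/* Toy stand-in for struct work_struct: records only whether it was set up. */
struct toy_work {
	bool initialized;
};

static struct toy_work toy_works[NR_TOY_CPUS];
static unsigned int toy_online_mask;	/* bit n set means CPU n is online */

static bool toy_cpu_online(int cpu)
{
	assert(cpu >= 0 && cpu < NR_TOY_CPUS);
	return toy_online_mask & (1u << cpu);
}

static int toy_num_online_cpus(void)
{
	int cpu, n = 0;

	for (cpu = 0; cpu < NR_TOY_CPUS; cpu++)
		n += toy_cpu_online(cpu);
	return n;
}

/*
 * Model one schedule_on_each_cpu()-like call during which CPU 1 comes
 * online between the queueing loop and the flushing loop.  Returns the
 * number of online CPUs whose work item would be flushed uninitialized.
 */
static int toy_schedule_on_each_cpu(bool init_all_possible, bool one_cpu_fastpath)
{
	int cpu, bad = 0;

	/* Start with only the boot CPU online and nothing initialized. */
	toy_online_mask = 1u << 0;
	for (cpu = 0; cpu < NR_TOY_CPUS; cpu++)
		toy_works[cpu].initialized = false;

	/* Caller-side fastpath: skip the whole operation on a single CPU. */
	if (one_cpu_fastpath && toy_num_online_cpus() <= 1)
		return 0;

	/* Queueing loop: set up work for online CPUs, or for all possible. */
	for (cpu = 0; cpu < NR_TOY_CPUS; cpu++)
		if (init_all_possible || toy_cpu_online(cpu))
			toy_works[cpu].initialized = true;

	/* The assumed race: a secondary CPU appears before the flush. */
	toy_online_mask |= 1u << 1;

	/* Flushing loop: count online CPUs whose work was never set up. */
	for (cpu = 0; cpu < NR_TOY_CPUS; cpu++)
		if (toy_cpu_online(cpu) && !toy_works[cpu].initialized)
			bad++;
	return bad;
}
```

In this model toy_schedule_on_each_cpu(false, false) returns 1, while
both (true, false) and (false, true) return 0. The fastpath only covers
the case where a single CPU is online at the time of the check, so the
model says nothing about a call made after some, but not all, of the
secondaries have come up.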

Easy enough either way!

							Thanx, Paul