[PATCH] workqueue: Fix race in schedule and flush work

Mon Feb 14 11:43:52 PST 2022

Hello,

> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 33f1106b4f99..a3f53f859e9d 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -3326,28 +3326,38 @@ EXPORT_SYMBOL(cancel_delayed_work_sync);
>   */
>  int schedule_on_each_cpu(work_func_t func)
>  {
> -	int cpu;
>  	struct work_struct __percpu *works;
> +	cpumask_var_t sched_cpumask;
> +	int cpu, ret = 0;
>  
> -	works = alloc_percpu(struct work_struct);
> -	if (!works)
> +	if (!alloc_cpumask_var(&sched_cpumask, GFP_KERNEL))
>  		return -ENOMEM;
>  
> +	works = alloc_percpu(struct work_struct);
> +	if (!works) {
> +		ret = -ENOMEM;
> +		goto free_cpumask;
> +	}
> +
>  	cpus_read_lock();
>  
> -	for_each_online_cpu(cpu) {
> +	cpumask_copy(sched_cpumask, cpu_online_mask);
> +	for_each_cpu_and(cpu, sched_cpumask, cpu_online_mask) {

This definitely needs a comment explaining what's going on, because it looks
weird to copy a cpumask that is supposed to stay stable under
cpus_read_lock(). Given that this can only happen during early boot and that
the set of online cpus can only grow, maybe just add something like:

        if (early_during_boot) {
                for_each_possible_cpu(cpu)
                        INIT_WORK(per_cpu_ptr(works, cpu), func);
        }
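To make the ordering argument concrete, here is a rough user-space model (not kernel code) of why initializing the work items for every possible CPU closes the hole. All names below (struct model_work, model_schedule_on_each_cpu(), NR_POSSIBLE_CPUS, the online[] array) are hypothetical stand-ins. "A CPU comes online between the queue loop and the flush loop" is simulated by flipping online[2]. Because online cpus can only be added at that stage, every CPU seen by the flush loop is a possible CPU. Flushing a work item that was initialized but never queued is harmless, so the late CPU no longer causes a flush of an uninitialized item.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_POSSIBLE_CPUS 4 /* hypothetical size of cpu_possible_mask */

/* Hypothetical stand-in for struct work_struct: tracks init/queue state only. */
struct model_work {
	bool initialized;
	bool queued;
};

static bool online[NR_POSSIBLE_CPUS];
static struct model_work works[NR_POSSIBLE_CPUS];
static int bad_flushes; /* flushes of never-initialized work items */

/* CPUs 0-1 online, 2-3 not yet booted; all work items uninitialized. */
static void model_reset(void)
{
	int cpu;

	for (cpu = 0; cpu < NR_POSSIBLE_CPUS; cpu++) {
		works[cpu] = (struct model_work){ false, false };
		online[cpu] = cpu < 2;
	}
	bad_flushes = 0;
}

/* Models INIT_WORK(): leaves the item initialized and idle. */
static void model_init_work(struct model_work *w)
{
	w->initialized = true;
	w->queued = false;
}

/* Models flush_work(): a bug on uninitialized items, a no-op on idle ones. */
static void model_flush_work(struct model_work *w)
{
	if (!w->initialized) {
		bad_flushes++;
		return;
	}
	w->queued = false; /* pretend the queued work ran to completion */
}

static void model_schedule_on_each_cpu(bool early_during_boot)
{
	int cpu;

	/* The suggested early-boot path: initialize every possible CPU's item. */
	if (early_during_boot)
		for (cpu = 0; cpu < NR_POSSIBLE_CPUS; cpu++)
			model_init_work(&works[cpu]);

	/* Queue loop: init + queue on each currently online CPU. */
	for (cpu = 0; cpu < NR_POSSIBLE_CPUS; cpu++) {
		if (!online[cpu])
			continue;
		model_init_work(&works[cpu]);
		works[cpu].queued = true; /* models schedule_work_on() */
	}

	/* Simulated early-boot race: CPU 2 comes online between the loops. */
	online[2] = true;

	/* Flush loop: walks the (now larger) online set. */
	for (cpu = 0; cpu < NR_POSSIBLE_CPUS; cpu++)
		if (online[cpu])
			model_flush_work(&works[cpu]);
}
```

Without the early-boot initialization, the model flushes CPU 2's uninitialized item once. With it, CPU 2's item is initialized and idle, so the flush is a no-op.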

BTW, who's calling schedule_on_each_cpu() that early during boot? It makes
no sense to do this while the cpumasks can't be stabilized.

Thanks.

-- 
tejun