[PATCH v2] rcu: Use an intermediate irq_work to start process_srcu()

Boqun Feng boqun at kernel.org
Thu Jun 18 08:30:10 PDT 2026


On Thu, Jun 18, 2026 at 08:11:36AM -0500, Emil Renner Berthing wrote:
> Quoting Boqun Feng (2026-03-20 23:29:16)
> > Since commit c27cea4416a3 ("rcu: Re-implement RCU Tasks Trace in terms
> > of SRCU-fast") we switched to SRCU in BPF. However, as BPF instrumentation can
> > happen basically everywhere (including where a scheduler lock is held),
> > call_srcu() now needs to avoid acquiring a scheduler lock because
> > otherwise it could cause a deadlock [1]. Fix this by following what the
> > previous RCU Tasks Trace did: using an irq_work to delay the queuing of
> > the work to start process_srcu().
> >
> > [boqun: Apply Joel's feedback]
> > [boqun: Apply Andrea's test feedback]
> >
> > Reported-by: Andrea Righi <arighi at nvidia.com>
> > Closes: https://lore.kernel.org/all/abjzvz_tL_siV17s@gpd4/
> > Fixes: commit c27cea4416a3 ("rcu: Re-implement RCU Tasks Trace in terms of SRCU-fast")
> > Link: https://lore.kernel.org/rcu/3c4c5a29-24ea-492d-aeee-e0d9605b4183@nvidia.com/ [1]
> > Suggested-by: Zqiang <qiang.zhang at linux.dev>
> > Tested-by: Andrea Righi <arighi at nvidia.com>
> > Signed-off-by: Boqun Feng <boqun at kernel.org>
> 
> Thanks for the patch, which is now upstream as
> 7c405fb3279b ("rcu: Use an intermediate irq_work to start process_srcu()")
> 
> Unfortunately, it seems to halt booting on the Allwinner D1 single-core RISC-V
> SoC on both the 7.0 and 7.1 kernels:
> 

By "single-core", do you mean there is only one CPU in that system?

> https://esmil.dk/d1-7.0.txt
> https://esmil.dk/d1-7.1.txt
> 

From the call stack,

[  243.002818] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  243.010681] task:kworker/0:0     state:D stack:0     pid:9     tgid:9     ppid:2      task_flags:0x4208060 flags:0x00000000
[  243.021858] Workqueue: device_link_wq device_link_release_fn
[  243.027587] Call Trace:
[  243.030079] [<ffffffff80745b66>] __schedule+0x232/0x530
[  243.035365] [<ffffffff80745e82>] schedule+0x1e/0x94
[  243.040298] [<ffffffff8074aa7a>] schedule_timeout+0x6e/0xa8
[  243.045929] [<ffffffff8004d6a0>] wakeup_preempt+0x8c/0x98
[  243.051380] [<ffffffff80746aee>] wait_for_completion+0x3a/0xb0
[  243.057267] [<ffffffff80087226>] __synchronize_srcu.part.0+0x4e/0x60
[  243.063712] [<ffffffff80084414>] rcu_tasks_get_gp_data+0xc/0x10
[  243.069699] [<ffffffff80490898>] device_link_release_fn+0x14/0x80
[  243.075850] [<ffffffff8003dbd0>] process_one_work+0xf8/0x1c8
[  243.081583] [<ffffffff8003e282>] worker_thread+0x11a/0x254
[  243.087127] [<ffffffff8003e164>] rescuer_thread+0x3a0/0x3a4
[  243.092755] [<ffffffff8003e164>] rescuer_thread+0x3a0/0x3a4
[  243.098381] [<ffffffff8004499e>] kthread+0xbe/0xe4
[  243.103231] [<ffffffff8001349e>] ret_from_fork_kernel+0x6/0xd0
[  243.109127] [<ffffffff800517a8>] schedule_tail+0x8/0xac
[  243.114410] [<ffffffff800448dc>] kthreads_online_cpu+0x0/0x4
[  243.120121] [<ffffffff8074be0a>] ret_from_fork_kernel_asm+0x12/0x18

My guess is that, for some reason, the irq_work was missed or the IPI
never happened. Is it possible for you to enable the tracepoints
ipi_send_cpu and ipi_entry to get more information?
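
In case it helps, a minimal sketch of enabling those two tracepoints
through tracefs. It assumes the system gets far enough to run a shell,
that tracefs is mounted at /sys/kernel/tracing, and that both events
sit under the "ipi" event subsystem; enable_event is only a
hypothetical helper name for this example.

```shell
#!/bin/sh
# Assumed tracefs mount point; some setups use /sys/kernel/debug/tracing.
TRACEFS=${TRACEFS:-/sys/kernel/tracing}

# Hypothetical helper: enable one trace event.
# $1 is "<subsystem>/<event>", e.g. ipi/ipi_entry
enable_event() {
    f="$TRACEFS/events/$1/enable"
    if [ -w "$f" ]; then
        echo 1 > "$f" && echo "enabled $1"
    else
        echo "cannot enable $1: $f is not writable" >&2
        return 1
    fi
}

# Enable the two events named above; keep going if one is unavailable.
for ev in ipi/ipi_send_cpu ipi/ipi_entry; do
    enable_event "$ev" || echo "could not enable $ev" >&2
done

# The recorded events can then be read with: cat "$TRACEFS/trace"
```

If the boot never reaches a shell, the same events can be requested on
the kernel command line instead, for example with
trace_event=ipi:ipi_send_cpu,ipi:ipi_entry, and adding tp_printk should
route them to the console log.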

> Reverting this patch seems to fix the issue, although it then breaks when loading
> modules, which can be fixed by reverting:
> d6e152d905bd ("clockevents: Prevent timer interrupt starvation")
> 
> This might be a side-effect of reverting this patch though.
> 

I'm not so sure about this, i.e. you might have *two* problems :(

Regards,
Boqun

> We'd really like to continue support for this SoC on the 7.0 kernel, but since
> this patch is a bugfix, just shipping the revert doesn't sound appealing.
> So some guidance would be much appreciated, thanks!
> 
> /Emil



More information about the linux-riscv mailing list