[REGRESSION] Random oops on SG2042 with Linux 6.16-rc and dynamic ftrace

Yao Zi ziyao at disroot.org
Mon Jul 7 22:15:15 PDT 2025


On Wed, Jul 02, 2025 at 03:05:06PM +0200, Alexandre Ghiti wrote:
> Hi Yao,
> 
> On 7/2/25 12:50, Yao Zi wrote:
> > On Tue, Jul 01, 2025 at 02:27:32PM +0200, Alexandre Ghiti wrote:
> > > Hi Yao,
> > > 
> > > On 7/1/25 08:41, Yao Zi wrote:
> > > > Linux v6.16 built with dynamic ftrace randomly oops or triggers
> > > > ftrace_bug() on Sophgo SG2042 when booting systemd-based userspace,
> > ...
> > 
> > > > Not sure either reverting the commits or fixing them up is a better
> > > > idea, but anyway the fatal first issue shouidn't go into the stable
> > > > release.
> > > Let's fix this, we were expecting issues with dynamic ftrace :)
> > > 
> > > So the following diff fixes all the issues you mentioned (not the first
> > > crash though, I'll let you test and see if it works better, I don't have
> > > this board):
> > Thanks for the fix! I've tested it with both QEMU and SG2042, it does
> > fix the lockdep failures as well as the boot time crash on SG2042. The
> > boot-time crash is caused by the race so will disappear as long as we
> > fix the race.
> > 
> > > diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c
> > > index 4c6c24380cfd9..97ced537aa1e0 100644
> > > --- a/arch/riscv/kernel/ftrace.c
> > > +++ b/arch/riscv/kernel/ftrace.c
> > > @@ -14,6 +14,16 @@
> > >   #include <asm/text-patching.h>
> > > 
> > >   #ifdef CONFIG_DYNAMIC_FTRACE
> > > +void ftrace_arch_code_modify_prepare(void)
> > > +{
> > > +       mutex_lock(&text_mutex);
> > > +}
> > > +
> > > +void ftrace_arch_code_modify_post_process(void)
> > > +{
> > > +       mutex_unlock(&text_mutex);
> > > +}
> > > +
> > >   unsigned long ftrace_call_adjust(unsigned long addr)
> > >   {
> > >          if (IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS))
> > > @@ -29,10 +39,8 @@ unsigned long arch_ftrace_get_symaddr(unsigned long
> > > fentry_ip)
> > > 
> > >   void arch_ftrace_update_code(int command)
> > >   {
> > > -       mutex_lock(&text_mutex);
> > >          command |= FTRACE_MAY_SLEEP;
> > >          ftrace_modify_all_code(command);
> > > -       mutex_unlock(&text_mutex);
> > >          flush_icache_all();
> > >   }
> > > 
> > > @@ -149,16 +157,17 @@ int ftrace_init_nop(struct module *mod, struct
> > > dyn_ftrace *rec)
> > >          unsigned int nops[2], offset;
> > >          int ret;
> > > 
> > > +       mutex_lock(&text_mutex);
> > Besides using the guard API, could we swap the order between
> > ftrace_rec_set_nop_ops() and calculation of the nops array? This shrinks
> > the critical region a little.
> 
> 
> If you don't mind, I won't, I don't like initializing stuff which could
> never be used in case of error.

Yes, I don't mind it.

> 
> > 
> > With or without the change, here's my tag,
> > 
> > Tested-by: Yao Zi <ziyao at disroot.org>
> > 
> > and also
> > 
> > Reported-by: Han Gao <rabenda.cn at gmail.com>
> > Reported-by: Vivian Wang <wangruikang at iscas.ac.cn>
> > 
> > for their first-hand report of boot-time crash and analysis for the
> > first lock issue.
> 
> 
> I'll add all those tags in the patch I'll send today (or tomorrow if the CI
> is slow).

Is there any update about the fix patch? It'll be nice to get the
problem fixed soon.

> Thanks again for the great bug report, really appreciated.
> 
> Alex

Best regards,
Yao Zi



More information about the linux-riscv mailing list