[PATCH] riscv: Fix early ftrace nop patching

Tue Jun 18 20:40:10 PDT 2024

On Tue, Jun 18, 2024 at 9:40 PM Alexandre Ghiti <alexghiti at rivosinc.com> wrote:
>
> Hi Andy,
>
> On Tue, Jun 18, 2024 at 2:48 PM Andy Chiu <andy.chiu at sifive.com> wrote:
> >
> > On Tue, Jun 18, 2024 at 8:02 PM Alexandre Ghiti <alex at ghiti.fr> wrote:
> > >
> > > Hi Conor,
> > >
> > > On 17/06/2024 15:23, Alexandre Ghiti wrote:
> > > > Hi Conor,
> > > >
> > > > Sorry for the delay here.
> > > >
> > > > On 13/06/2024 09:48, Conor Dooley wrote:
> > > >> On Thu, May 23, 2024 at 01:51:34PM +0200, Alexandre Ghiti wrote:
> > > >>> Commit c97bf629963e ("riscv: Fix text patching when IPI are used")
> > > >>> converted ftrace_make_nop() to use patch_insn_write() which does not
> > > >>> emit any icache flush relying entirely on __ftrace_modify_code() to do
> > > >>> that.
> > > >>>
> > > >>> But we missed that ftrace_make_nop() was called very early directly
> > > >>> when
> > > >>> converting mcount calls into nops (actually on riscv it converts 2B
> > > >>> nops
> > > >>> emitted by the compiler into 4B nops).
> > > >>>
> > > >>> This caused crashes on multiple HW as reported by Conor and Björn since
> > > >>> the booting core could have half-patched instructions in its icache
> > > >>> which would trigger an illegal instruction trap: fix this by emitting a
> > > >>> local flush icache when early patching nops.
> > > >>>
> > > >>> Fixes: c97bf629963e ("riscv: Fix text patching when IPI are used")
> > > >>> Signed-off-by: Alexandre Ghiti <alexghiti at rivosinc.com>
> > > >>> ---
> > > >>>   arch/riscv/include/asm/cacheflush.h | 6 ++++++
> > > >>>   arch/riscv/kernel/ftrace.c          | 3 +++
> > > >>>   2 files changed, 9 insertions(+)
> > > >>>
> > > >>> diff --git a/arch/riscv/include/asm/cacheflush.h
> > > >>> b/arch/riscv/include/asm/cacheflush.h
> > > >>> index dd8d07146116..ce79c558a4c8 100644
> > > >>> --- a/arch/riscv/include/asm/cacheflush.h
> > > >>> +++ b/arch/riscv/include/asm/cacheflush.h
> > > >>> @@ -13,6 +13,12 @@ static inline void local_flush_icache_all(void)
> > > >>>       asm volatile ("fence.i" ::: "memory");
> > > >>>   }
> > > >>>   +static inline void local_flush_icache_range(unsigned long start,
> > > >>> +                        unsigned long end)
> > > >>> +{
> > > >>> +    local_flush_icache_all();
> > > >>> +}
> > > >>> +
> > > >>>   #define PG_dcache_clean PG_arch_1
> > > >>>     static inline void flush_dcache_folio(struct folio *folio)
> > > >>> diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c
> > > >>> index 4f4987a6d83d..32e7c401dfb4 100644
> > > >>> --- a/arch/riscv/kernel/ftrace.c
> > > >>> +++ b/arch/riscv/kernel/ftrace.c
> > > >>> @@ -120,6 +120,9 @@ int ftrace_init_nop(struct module *mod, struct
> > > >>> dyn_ftrace *rec)
> > > >>>       out = ftrace_make_nop(mod, rec, MCOUNT_ADDR);
> > > >>>       mutex_unlock(&text_mutex);
> > > >> So, turns out that this patch is not sufficient. I've seen some more
> > > >> crashes, seemingly due to initialising nops on this mutex_unlock().
> > > >> Palmer suggested moving the if (!mod) ... into the lock, which fixed
> > > >> the problem for me.
> > > >
> > > >
> > > > Ok, it makes sense, I completely missed that. I'll send a fix for that
> > > > shortly so that it can be merged in rc5.
> > > >
> > > > Thanks,
> > > >
> > > > Alex
> > >
> > >
> > > So I digged a bit more and I'm afraid that the only way to make sure
> > > this issue does not happen elsewhere is to flush the icache right after
> > > the patching. We actually can't wait to batch the icache flush since
> > > along the way, we may call a function that has just been patched (the
> > > issue that you encountered here).
> >
> > Trying to provide my thoughts, please let me know if I missed
> > anything. I think what Conor suggested is safe for init_nop, as it
> > would be called only when there is only one core (booting) and at the
> > loading stage of kernel modules. In the first case we just have to
> > make sure there is no patchable entry before the core executes
> > fence.i. The second case is unconditionally safe because there is no
> > read-side of the race.
> >
> > It is a bit tricky for the generic (runtime) case of ftrace code
> > patching, but that is not because of the batch fence.i maintenance. As
> > long as there exists a patchable entry for the stopping thread, it is
> > possible for them to execute on a partially patched instruction. A
> > solution for this is again to prevent any patchable entry in the
> > stop_machine's stopping thread. Another solution is to apply the
> > atomic ftrace patching series which aims to get rid of the race.
>
> Yeah but my worry is that we can't make sure that functions called
> between the patching and the fence.i are not the ones that just get
> patched. At least today, patch_insn_write() etc should be marked as

IIUC, the compiler does not generate a patchable entry for
patch_insn_write, and so do all functions in patch.c. The entire file
is not compiled with patchable entry because the flag is removed in
riscv's Makefile. Please let me know if I misunderstand it.

> noinstr. But even then, we call core functions that could be patchable
> (I went through all and it *seems* we are safe *for now*, but this is
> very fragile).

Yes, functions in the "else" clause, atomic_read() etc, are safe for
now. However, it is impossible to fix as long as there exists a
patchable entry along the way. This is the problem that we cannot
patch 2 instructions with a concurrent read-side. On the other hand,
the path of ftrace_modify_all_code may experience the batch fence.i
maintenance issue, and can be fixed via a local fence.i. This is
because the path is executed by only one core. There does not exist a
concurrent write-side. As a result, we just need to make sure it
leaves each patch-site with a local fence.i to make sure code is
up-to-date.

>
> Then what do you think we should do to fix Conor's issue right now:
> should we mark the riscv specific functions as noinstr, cross fingers
> that the core functions are not (and don't become) patchable and wait
> for your atomic patching? Or should we flush as soon as possible as I
> proposed above (which to me fixes the issue but hurts performance)?

Either way works for me. IMHO, the fix should include:

1. move fence.i before mutex_unlock in init_nop
2. do local fence.i in __ftrace_modify_call
3. make sure the else clause does not happen to have a patchable entry

Thanks,
Andy

>
> Thanks,
>
> Alex
>
> >
> > >
> > > I don't know how much it will impact the performance but I guess it will.
> > >
> > > Unless someone has a better idea (I added Andy and Puranjay in cc), here
> > > is the patch that implements this, can you give it a try? Thanks
> > >
> > > diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c
> > > index 87cbd86576b2..4b95c574fd04 100644
> > > --- a/arch/riscv/kernel/ftrace.c
> > > +++ b/arch/riscv/kernel/ftrace.c
> > > @@ -120,9 +120,6 @@ int ftrace_init_nop(struct module *mod, struct
> > > dyn_ftrace *rec)
> > >          out = ftrace_make_nop(mod, rec, MCOUNT_ADDR);
> > >          mutex_unlock(&text_mutex);
> > >
> > > -       if (!mod)
> > > -               local_flush_icache_range(rec->ip, rec->ip +
> > > MCOUNT_INSN_SIZE);
> > > -
> > >          return out;
> > >   }
> > >
> > > @@ -156,9 +153,9 @@ static int __ftrace_modify_code(void *data)
> > >          } else {
> > >                  while (atomic_read(&param->cpu_count) <= num_online_cpus())
> > >                          cpu_relax();
> > > -       }
> > >
> > > -       local_flush_icache_all();
> > > +               local_flush_icache_all();
> > > +       }
> > >
> > >          return 0;
> > >   }
> > > diff --git a/arch/riscv/kernel/patch.c b/arch/riscv/kernel/patch.c
> > > index 4007563fb607..ab03732d06c4 100644
> > > --- a/arch/riscv/kernel/patch.c
> > > +++ b/arch/riscv/kernel/patch.c
> > > @@ -89,6 +89,14 @@ static int __patch_insn_set(void *addr, u8 c, size_t len)
> > >
> > >          memset(waddr, c, len);
> > >
> > > +       /*
> > > +        * We could have just patched a function that is about to be
> > > +        * called so make sure we don't execute partially patched
> > > +        * instructions by flushing the icache as soon as possible.
> > > +        */
> > > +       local_flush_icache_range((unsigned long)waddr,
> > > +                                (unsigned long)waddr + len);
> > > +
> > >          patch_unmap(FIX_TEXT_POKE0);
> > >
> > >          if (across_pages)
> > > @@ -135,6 +143,14 @@ static int __patch_insn_write(void *addr, const
> > > void *insn, size_t len)
> > >
> > >          ret = copy_to_kernel_nofault(waddr, insn, len);
> > >
> > > +       /*
> > > +        * We could have just patched a function that is about to be
> > > +        * called so make sure we don't execute partially patched
> > > +        * instructions by flushing the icache as soon as possible.
> > > +        */
> > > +       local_flush_icache_range((unsigned long)waddr,
> > > +                                (unsigned long)waddr + len);
> > > +
> > >          patch_unmap(FIX_TEXT_POKE0);
> > >
> > >          if (across_pages)
> > > @@ -189,9 +205,6 @@ int patch_text_set_nosync(void *addr, u8 c, size_t len)
> > >
> > >          ret = patch_insn_set(tp, c, len);
> > >
> > > -       if (!ret)
> > > -               flush_icache_range((uintptr_t)tp, (uintptr_t)tp + len);
> > > -
> > >          return ret;
> > >   }
> > >   NOKPROBE_SYMBOL(patch_text_set_nosync);
> > > @@ -224,9 +237,6 @@ int patch_text_nosync(void *addr, const void *insns,
> > > size_t len)
> > >
> > >          ret = patch_insn_write(tp, insns, len);
> > >
> > > -       if (!ret)
> > > -               flush_icache_range((uintptr_t) tp, (uintptr_t) tp + len);
> > > -
> > >          return ret;
> > >   }
> > >   NOKPROBE_SYMBOL(patch_text_nosync);
> > > @@ -253,9 +263,9 @@ static int patch_text_cb(void *data)
> > >          } else {
> > >                  while (atomic_read(&patch->cpu_count) <= num_online_cpus())
> > >                          cpu_relax();
> > > -       }
> > >
> > > -       local_flush_icache_all();
> > > +               local_flush_icache_all();
> > > +       }
> > >
> > >          return ret;
> > >   }
> > >
> > >
> > > >
> > > >
> > > >>
> > > >>>   +    if (!mod)
> > > >>> +        local_flush_icache_range(rec->ip, rec->ip + MCOUNT_INSN_SIZE);
> > > >>> +
> > > >>>       return out;
> > > >>>   }
> > > >>>   --
> > > >>> 2.39.2
> > > >>>
> > > >>>
> > > >>> _______________________________________________
> > > >>> linux-riscv mailing list
> > > >>> linux-riscv at lists.infradead.org
> > > >>> http://lists.infradead.org/mailman/listinfo/linux-riscv
> > > >
> > > > _______________________________________________
> > > > linux-riscv mailing list
> > > > linux-riscv at lists.infradead.org
> > > > http://lists.infradead.org/mailman/listinfo/linux-riscv
> >
> > Cheers,
> > Andy