[PATCH -next V7 0/7] riscv: Optimize function trace

Mon Feb 6 19:57:06 PST 2023

On Mon, Feb 6, 2023 at 5:56 PM Mark Rutland <mark.rutland at arm.com> wrote:
>
> On Sat, Feb 04, 2023 at 02:40:52PM +0800, Guo Ren wrote:
> > On Mon, Jan 16, 2023 at 11:02 PM Evgenii Shatokhin
> > <e.shatokhin at yadro.com> wrote:
> > >
> > > Hi,
> > >
> > > On 12.01.2023 12:05, guoren at kernel.org wrote:
> > > > From: Guo Ren <guoren at linux.alibaba.com>
> > > >
> > > > The previous ftrace detour implementation fc76b8b8011 ("riscv: Using
> > > > PATCHABLE_FUNCTION_ENTRY instead of MCOUNT") contain three problems.
> > > >
> > > >   - The most horrible bug is preemption panic which found by Andy [1].
> > > >     Let's disable preemption for ftrace first, and Andy could continue
> > > >     the ftrace preemption work.
> > >
> > > It seems, the patches #2-#7 of this series do not require "riscv:
> > > ftrace: Fixup panic by disabling preemption" and can be used without it.
> > >
> > > How about moving that patch out of the series and processing it separately?
> > Okay.
> >
> > >
> > > As it was pointed out in the discussion of that patch, some other
> > > solution to non-atomic changes of the prologue might be needed anyway.
> > I think you mean Mark Rutland's DYNAMIC_FTRACE_WITH_CALL_OPS. But that
> > still needs to be ready. Let's disable PREEMPT for ftrace first.
>
> FWIW, taking the patch to disable FTRACE with PREEMPT for now makes sense to
> me, too.
Thx, you agree with that.

>
> The DYNAMIC_FTRACE_WITH_CALL_OPS patches should be in v6.3. They're currently
> queued in the arm64 tree in the for-next/ftrace branch:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-next/ftrace
>   https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/
>
> ... and those *should* be in v6.3.
Glade to hear that. Great!

>
> Patches to imeplement DIRECT_CALLS atop that are in review at the moment:
>
>   https://lore.kernel.org/linux-arm-kernel/20230201163420.1579014-1-revest@chromium.org/
Good reference. Thx for sharing.

>
> ... and if riscv uses the CALL_OPS approach, I believe it can do much the same
> there.
>
> If riscv wants to do a single atomic patch to each patch-site (to avoid
> stop_machine()), then direct calls would always needs to bounce through the
> ftrace_caller trampoline (and acquire the direct call from the ftrace_ops), but
> that might not be as bad as it sounds -- from benchmarking on arm64, the bulk
> of the overhead seen with direct calls is when using the list_ops or having to
> do a hash lookup, and both of those are avoided with the CALL_OPS approach.
> Calling directly from the patch-site is a minor optimization relative to
> skipping that work.
Yes, CALL_OPS could solve the PREEMPTION & stop_machine problems. I
would follow up.

The difference from arm64 is that RISC-V is 16bit/32bit mixed
instruction ISA, so we must keep ftrace_caller & ftrace_regs_caller in
2048 aligned. Then:
FTRACE_UPDATE_MAKE_CALL:
  * addr+00:          NOP // Literal (first 32 bits)
  * addr+04:          NOP // Literal (last 32 bits)
  * addr+08: func: auipc t0, ? // All trampolines are in the 2048
aligned place, so this point won't be changed.
  * addr+12:          jalr ?(t0) // For different trampolines:
ftrace_regs_caller, ftrace_caller

FTRACE_UPDATE_MAKE_NOP:
  * addr+00:          NOP // Literal (first 32 bits)
  * addr+04:          NOP // Literal (last 32 bits)
  * addr+08: func: c.j     // jump to addr + 16 and skip broken insn & jalr
  * addr+10:          xxx   // last half & broken insn of auipc t0, ?
  * addr+12:          jalr ?(t0) // To be patched to jalr ?<t0> ()
  * addr+16: func body

Right? (The call site would be increased from 64bit to 128bit ahead of func.)

>
> Thanks,
> Mark.

--
Best Regards
 Guo Ren