[PATCH -next V7 0/7] riscv: Optimize function trace
Mark Rutland
mark.rutland at arm.com
Mon Feb 6 01:56:26 PST 2023
On Sat, Feb 04, 2023 at 02:40:52PM +0800, Guo Ren wrote:
> On Mon, Jan 16, 2023 at 11:02 PM Evgenii Shatokhin
> <e.shatokhin at yadro.com> wrote:
> >
> > Hi,
> >
> > On 12.01.2023 12:05, guoren at kernel.org wrote:
> > > From: Guo Ren <guoren at linux.alibaba.com>
> > >
> > > The previous ftrace detour implementation fc76b8b8011 ("riscv: Using
> > > PATCHABLE_FUNCTION_ENTRY instead of MCOUNT") contains three problems.
> > >
> > > - The most serious bug is the preemption panic found by Andy [1].
> > > Let's disable preemption for ftrace first, and Andy can continue
> > > the ftrace preemption work.
> >
> > It seems, the patches #2-#7 of this series do not require "riscv:
> > ftrace: Fixup panic by disabling preemption" and can be used without it.
> >
> > How about moving that patch out of the series and processing it separately?
> Okay.
>
> >
> > As was pointed out in the discussion of that patch, some other
> > solution to the non-atomic changes of the prologue might be needed anyway.
> I think you mean Mark Rutland's DYNAMIC_FTRACE_WITH_CALL_OPS. But that
> work isn't ready yet. Let's disable PREEMPT for ftrace first.
FWIW, taking the patch to disable FTRACE with PREEMPT for now makes sense to
me, too.
The DYNAMIC_FTRACE_WITH_CALL_OPS patches are currently queued in the arm64
tree in the for-next/ftrace branch:
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-next/ftrace
https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/
... and those *should* be in v6.3.
Patches to implement DIRECT_CALLS atop that are in review at the moment:
https://lore.kernel.org/linux-arm-kernel/20230201163420.1579014-1-revest@chromium.org/
... and if riscv uses the CALL_OPS approach, I believe it can do much the same
there.
If riscv wants to do a single atomic patch to each patch-site (to avoid
stop_machine()), then direct calls would always need to bounce through the
ftrace_caller trampoline (and acquire the direct call from the ftrace_ops), but
that might not be as bad as it sounds -- from benchmarking on arm64, the bulk
of the overhead seen with direct calls is when using the list_ops or having to
do a hash lookup, and both of those are avoided with the CALL_OPS approach.
Calling directly from the patch-site is a minor optimization relative to
skipping that work.
Thanks,
Mark.
More information about the linux-riscv mailing list