[PATCH v2] arm64: insn: Simulate nop instruction for better uprobe performance

Andrii Nakryiko andrii.nakryiko at gmail.com
Thu Oct 10 15:22:26 PDT 2024


On Thu, Oct 10, 2024 at 3:58 AM Mark Rutland <mark.rutland at arm.com> wrote:
>
> Hi Andrii,
>
> On Wed, Oct 09, 2024 at 04:54:25PM -0700, Andrii Nakryiko wrote:
> > On Mon, Sep 9, 2024 at 12:21 AM Liao Chang <liaochang1 at huawei.com> wrote:
>
> > I'm curious what the status of this patch is. It has received no
> > comments in the last month. Can someone on the ARM64 side of things
> > please take a look? (Or maybe it was applied to some tree and there
> > was just no notification?)
> >
> > This is a very useful performance optimization for uprobe tracing on
> > ARM64, so it would be nice to get it in during the current release
> > cycle. Thank you!
>
> Sorry, I got busy chasing up a bunch of bugs and hadn't gotten round to
> this yet.
>
> I've replied with a couple of minor comments and an ack, and I reckon we
> can queue this up this cycle. Usually this sort of thing starts to get
> queued around -rc3.

Thanks Mark! I'm happy to backport it internally before it lands in an
official kernel release, as long as it's clear that the patch is in its
final state. So once Liao posts a new version with your ack, I'll just
go ahead and use it internally.

When you get a chance, please also take another look at Liao's second
optimization targeting the STP instruction [0]. I know it was more
controversial, but hopefully we can arrive at some maintainable
solution that still benefits a very common uprobe tracing use case.
Thanks in advance!

  [0] https://lore.kernel.org/linux-trace-kernel/20240910060407.1427716-1-liaochang1@huawei.com/
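
(For anyone skimming: the idea in that series is to emulate the common
function-prologue store pair, e.g. stp x29, x30, [sp, #-16]!, in the
breakpoint handler instead of single-stepping it out of line. Below is a
rough sketch of the shape of such a simulation -- an illustration I'm
adding for context, not Liao's actual code, and it deliberately glosses
over the fault/watchpoint/MTE semantics that made the series
controversial:

  /* Sketch only; assumes linux/uaccess.h, linux/bitops.h and the usual
   * probes headers. STP (64-bit, pre-index): imm7 in bits [21:15],
   * Rt2 in [14:10], Rn in [9:5], Rt in [4:0]; the immediate is scaled
   * by the access size, 8 bytes here.
   */
  static void __kprobes
  simulate_stp_pre_index(u32 opcode, long addr, struct pt_regs *regs)
  {
          int rt   = opcode & 0x1f;
          int rt2  = (opcode >> 10) & 0x1f;
          int rn   = (opcode >> 5) & 0x1f;
          s64 off  = sign_extend64((opcode >> 15) & 0x7f, 6) << 3;
          u64 base = (rn == 31) ? regs->sp : regs->regs[rn];
          u64 vals[2];

          vals[0] = (rt == 31) ? 0 : regs->regs[rt];   /* reg 31 is xzr */
          vals[1] = (rt2 == 31) ? 0 : regs->regs[rt2];

          /*
           * The store hits user memory and can fault. A real version
           * must reproduce the architectural fault semantics here
           * (signal delivery, watchpoints, MTE); elided in this sketch.
           */
          if (copy_to_user((void __user *)(base + off), vals, sizeof(vals)))
                  return;

          if (rn == 31)                 /* Rn field 31 means SP here */
                  regs->sp = base + off;
          else
                  regs->regs[rn] = base + off;

          arm64_skip_faulting_instruction(regs, AARCH64_INSN_SIZE);
  }

The decode half is mechanical; the hard part, and the crux of the review
discussion, is making the user-memory store indistinguishable from the
real instruction executing in the task.)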

>
> Mark.
>
> >
> > > diff --git a/arch/arm64/include/asm/insn.h b/arch/arm64/include/asm/insn.h
> > > index 8c0a36f72d6f..dd530d5c3d67 100644
> > > --- a/arch/arm64/include/asm/insn.h
> > > +++ b/arch/arm64/include/asm/insn.h
> > > @@ -549,6 +549,12 @@ static __always_inline bool aarch64_insn_uses_literal(u32 insn)
> > >                aarch64_insn_is_prfm_lit(insn);
> > >  }
> > >
> > > +static __always_inline bool aarch64_insn_is_nop(u32 insn)
> > > +{
> > > +       return aarch64_insn_is_hint(insn) &&
> > > +              ((insn & 0xFE0) == AARCH64_INSN_HINT_NOP);
> > > +}
> > > +
> > >  enum aarch64_insn_encoding_class aarch64_get_insn_class(u32 insn);
> > >  u64 aarch64_insn_decode_immediate(enum aarch64_insn_imm_type type, u32 insn);
> > >  u32 aarch64_insn_encode_immediate(enum aarch64_insn_imm_type type,
> > > diff --git a/arch/arm64/kernel/probes/decode-insn.c b/arch/arm64/kernel/probes/decode-insn.c
> > > index 968d5fffe233..be54539e309e 100644
> > > --- a/arch/arm64/kernel/probes/decode-insn.c
> > > +++ b/arch/arm64/kernel/probes/decode-insn.c
> > > @@ -75,6 +75,15 @@ static bool __kprobes aarch64_insn_is_steppable(u32 insn)
> > >  enum probe_insn __kprobes
> > >  arm_probe_decode_insn(probe_opcode_t insn, struct arch_probe_insn *api)
> > >  {
> > > +       /*
> > > +        * While 'nop' instructions can execute in the out-of-line slot,
> > > +        * simulating them in the breakpoint handler offers better performance.
> > > +        */
> > > +       if (aarch64_insn_is_nop(insn)) {
> > > +               api->handler = simulate_nop;
> > > +               return INSN_GOOD_NO_SLOT;
> > > +       }
> > > +
> > >         /*
> > >          * Instructions reading or modifying the PC won't work from the XOL
> > >          * slot.
> > > diff --git a/arch/arm64/kernel/probes/simulate-insn.c b/arch/arm64/kernel/probes/simulate-insn.c
> > > index 22d0b3252476..5e4f887a074c 100644
> > > --- a/arch/arm64/kernel/probes/simulate-insn.c
> > > +++ b/arch/arm64/kernel/probes/simulate-insn.c
> > > @@ -200,3 +200,14 @@ simulate_ldrsw_literal(u32 opcode, long addr, struct pt_regs *regs)
> > >
> > >         instruction_pointer_set(regs, instruction_pointer(regs) + 4);
> > >  }
> > > +
> > > +void __kprobes
> > > +simulate_nop(u32 opcode, long addr, struct pt_regs *regs)
> > > +{
> > > +       /*
> > > +        * Compared to instruction_pointer_set(), this handles the
> > > +        * single-step state and execution from BTI guarded memory
> > > +        * correctly.
> > > +        */
> > > +       arm64_skip_faulting_instruction(regs, AARCH64_INSN_SIZE);
> > > +}
> > > diff --git a/arch/arm64/kernel/probes/simulate-insn.h b/arch/arm64/kernel/probes/simulate-insn.h
> > > index e065dc92218e..efb2803ec943 100644
> > > --- a/arch/arm64/kernel/probes/simulate-insn.h
> > > +++ b/arch/arm64/kernel/probes/simulate-insn.h
> > > @@ -16,5 +16,6 @@ void simulate_cbz_cbnz(u32 opcode, long addr, struct pt_regs *regs);
> > >  void simulate_tbz_tbnz(u32 opcode, long addr, struct pt_regs *regs);
> > >  void simulate_ldr_literal(u32 opcode, long addr, struct pt_regs *regs);
> > >  void simulate_ldrsw_literal(u32 opcode, long addr, struct pt_regs *regs);
> > > +void simulate_nop(u32 opcode, long addr, struct pt_regs *regs);
> > >
> > >  #endif /* _ARM_KERNEL_KPROBES_SIMULATE_INSN_H */
> > > --
> > > 2.34.1
> > >
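
One small note for readers following along: the 0xFE0 mask in
aarch64_insn_is_nop() above selects CRm:op2, bits [11:5] of the HINT
encoding, i.e. the hint immediate, and AARCH64_INSN_HINT_NOP is the
in-place value 0x0 << 5. So the check accepts exactly NOP (0xd503201f)
and rejects every other hint (yield, wfe, bti, ...). Here is a tiny
userspace sketch of the same test -- the hint class mask/value pair
mirrors what the kernel's aarch64_insn_is_hint() matches, but this is
just an illustration of the encoding, not kernel code:

  #include <assert.h>
  #include <stdint.h>
  #include <stdio.h>

  /* Same in-place field value as the kernel's AARCH64_INSN_HINT_NOP. */
  #define HINT_NOP 0x0

  static int insn_is_nop(uint32_t insn)
  {
          /* HINT class: fixed bits 0xd503201f under mask 0xfffff01f. */
          int is_hint = (insn & 0xfffff01fu) == 0xd503201fu;

          /* 0xfe0 selects CRm:op2, bits [11:5] -- the hint immediate. */
          return is_hint && (insn & 0xfe0) == HINT_NOP;
  }

  int main(void)
  {
          assert(insn_is_nop(0xd503201f));    /* nop                    */
          assert(!insn_is_nop(0xd503203f));   /* yield: hint, not nop   */
          assert(!insn_is_nop(0xd503249f));   /* bti j: hint, not nop   */
          assert(!insn_is_nop(0xd2800000));   /* mov x0, #0: not a hint */
          puts("all checks pass");
          return 0;
  }

That is also why INSN_GOOD_NO_SLOT pays off: a recognized nop never
needs an out-of-line slot, so the uprobe hit skips the whole single-step
round trip and just advances the PC.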


