[PATCH v5] arm64: kprobe: Enable OPTPROBE for arm64

Song Bao Hua (Barry Song) song.bao.hua at hisilicon.com
Tue Dec 14 15:48:32 PST 2021



> -----Original Message-----
> From: Jianhua Liu [mailto:jianhua.ljh at gmail.com]
> Sent: Wednesday, December 15, 2021 4:56 AM
> To: Masami Hiramatsu <mhiramat at kernel.org>
> Cc: Will Deacon <will at kernel.org>; liuqi (BA) <liuqi115 at huawei.com>; Catalin
> Marinas <catalin.marinas at arm.com>; linux-arm-kernel at lists.infradead.org;
> Song Bao Hua (Barry Song) <song.bao.hua at hisilicon.com>; Zengtao (B)
> <prime.zeng at hisilicon.com>; robin.murphy at arm.com; Linuxarm
> <linuxarm at huawei.com>; linux-kernel at vger.kernel.org
> Subject: Re: [PATCH v5] arm64: kprobe: Enable OPTPROBE for arm64
> 
> On Tue, Dec 14, 2021 at 8:27 AM Masami Hiramatsu <mhiramat at kernel.org> wrote:
> >
> > On Mon, 13 Dec 2021 18:38:52 +0000
> > Will Deacon <will at kernel.org> wrote:
> >
> > > Hi,
> > >
> > > [+Janet Liu]
> > >
> > > On Tue, Dec 07, 2021 at 08:40:02PM +0800, Qi Liu wrote:
> > > > This patch introduce optprobe for ARM64. In optprobe, probed
> > > > instruction is replaced by a branch instruction to trampoline.
> > > >
> > > > Performance of optprobe on Hip08 platform is test using kprobe
> > > > example module[1] to analyze the latency of a kernel function,
> > > > and here is the result:
> > > >
> > > > common kprobe:
> > > > [280709.846380] do_empty returned 0 and took 1530 ns to execute
> > > > [280709.852057] do_empty returned 0 and took 550 ns to execute
> > > > [280709.857631] do_empty returned 0 and took 440 ns to execute
> > > > [280709.863215] do_empty returned 0 and took 380 ns to execute
> > > > [280709.868787] do_empty returned 0 and took 360 ns to execute
> > > > [280709.874362] do_empty returned 0 and took 340 ns to execute
> > > > [280709.879936] do_empty returned 0 and took 320 ns to execute
> > > > [280709.885505] do_empty returned 0 and took 300 ns to execute
> > > > [280709.891075] do_empty returned 0 and took 280 ns to execute
> > > > [280709.896646] do_empty returned 0 and took 290 ns to execute
> > > >
> > > > optprobe:
> > > > [ 2965.964572] do_empty returned 0 and took 90 ns to execute
> > > > [ 2965.969952] do_empty returned 0 and took 80 ns to execute
> > > > [ 2965.975332] do_empty returned 0 and took 70 ns to execute
> > > > [ 2965.980714] do_empty returned 0 and took 60 ns to execute
> > > > [ 2965.986128] do_empty returned 0 and took 80 ns to execute
> > > > [ 2965.991507] do_empty returned 0 and took 70 ns to execute
> > > > [ 2965.996884] do_empty returned 0 and took 70 ns to execute
> > > > [ 2966.002262] do_empty returned 0 and took 80 ns to execute
> > > > [ 2966.007642] do_empty returned 0 and took 70 ns to execute
> > > > [ 2966.013020] do_empty returned 0 and took 70 ns to execute
> > > > [ 2966.018400] do_empty returned 0 and took 70 ns to execute
> > > >
> > > > As the result shows, optprobe can greatly reduce the latency. Big
> > > > latency of common kprobe will significantly impact the real result
> > > > while doing performance analysis or debugging performance issues
> > > > in lab, so optprobe is useful in this scenario.
> > > >
> > > > Acked-by: Masami Hiramatsu <mhiramat at kernel.org>
> > > > Signed-off-by: Qi Liu <liuqi115 at huawei.com>
> > > >
> > > > Note:
> > > > As branch instruction in Arm64 has a 128M range limitation, optprobe
> > > > could only used when offset between probe point and trampoline
> > > > is less than 128M, otherwise kernel will choose common kprobe
> > > > automaticly.
> > > >
> > > > Limitation caused by branch isn't unique to Arm64, but also to
> > > > x86/arm/powerpc.
> > > >
> > > > In fact, Module PLT has been tried to get rid of limiation, but
> > > > destination of PLT must be a fixed value, and we need to modify
> > > > the destination (as each optprobe has its own trampoline).
> > > >
> > > > As discussed with Masami[2], we can start with core-kernel point
> > > > (within 128M) as the first step, like other architectures.
> > > >
> > > > [1]
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/sa
> mples/kprobes/kretprobe_example.c
> > > > [2]
> https://lore.kernel.org/lkml/20211201105001.5164f98ba783e7207df1229c@kerne
> l.org/
> > > > ---
> > > >  arch/arm64/Kconfig                            |   1 +
> > > >  arch/arm64/include/asm/kprobes.h              |  21 ++
> > > >  arch/arm64/kernel/probes/Makefile             |   2 +
> > > >  arch/arm64/kernel/probes/opt_arm64.c          | 199 ++++++++++++++++++
> > > >  .../arm64/kernel/probes/optprobe_trampoline.S |  97 +++++++++
> > > >  include/linux/kprobes.h                       |   2 +
> > > >  kernel/kprobes.c                              |  22 ++
> > > >  7 files changed, 344 insertions(+)
> > > >  create mode 100644 arch/arm64/kernel/probes/opt_arm64.c
> > > >  create mode 100644 arch/arm64/kernel/probes/optprobe_trampoline.S
> > >
> > > I've not looked at these changes in detail, but it looks like there is an
> > > independent patch from Janet Liu trying to do the same thing:
> > >
> > >
> https://lore.kernel.org/r/1635858706-27320-1-git-send-email-jianhua.ljh@gm
> ail.com
> > >
> >
> > Thanks for noticing. I missed it.
> >
> > > The patch here from Qi Liu looks like it's a bit further along, but it
> > > would be good for Janet to at least test it out and confirm that it works
> > > for them.
> >
> > Yeah, it's now v5.
> > But it seems Janet's one also has good points. I would like Janet's sharing
> > save_all_base_regs macro and the comment about the branch instruction.
> >
> > >
> > > Cheers,
> > >
> > > Will
> > >
> > > [Kept diff inline for Janet]
> >
> > Janet, please feel free to review and test it. It is important that you confirm
> > this can work with your envionment too.
> > I will review your KPROBE_ON_FTRACE patch.
> >
> I have tested these patch on UNISOC s9863a platform before sending.
> 
> The test case from:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/sa
> mples/kprobes/
> 
> And I do the following change for kprobe_example.c before testing:
> 1. delete function handler_post,
>    kprobe_init does "kp.post_handler = handler_post; --->
> p.post_handler = NULL;"
> 2. handler_pre calls dump_stack.
> 
> Thanks for  the review.

Hello, Jianhua. I guess Will and Masami meant you may
test liuqi's optprobe patch on your hardware and make
sure it can work. At the same time, Masami will also
take care of your approach.

Thanks
Barry



More information about the linux-arm-kernel mailing list