[PATCH bpf-next v3 2/4] libbpf: Access first syscall argument with CO-RE direct read on arm64
Andrii Nakryiko
andrii.nakryiko at gmail.com
Wed Sep 4 13:06:51 PDT 2024
On Sat, Aug 31, 2024 at 12:57 AM Xu Kuohai <xukuohai at huaweicloud.com> wrote:
>
> On 8/31/2024 3:26 PM, Xu Kuohai wrote:
> > On 8/31/2024 12:19 PM, Pu Lehui wrote:
> >> From: Pu Lehui <pulehui at huawei.com>
> >>
> >> Currently PT_REGS_PARM1 SYSCALL(x) is consistent with PT_REGS_PARM1_CORE
> >> SYSCALL(x), which will introduce the overhead of BPF_CORE_READ(), taking
> >> into account the read pt_regs comes directly from the context, let's use
> >> CO-RE direct read to access the first system call argument.
> >>
> >> Suggested-by: Andrii Nakryiko <andrii at kernel.org>
> >> Signed-off-by: Pu Lehui <pulehui at huawei.com>
> >> ---
> >> tools/lib/bpf/bpf_tracing.h | 4 ++--
> >> 1 file changed, 2 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/tools/lib/bpf/bpf_tracing.h b/tools/lib/bpf/bpf_tracing.h
> >> index e7d9382efeb3..051c408e6aed 100644
> >> --- a/tools/lib/bpf/bpf_tracing.h
> >> +++ b/tools/lib/bpf/bpf_tracing.h
> >> @@ -222,7 +222,7 @@ struct pt_regs___s390 {
> >> struct pt_regs___arm64 {
> >> unsigned long orig_x0;
> >> -};
> >> +} __attribute__((preserve_access_index));
> >> /* arm64 provides struct user_pt_regs instead of struct pt_regs to userspace */
> >> #define __PT_REGS_CAST(x) ((const struct user_pt_regs *)(x))
> >> @@ -241,7 +241,7 @@ struct pt_regs___arm64 {
> >> #define __PT_PARM4_SYSCALL_REG __PT_PARM4_REG
> >> #define __PT_PARM5_SYSCALL_REG __PT_PARM5_REG
> >> #define __PT_PARM6_SYSCALL_REG __PT_PARM6_REG
> >> -#define PT_REGS_PARM1_SYSCALL(x) PT_REGS_PARM1_CORE_SYSCALL(x)
> >> +#define PT_REGS_PARM1_SYSCALL(x) (((const struct pt_regs___arm64 *)(x))->orig_x0)
> >> #define PT_REGS_PARM1_CORE_SYSCALL(x) \
> >> BPF_CORE_READ((const struct pt_regs___arm64 *)(x), __PT_PARM1_SYSCALL_REG)
> >
> > Cool!
> >
> > Acked-by: Xu Kuohai <xukuohai at huawei.com>
> >
> >
>
> Wait, it breaks the following test:
>
You mean, *if you change the existing test like below*, it will break,
right? And that's expected, because arm64 has
ARCH_HAS_SYSCALL_WRAPPER, which means syscall pt_regs are actually not
the kprobe's ctx, so you can't directly access it. Which is why we
have PT_REGS_PARM1_CORE_SYSCALL() variants.
See how BPF_KSYSCALL macro is implemented, there are two cases:
___bpf_syswap_args(), which uses BPF_CORE_READ()-based macros to fetch
arguments, and ___bpf_syscall_args() which uses direct ctx reads.
> --- a/tools/testing/selftests/bpf/progs/bpf_syscall_macro.c
> +++ b/tools/testing/selftests/bpf/progs/bpf_syscall_macro.c
> @@ -43,7 +43,7 @@ int BPF_KPROBE(handle_sys_prctl)
>
> /* test for PT_REGS_PARM */
>
> - bpf_probe_read_kernel(&tmp, sizeof(tmp), &PT_REGS_PARM1_SYSCALL(real_regs));
> + tmp = PT_REGS_PARM1_SYSCALL(real_regs);
> arg1 = tmp;
> bpf_probe_read_kernel(&arg2, sizeof(arg2), &PT_REGS_PARM2_SYSCALL(real_regs));
> bpf_probe_read_kernel(&arg3, sizeof(arg3), &PT_REGS_PARM3_SYSCALL(real_regs));
>
> Failed with verifier rejection:
>
> 0: R1=ctx() R10=fp0
> ; int BPF_KPROBE(handle_sys_prctl) @ bpf_syscall_macro.c:33
> 0: (bf) r6 = r1 ; R1=ctx() R6_w=ctx()
> ; pid_t pid = bpf_get_current_pid_tgid() >> 32; @ bpf_syscall_macro.c:36
> 1: (85) call bpf_get_current_pid_tgid#14 ; R0_w=scalar()
> ; if (pid != filter_pid) @ bpf_syscall_macro.c:39
> 2: (18) r1 = 0xffff800082e0e000 ; R1_w=map_value(map=bpf_sysc.rodata,ks=4,vs=4)
> 4: (61) r1 = *(u32 *)(r1 +0) ; R1_w=607
> ; pid_t pid = bpf_get_current_pid_tgid() >> 32; @ bpf_syscall_macro.c:36
> 5: (77) r0 >>= 32 ; R0_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff))
> ; if (pid != filter_pid) @ bpf_syscall_macro.c:39
> 6: (5e) if w1 != w0 goto pc+98 ; R0_w=607 R1_w=607
> ; real_regs = PT_REGS_SYSCALL_REGS(ctx); @ bpf_syscall_macro.c:42
> 7: (79) r8 = *(u64 *)(r6 +0) ; R6_w=ctx() R8_w=scalar()
> ; tmp = PT_REGS_PARM1_SYSCALL(real_regs); @ bpf_syscall_macro.c:46
> 8: (79) r1 = *(u64 *)(r8 +272)
> R8 invalid mem access 'scalar'
> processed 8 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
>
More information about the linux-riscv
mailing list