[PATCH bpf-next v2 2/3] bpf, arm64: Add JIT support for stack arguments
Puranjay Mohan
puranjay12 at gmail.com
Thu May 28 07:23:32 PDT 2026
Hi Will,
Thanks for your review.
These patches have already landed in bpf-next tree as a part of this
series: https://lore.kernel.org/all/20260513044949.2382019-1-yonghong.song@linux.dev/
I will take all your feedback and send fixes for all the required changes.
> How does that work with kfuncs? I think the PCS means that they will
> expect to pick the stack arguments starting at [SP+0]. Or are you
> saying that SP == FP+16 on entry to the callee? It's hard to reconcile
> that with the ASCII art in build_prologue() because neither "current
> A64_FP" nor "BPF_FP" point below A64_SP and it's not clear which of them
> you're referring to when you refer to "FP" on its own.
At entry to kfunc the arguments are starting at [SP+0] like a normal C
callee function would expect.
The line "callee reads them from [FP+16], [FP+24]" is for a BPF callee
which always emits a frame record.
So the caller puts the arguments at SP+0, SP+8, .. and then the callee
creates the frame record in the prologue which means the first
argument is at FP+16, ....
This is just an implementation detail about how the JITed BPF callee
programs can read stack arguments from FP+16, ... easily.
>
> > BPF convention uses fixed offsets from BPF_REG_PARAMS (r11): off=-8 is
> > always arg 6, off=-16 arg 7, etc. The verifier invalidates all outgoing
> > stack arg slots after each call, so the compiler must re-store before
> > every call. This means x5-x7 don't need to be saved on stack.
> >
> > Signed-off-by: Yonghong Song <yonghong.song at linux.dev>
> > Signed-off-by: Puranjay Mohan <puranjay at kernel.org>
> > ---
> > arch/arm64/net/bpf_jit_comp.c | 87 ++++++++++++++++++++++++++++++++++-
> > 1 file changed, 86 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
> > index 085e650662e3..cd8279880795 100644
> > --- a/arch/arm64/net/bpf_jit_comp.c
> > +++ b/arch/arm64/net/bpf_jit_comp.c
> > @@ -86,6 +86,7 @@ struct jit_ctx {
> > __le32 *image;
> > __le32 *ro_image;
> > u32 stack_size;
> > + u16 stack_arg_size;
> > u64 user_vm_start;
> > u64 arena_vm_start;
> > bool fp_used;
> > @@ -533,13 +534,19 @@ static int build_prologue(struct jit_ctx *ctx, bool ebpf_from_cbpf)
> > * | |
> > * +-----+ <= (BPF_FP - prog->aux->stack_depth)
> > * |RSVD | padding
> > - * current A64_SP => +-----+ <= (BPF_FP - ctx->stack_size)
> > + * +-----+ <= (BPF_FP - ctx->stack_size)
> > + * | |
> > + * | ... | outgoing stack args (9+, if any)
> > + * | |
> > + * current A64_SP => +-----+
> > * | |
> > * | ... | Function call stack
> > * | |
> > * +-----+
> > * low
> > *
> > + * Stack args 6-8 are passed in x5-x7, args 9+ at [SP].
> > + * Incoming args 9+ are at [FP + 16], [FP + 24], ...
> > */
>
> I assume the arguments being passed are all <= 64-bit scalar types? If
> we ever want to pass anything with > 64-bit alignment or to a varargs
> function, then the rules for allocation get a little hairy.
Yes, right now only 64-bit scalars are supported.
>
> > emit_kcfi(is_main_prog ? cfi_bpf_hash : cfi_bpf_subprog_hash, ctx);
> > @@ -613,6 +620,9 @@ static int build_prologue(struct jit_ctx *ctx, bool ebpf_from_cbpf)
> > if (ctx->stack_size && !ctx->priv_sp_used)
> > emit(A64_SUB_I(1, A64_SP, A64_SP, ctx->stack_size), ctx);
> >
> > + if (ctx->stack_arg_size)
> > + emit(A64_SUB_I(1, A64_SP, A64_SP, ctx->stack_arg_size), ctx);
>
> How do you ensure that the stack pointer is always 16-byte aligned? We
> run with SP alignment checking enabled, so you need to take care of that.
Down below as you have already found, stack_arg_size is always aligned.
>
> > @@ -1191,6 +1207,41 @@ static int add_exception_handler(const struct bpf_insn *insn,
> > return 0;
> > }
> >
> > +static const u8 stack_arg_reg[] = { A64_R(5), A64_R(6), A64_R(7) };
> > +
> > +#define NR_STACK_ARG_REGS ARRAY_SIZE(stack_arg_reg)
> > +
> > +static void emit_stack_arg_load(u8 dst, s16 bpf_off, struct jit_ctx *ctx)
> > +{
> > + int idx = bpf_off / sizeof(u64) - 1;
> > +
> > + if (idx < NR_STACK_ARG_REGS)
> > + emit(A64_MOV(1, dst, stack_arg_reg[idx]), ctx);
> > + else
> > + emit(A64_LDR64I(dst, A64_FP, (idx - NR_STACK_ARG_REGS) * sizeof(u64) + 16), ctx);
> > +}
>
> Is it worth asserting that bpf_off >= 8 here or can we rely on that? I
> struggled to find any details about how bpf passes arguments on the
> stack (beyond what you describe in the commit message) and grepping for
> BPF_REG_PARAMS didn't help either.
This and other patches in this set has more information about this:
https://lore.kernel.org/all/20260513045015.2385013-1-yonghong.song@linux.dev/
>
> > +static void emit_stack_arg_store(u8 src_a64, s16 bpf_off, struct jit_ctx *ctx)
> > +{
> > + int idx = -bpf_off / sizeof(u64) - 1;
> > +
> > + if (idx < NR_STACK_ARG_REGS)
> > + emit(A64_MOV(1, stack_arg_reg[idx], src_a64), ctx);
> > + else
> > + emit(A64_STR64I(src_a64, A64_SP, (idx - NR_STACK_ARG_REGS) * sizeof(u64)), ctx);
> > +}
> > +
> > +static void emit_stack_arg_store_imm(s32 imm, s16 bpf_off, const u8 tmp, struct jit_ctx *ctx)
> > +{
> > + int idx = -bpf_off / sizeof(u64) - 1;
> > +
> > + emit_a64_mov_i(1, tmp, imm, ctx);
> > + if (idx < NR_STACK_ARG_REGS)
> > + emit(A64_MOV(1, stack_arg_reg[idx], tmp), ctx);
>
> nit: You seem to have redundant MOVs here.
Ack, will fix this.
>
> > @@ -2065,6 +2137,14 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_verifier_env *env, struct bpf_pr
> > ctx.user_vm_start = bpf_arena_get_user_vm_start(prog->aux->arena);
> > ctx.arena_vm_start = bpf_arena_get_kern_vm_start(prog->aux->arena);
> >
> > + if (prog->aux->stack_arg_depth > prog->aux->incoming_stack_arg_depth) {
> > + u16 outgoing = prog->aux->stack_arg_depth - prog->aux->incoming_stack_arg_depth;
> > + int nr_on_stack = outgoing / sizeof(u64) - NR_STACK_ARG_REGS;
> > +
> > + if (nr_on_stack > 0)
> > + ctx.stack_arg_size = round_up(nr_on_stack * sizeof(u64), 16);
> > + }
>
> ah, that's presumably where you handle the SP alignment. I think a
> comment would really help folks here... In fact, how does this interact
> with the same sort of SP adjustment that already exists in build_prologue()?
> Can we avoid the pointless re-alignment?
Yeah, you are right, if ctx->stack_size + ctx-> stack_arg_size is
aligned then we don't need to separately align these two, I can move
this from here and consolidate it in the build_prologue() function and
also add a comment there.
> Will
More information about the linux-arm-kernel
mailing list