[PATCH bpf-next v2 2/3] bpf, arm64: Add JIT support for stack arguments
Will Deacon
will at kernel.org
Thu May 28 06:43:08 PDT 2026
On Mon, Apr 27, 2026 at 04:47:59PM -0700, Puranjay Mohan wrote:
> Implement stack argument passing for BPF-to-BPF and kfunc calls with
> more than 5 parameters on arm64, following the AAPCS64 calling
> convention.
>
> BPF R1-R5 already map to x0-x4. With BPF_REG_0 moved to x8 by the
> previous commit, x5-x7 are free for arguments 6-8. Arguments 9-12
> spill onto the stack at [SP+0], [SP+8], ... and the callee reads
> them from [FP+16], [FP+24], ... (above the saved FP/LR pair).
How does that work with kfuncs? I think the PCS means that they will
expect to pick up the stack arguments starting at [SP+0]. Or are you
saying that SP == FP+16 on entry to the callee? It's hard to reconcile
that with the ASCII art in build_prologue(), because neither "current
A64_FP" nor "BPF_FP" points below A64_SP, and it's not clear which of
them you mean when you write "FP" on its own.
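To make the question concrete, here is a minimal sketch of the address
arithmetic. It assumes the callee's prologue pushes the FP/LR pair first
and then sets A64_FP to the new SP; the helper names are illustrative and
do not appear in the patch. Under that assumption, the caller's [SP+0]
and the callee's [A64_FP+16] name the same slot:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed prologue: "stp x29, x30, [sp, #-16]!; mov x29, sp". */
static uint64_t callee_fp_after_prologue(uint64_t entry_sp)
{
	return entry_sp - 16;	/* FP/LR pair sits in the 16 bytes below entry SP */
}

/* Address of the n-th stack-passed argument (n = 0 is BPF arg 9), caller's view. */
static uint64_t caller_slot(uint64_t entry_sp, unsigned int n)
{
	return entry_sp + 8ull * n;
}

/* The same argument as addressed by the callee through its frame pointer. */
static uint64_t callee_slot(uint64_t entry_sp, unsigned int n)
{
	return callee_fp_after_prologue(entry_sp) + 16 + 8ull * n;
}

/* Hypothetical self-check: both views must agree for every slot. */
static void check_same_slot(uint64_t entry_sp, unsigned int n)
{
	assert(caller_slot(entry_sp, n) == callee_slot(entry_sp, n));
}
```

If "FP" in the commit message means BPF_FP instead, the "+ 16" no longer
falls out of this arithmetic, which is why the distinction matters.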
> BPF convention uses fixed offsets from BPF_REG_PARAMS (r11): off=-8 is
> always arg 6, off=-16 arg 7, etc. The verifier invalidates all outgoing
> stack arg slots after each call, so the compiler must re-store before
> every call. This means x5-x7 don't need to be saved on stack.
>
> Signed-off-by: Yonghong Song <yonghong.song at linux.dev>
> Signed-off-by: Puranjay Mohan <puranjay at kernel.org>
> ---
> arch/arm64/net/bpf_jit_comp.c | 87 ++++++++++++++++++++++++++++++++++-
> 1 file changed, 86 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
> index 085e650662e3..cd8279880795 100644
> --- a/arch/arm64/net/bpf_jit_comp.c
> +++ b/arch/arm64/net/bpf_jit_comp.c
> @@ -86,6 +86,7 @@ struct jit_ctx {
> __le32 *image;
> __le32 *ro_image;
> u32 stack_size;
> + u16 stack_arg_size;
> u64 user_vm_start;
> u64 arena_vm_start;
> bool fp_used;
> @@ -533,13 +534,19 @@ static int build_prologue(struct jit_ctx *ctx, bool ebpf_from_cbpf)
> * | |
> * +-----+ <= (BPF_FP - prog->aux->stack_depth)
> * |RSVD | padding
> - * current A64_SP => +-----+ <= (BPF_FP - ctx->stack_size)
> + * +-----+ <= (BPF_FP - ctx->stack_size)
> + * | |
> + * | ... | outgoing stack args (9+, if any)
> + * | |
> + * current A64_SP => +-----+
> * | |
> * | ... | Function call stack
> * | |
> * +-----+
> * low
> *
> + * Stack args 6-8 are passed in x5-x7, args 9+ at [SP].
> + * Incoming args 9+ are at [FP + 16], [FP + 24], ...
> */
I assume the arguments being passed are all <= 64-bit scalar types? If
we ever want to pass anything with > 64-bit alignment or to a varargs
function, then the rules for allocation get a little hairy.
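As a rough illustration of why wider alignment complicates things, the
sketch below models a simplified form of the AAPCS64 stack-allocation
step: the next stacked argument address (NSAA) is first rounded up to the
larger of 8 and the argument's natural alignment, and every argument
occupies at least 8 bytes. The function names are hypothetical, and the
model ignores varargs and composite-type special cases.

```c
#include <assert.h>
#include <stddef.h>

/* Round x up to a multiple of a, where a is a power of two. */
static size_t align_up(size_t x, size_t a)
{
	return (x + a - 1) & ~(a - 1);
}

/*
 * Allocate one stack-passed argument. *nsaa is the next stacked argument
 * address, expressed as a byte offset from SP at the call. Returns the
 * offset at which this argument is placed and advances *nsaa past it.
 */
static size_t alloc_stack_arg(size_t *nsaa, size_t size, size_t align)
{
	size_t off;

	*nsaa = align_up(*nsaa, align > 8 ? align : 8);
	off = *nsaa;
	*nsaa += align_up(size, 8);	/* small arguments still take a full slot */
	return off;
}
```

With only 8-byte scalars the offsets are simply 0, 8, 16, ...; a single
16-byte-aligned argument after an odd number of slots introduces a hole,
which the fixed "idx * 8" arithmetic in the patch would not account for.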
> emit_kcfi(is_main_prog ? cfi_bpf_hash : cfi_bpf_subprog_hash, ctx);
> @@ -613,6 +620,9 @@ static int build_prologue(struct jit_ctx *ctx, bool ebpf_from_cbpf)
> if (ctx->stack_size && !ctx->priv_sp_used)
> emit(A64_SUB_I(1, A64_SP, A64_SP, ctx->stack_size), ctx);
>
> + if (ctx->stack_arg_size)
> + emit(A64_SUB_I(1, A64_SP, A64_SP, ctx->stack_arg_size), ctx);
How do you ensure that the stack pointer is always 16-byte aligned? We
run with SP alignment checking enabled, so you need to take care of that.
> @@ -1191,6 +1207,41 @@ static int add_exception_handler(const struct bpf_insn *insn,
> return 0;
> }
>
> +static const u8 stack_arg_reg[] = { A64_R(5), A64_R(6), A64_R(7) };
> +
> +#define NR_STACK_ARG_REGS ARRAY_SIZE(stack_arg_reg)
> +
> +static void emit_stack_arg_load(u8 dst, s16 bpf_off, struct jit_ctx *ctx)
> +{
> + int idx = bpf_off / sizeof(u64) - 1;
> +
> + if (idx < NR_STACK_ARG_REGS)
> + emit(A64_MOV(1, dst, stack_arg_reg[idx]), ctx);
> + else
> + emit(A64_LDR64I(dst, A64_FP, (idx - NR_STACK_ARG_REGS) * sizeof(u64) + 16), ctx);
> +}
Is it worth asserting that bpf_off >= 8 here, or can we rely on that? I
struggled to find any details about how BPF passes arguments on the
stack (beyond what you describe in the commit message), and grepping for
BPF_REG_PARAMS didn't help either.
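For reference, a minimal userspace sketch of the incoming-argument index
mapping with the assumption made explicit. It mirrors the arithmetic in
emit_stack_arg_load(); the helper names and the use of assert() are
illustrative only, and the invariant that incoming offsets are positive
multiples of 8 is an assumption taken from the code, not something the
patch states.

```c
#include <assert.h>
#include <stdint.h>

#define NR_STACK_ARG_REGS 3	/* x5-x7, as in the patch */

/* Index of an incoming stack argument; idx 0 corresponds to BPF argument 6. */
static int incoming_stack_arg_idx(int16_t bpf_off)
{
	/* Assumed invariant: offsets are positive multiples of 8. */
	assert(bpf_off >= 8 && bpf_off % 8 == 0);
	return bpf_off / 8 - 1;
}

/* Returns 1 if the argument arrives in a register (x5-x7), 0 if in memory. */
static int incoming_arg_in_reg(int16_t bpf_off)
{
	return incoming_stack_arg_idx(bpf_off) < NR_STACK_ARG_REGS;
}

/* Byte offset from A64_FP for in-memory arguments, mirroring the "+ 16". */
static int incoming_arg_fp_offset(int16_t bpf_off)
{
	return (incoming_stack_arg_idx(bpf_off) - NR_STACK_ARG_REGS) * 8 + 16;
}
```

Without such a check, an offset below 8 produces a negative index, so
the in-kernel equivalent would presumably want a WARN or an error return
rather than silently emitting a load.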
> +static void emit_stack_arg_store(u8 src_a64, s16 bpf_off, struct jit_ctx *ctx)
> +{
> + int idx = -bpf_off / sizeof(u64) - 1;
> +
> + if (idx < NR_STACK_ARG_REGS)
> + emit(A64_MOV(1, stack_arg_reg[idx], src_a64), ctx);
> + else
> + emit(A64_STR64I(src_a64, A64_SP, (idx - NR_STACK_ARG_REGS) * sizeof(u64)), ctx);
> +}
> +
> +static void emit_stack_arg_store_imm(s32 imm, s16 bpf_off, const u8 tmp, struct jit_ctx *ctx)
> +{
> + int idx = -bpf_off / sizeof(u64) - 1;
> +
> + emit_a64_mov_i(1, tmp, imm, ctx);
> + if (idx < NR_STACK_ARG_REGS)
> + emit(A64_MOV(1, stack_arg_reg[idx], tmp), ctx);
nit: You seem to have redundant MOVs here.
> @@ -2065,6 +2137,14 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_verifier_env *env, struct bpf_pr
> ctx.user_vm_start = bpf_arena_get_user_vm_start(prog->aux->arena);
> ctx.arena_vm_start = bpf_arena_get_kern_vm_start(prog->aux->arena);
>
> + if (prog->aux->stack_arg_depth > prog->aux->incoming_stack_arg_depth) {
> + u16 outgoing = prog->aux->stack_arg_depth - prog->aux->incoming_stack_arg_depth;
> + int nr_on_stack = outgoing / sizeof(u64) - NR_STACK_ARG_REGS;
> +
> + if (nr_on_stack > 0)
> + ctx.stack_arg_size = round_up(nr_on_stack * sizeof(u64), 16);
> + }
Ah, that's presumably where you handle the SP alignment. I think a
comment would really help folks here. In fact, how does this interact
with the same sort of SP adjustment that already exists in
build_prologue()? Can we avoid the pointless re-alignment?
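The rounding in the hunk above can be modelled in userspace as follows.
This is only a sketch of the arithmetic: round_up_pow2() stands in for
the kernel's round_up(), and outgoing_stack_arg_size() is a hypothetical
name for the computation done inline in bpf_int_jit_compile().

```c
#include <assert.h>
#include <stddef.h>

#define NR_STACK_ARG_REGS 3	/* x5-x7, as in the patch */

/* Round x up to a multiple of a, where a is a power of two. */
static size_t round_up_pow2(size_t x, size_t a)
{
	return (x + a - 1) & ~(a - 1);
}

/* Bytes to subtract from SP for outgoing stack args, given the depth in bytes. */
static size_t outgoing_stack_arg_size(size_t outgoing_depth)
{
	size_t nr_slots = outgoing_depth / 8;
	size_t size;

	if (nr_slots <= NR_STACK_ARG_REGS)
		return 0;	/* everything fits in x5-x7 */

	size = round_up_pow2((nr_slots - NR_STACK_ARG_REGS) * 8, 16);
	assert(size % 16 == 0);	/* SP stays 16-byte aligned if it was before */
	return size;
}
```

An odd number of in-memory slots therefore costs one padding slot, for
example 48 bytes of outgoing depth yields a 32-byte adjustment.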
Will