[PATCH bpf-next v2] bpf, arm64: Optimize BPF store/load using str/ldr with immediate offset

Mon Mar 14 15:11:26 PDT 2022

On 3/14/22 9:48 AM, Xu Kuohai wrote:
> The current BPF store/load instruction is translated by the JIT into two
> instructions. The first instruction moves the immediate offset into a
> temporary register. The second instruction uses this temporary register
> to do the real store/load.
> 
> In fact, arm64 supports addressing with immediate offsets. So this patch
> introduces an optimization that uses the arm64 str/ldr instructions with an
> immediate offset when the offset fits.
> 
> Example of generated instructions for r2 = *(u64 *)(r1 + 0):
> 
> without optimization:
> mov x10, 0
> ldr x1, [x0, x10]
> 
> with optimization:
> ldr x1, [x0, 0]
> 
> If the offset is negative, or is not aligned correctly, or exceeds the max
> value, fall back to using the temporary register.
> 
> Result for test_bpf:
>   # dmesg -D
>   # insmod test_bpf.ko
>   # dmesg | grep Summary
>   test_bpf: Summary: 1009 PASSED, 0 FAILED, [997/997 JIT'ed]
>   test_bpf: test_tail_calls: Summary: 8 PASSED, 0 FAILED, [8/8 JIT'ed]
>   test_bpf: test_skb_segment: Summary: 2 PASSED, 0 FAILED
> 
> Signed-off-by: Xu Kuohai <xukuohai at huawei.com>
[...]

Thanks for working on this and also including the result for test_bpf! Does it
also contain corner cases where the fallback to the temporary register is
triggered? (If not, let's add more test cases to it.)
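
To make the corner cases concrete, here is a rough userspace sketch of the
"offset fits" condition, assuming the scaled unsigned 12-bit immediate form of
ldr/str (the offset must be non-negative, a multiple of the access size, and at
most 4095 * size). The helper name off_fits_scaled_imm12() is hypothetical and
is not taken from the patch:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only (not the patch's code).
 * Returns true if 'off' can be encoded as the scaled unsigned 12-bit
 * immediate of an arm64 ldr/str for an access of (1 << size_log2) bytes.
 */
static bool off_fits_scaled_imm12(int32_t off, unsigned int size_log2)
{
	uint32_t size = 1u << size_log2;

	/* Negative offsets cannot use the unsigned immediate form. */
	if (off < 0)
		return false;

	/* The offset must be a multiple of the access size. */
	if ((uint32_t)off & (size - 1))
		return false;

	/* The scaled value must fit in 12 bits (0..4095). */
	return ((uint32_t)off >> size_log2) <= 0xfff;
}
```

With that condition, candidates for the fallback path would be e.g. offset -8
(negative), offset 4 with a 64-bit access (unaligned), and offset 4096 with a
byte access (out of range), next to the in-range boundary 32760 for a 64-bit
access.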

Could you split this into two patches, one that touches arch/arm64/lib/insn.c
and arch/arm64/include/asm/insn.h for the instruction encoder, and then the
other part for the JIT-only bits?

Will, would you be okay if we route this via bpf-next with your Ack, or do we
need to pull a feature branch again?

Thanks,
Daniel