[PATCH net-next v3] arm: eBPF JIT compiler

Sat Aug 19 12:59:02 PDT 2017

> impressive work.
> Acked-by: Alexei Starovoitov <ast at kernel.org>

Thanks :)

I can't take all the credit. It was Daniel and Kees who helped me a lot.
I would have given up a long time ago without them.
>
> Any performance numbers with vs without JIT ?

Here is the mail from Kees on v1 of the patch.

For what it's worth, I did a comparison of the numbers Shubham posted
in another thread for the JIT, comparing the eBPF interpreter with his
new JIT. The post is here:

https://www.spinics.net/lists/netdev/msg436402.html

Other than that, I can send the test runs, which include timings, but I
will not be able to compare them the way Kees did this week.
Does that sound good?
>
>> +static const u8 bpf2a32[][2] = {
>> +       /* return value from in-kernel function, and exit value from eBPF */
>> +       [BPF_REG_0] = {ARM_R1, ARM_R0},
>> +       /* arguments from eBPF program to in-kernel function */
>> +       [BPF_REG_1] = {ARM_R3, ARM_R2},
>
>
> as far as i understand arm32 calling convention the mapping makes sense
> to me. Hard to come up with anything better than the above.
I tried different versions of it, depending on the needs of different
eBPF instructions. As you can see, we are short on registers, and this
is the best mapping I could come up with.
I would love to hear about any improvement over this.
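
To make the mapping concrete, here is a minimal sketch of how one 64-bit
eBPF register is carried in a pair of 32-bit ARM registers. It is not
code from the patch: reg_pair, split64, join64, HI and LO are
hypothetical names, and it assumes the first entry of each pair holds
the high word and the second the low word.

```c
#include <stdint.h>

/* Hypothetical names for illustration; not identifiers from the patch. */
enum { HI = 0, LO = 1 };

/* One 64-bit eBPF register modelled as two 32-bit ARM registers.
 * Assumption: index 0 is the high word, index 1 is the low word. */
struct reg_pair {
	uint32_t r[2];
};

/* Split a 64-bit value into the two halves a register pair would hold. */
static struct reg_pair split64(uint64_t v)
{
	struct reg_pair p;

	p.r[HI] = (uint32_t)(v >> 32);
	p.r[LO] = (uint32_t)v;
	return p;
}

/* Rebuild the 64-bit value from its two halves. */
static uint64_t join64(struct reg_pair p)
{
	return ((uint64_t)p.r[HI] << 32) | p.r[LO];
}
```

With 16 ARM registers, several of them reserved, eleven eBPF registers
at two words each cannot all live in registers, which is why some of
them end up on the stack.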
>
>> +       /* function call */
>> +       case BPF_JMP | BPF_CALL:
>> +       {
>> +               const u8 *r0 = bpf2a32[BPF_REG_0];
>> +               const u8 *r1 = bpf2a32[BPF_REG_1];
>> +               const u8 *r2 = bpf2a32[BPF_REG_2];
>> +               const u8 *r3 = bpf2a32[BPF_REG_3];
>> +               const u8 *r4 = bpf2a32[BPF_REG_4];
>> +               const u8 *r5 = bpf2a32[BPF_REG_5];
>> +               const u32 func = (u32)__bpf_call_base + (u32)imm;
>> +
>> +               emit_a32_mov_r64(true, r0, r1, false, false, ctx);
>> +               emit_a32_mov_r64(true, r1, r2, false, true, ctx);
>> +               emit_push_r64(r5, 0, ctx);
>> +               emit_push_r64(r4, 8, ctx);
>> +               emit_push_r64(r3, 16, ctx);
>> +
>> +               emit_a32_mov_i(tmp[1], func, false, ctx);
>> +               emit_blx_r(tmp[1], ctx);
>
>
> to improve the cost of call we can teach verifier to mark the registers
> actually used to pass arguments, so not all pushes would be needed.
> But it may be a drop in the bucket compared to the cost of compound
> 64-bit alu ops.
That's right, but it is still an improvement, I guess. I think I
discussed it with Daniel, and I thought I should get this patch into
mainline first; then I can improve on it.
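
As a rough sketch of the saving, assume the JIT could learn how many
arguments a helper really takes. The function below is hypothetical
(stack_pushes_needed is not in the patch, and no such verifier interface
exists today). It only counts how many of the three 64-bit pushes from
the hunk above would still be required, assuming the first two arguments
travel in registers and arguments three to five go on the stack.

```c
/* Hypothetical helper: number of 64-bit stack pushes a call would need
 * if the argument count were known. Assumes BPF_REG_1 and BPF_REG_2 are
 * passed in ARM registers and BPF_REG_3..BPF_REG_5 are pushed. */
static unsigned int stack_pushes_needed(unsigned int nargs)
{
	const unsigned int reg_args = 2;	/* passed in registers */
	const unsigned int max_args = 5;	/* eBPF helpers take at most 5 */

	if (nargs > max_args)
		nargs = max_args;
	return nargs > reg_args ? nargs - reg_args : 0;
}
```

So a helper with one or two arguments would need no pushes at all,
while today every call pays for all three.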
> There was some work on llvm side to use 32-bit subregisters which
> should help 32-bit architectures and JITs, but it didn't go far.
> So if you're interested in further improving bpf program speeds on arm32
> you may take a look at llvm side. I can certainly provide the tips.
Sure. Sounds good.
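
For anyone following along, here is a small sketch of why 32-bit
subregisters would help. It is plain C that models the arithmetic, not
JIT code, and pair32, add64_on_arm32 and add32_subreg are hypothetical
names. A 64-bit add on arm32 has to be split into a low-word add plus a
high-word add with carry, whereas an add known to be 32-bit is a single
operation.

```c
#include <stdint.h>

/* Hypothetical model of a 64-bit value held in two 32-bit registers. */
struct pair32 {
	uint32_t hi;
	uint32_t lo;
};

/* 64-bit add built from 32-bit steps, mirroring an ADDS/ADC pair. */
static struct pair32 add64_on_arm32(struct pair32 a, struct pair32 b)
{
	struct pair32 r;
	uint32_t carry;

	r.lo = a.lo + b.lo;		/* low words, wraps modulo 2^32 */
	carry = r.lo < a.lo;		/* 1 if the low-word add overflowed */
	r.hi = a.hi + b.hi + carry;	/* high words plus carry */
	return r;
}

/* The same operation when the compiler knows only 32 bits matter. */
static uint32_t add32_subreg(uint32_t a, uint32_t b)
{
	return a + b;
}
```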

Best,
Shubham