[RESEND PATCH bpf-next 1/2] bpf, arm64: Jit BPF_CALL to direct call when possible

Xu Kuohai xukuohai at huawei.com
Wed Oct 12 19:07:12 PDT 2022


On 9/27/2022 10:01 PM, Xu Kuohai wrote:
> On 9/27/2022 4:29 AM, Daniel Borkmann wrote:
>> [ +Mark/Florent ]
>>
>> On 9/19/22 11:21 AM, Xu Kuohai wrote:
>>> From: Xu Kuohai <xukuohai at huawei.com>
>>>
>>> Currently BPF_CALL is always jited to an indirect call, but when the target
>>> is within direct call range, BPF_CALL can be jited to a direct call.
>>>
>>> For example, the following BPF_CALL
>>>
>>>      call __htab_map_lookup_elem
>>>
>>> is always jited to an indirect call:
>>>
>>>      mov     x10, #0xffffffffffff18f4
>>>      movk    x10, #0x821, lsl #16
>>>      movk    x10, #0x8000, lsl #32
>>>      blr     x10
>>>
>>> When the target is in the range of direct call, it can be jited to:
>>>
>>>      bl      0xfffffffffd33bc98
>>>
>>> This patch does such jit when possible.
>>>
>>> 1. First pass, get the maximum jited image size. Since the jited image
>>>     memory is not allocated yet, the distance between a jited BPF_CALL
>>>     instruction and its call target is unknown, so jit all BPF_CALLs to
>>>     indirect calls to get the maximum image size.
>>>
>>> 2. Allocate image memory with the size calculated in step 1.
>>>
>>> 3. Second pass, determine the jited address and size for every bpf instruction.
>>>     The image memory is now allocated, and every bpf instruction other than
>>>     BPF_CALL has only one jit method, so the jited address of the first
>>>     BPF_CALL is known. That fixes its distance to the call target, which
>>>     decides whether it is jited to a direct or an indirect call, which in turn
>>>     fixes the jited image size after the first BPF_CALL. By induction, the
>>>     jited addresses and sizes of all subsequent BPF instructions are
>>>     determined.
>>>
>>> 4. Last pass, generate the final image. The jump offsets of jump instructions
>>>     whose targets are within the jited image are determined in this pass,
>>>     since the target instruction addresses may have changed in step 3.
>>
>> Wouldn't this require a convergence process similar to the one in the x86-64 JIT?
>> You state that the jump instructions are placed in step 4 because step 3 could
>> have changed their offsets, but after step 4, couldn't the offsets of the target
>> addresses from step 3 change again in some corner cases (given emit_a64_mov_i()
>> is also used in jump encoding)?
>>
> 
> IIUC, the reason why there is a convergence process on x86 is that x86's jmp
> instruction length varies with the size of immediate part, so after immediate
> part is adjusted, the instruction length may change accordingly, and consequently
> cause the positions of subsequent instructions to change, which in turn causes
> the distance between instructions to change. However, arm64's instruction size
> is fixed to 4 bytes and does not change with immediate part changes. So adjusting
> the immediate part of arm64 jump instruction does not result in a change in
> instruction length or position.
> 
> For BPF_CALL, arguments passed to emit_call() and emit_a64_mov_i() (if called)
> do not change in pass 3 and 4, so the jited result does not change. This is also
> true for other non-BPF_JMP instructions.
> 
> So no convergence is required on arm64.
> 

Hi Daniel,

I think I should make this clearer.

Please take a look at the following code snippet, which jits BPF_JMP instructions
to arm64 instructions.

The code can be divided into two parts: the part where instruction offset jmp_offset
is used and the part where jmp_offset is not used.

1. Lines 963-966 and lines 990-1028 use jmp_offset. Whatever the value of
    jmp_offset, the jited result is emitted at either line 965 or line 1027 as
    exactly one arm64 instruction, so the jited size is always 4 bytes.

2. The other lines don't use jmp_offset. Their input arguments, including those
    passed to emit_a64_mov_i and emit_call, do not change between pass 3 and
    pass 4, so the jited result does not change either.

  961         /* JUMP off */
  962         case BPF_JMP | BPF_JA:
  963                 jmp_offset = bpf2a64_offset(i, off, ctx);
  964                 check_imm26(jmp_offset);
  965                 emit(A64_B(jmp_offset), ctx);
  966                 break;
  967         /* IF (dst COND src) JUMP off */
  968         case BPF_JMP | BPF_JEQ | BPF_X:
  969         case BPF_JMP | BPF_JGT | BPF_X:
  970         case BPF_JMP | BPF_JLT | BPF_X:
  971         case BPF_JMP | BPF_JGE | BPF_X:
  972         case BPF_JMP | BPF_JLE | BPF_X:
  973         case BPF_JMP | BPF_JNE | BPF_X:
  974         case BPF_JMP | BPF_JSGT | BPF_X:
  975         case BPF_JMP | BPF_JSLT | BPF_X:
  976         case BPF_JMP | BPF_JSGE | BPF_X:
  977         case BPF_JMP | BPF_JSLE | BPF_X:
  978         case BPF_JMP32 | BPF_JEQ | BPF_X:
  979         case BPF_JMP32 | BPF_JGT | BPF_X:
  980         case BPF_JMP32 | BPF_JLT | BPF_X:
  981         case BPF_JMP32 | BPF_JGE | BPF_X:
  982         case BPF_JMP32 | BPF_JLE | BPF_X:
  983         case BPF_JMP32 | BPF_JNE | BPF_X:
  984         case BPF_JMP32 | BPF_JSGT | BPF_X:
  985         case BPF_JMP32 | BPF_JSLT | BPF_X:
  986         case BPF_JMP32 | BPF_JSGE | BPF_X:
  987         case BPF_JMP32 | BPF_JSLE | BPF_X:
  988                 emit(A64_CMP(is64, dst, src), ctx);
  989 emit_cond_jmp:
  990                 jmp_offset = bpf2a64_offset(i, off, ctx);
  991                 check_imm19(jmp_offset);
  992                 switch (BPF_OP(code)) {
  993                 case BPF_JEQ:
  994                         jmp_cond = A64_COND_EQ;
  995                         break;
  996                 case BPF_JGT:
  997                         jmp_cond = A64_COND_HI;
  998                         break;
  999                 case BPF_JLT:
1000                         jmp_cond = A64_COND_CC;
1001                         break;
1002                 case BPF_JGE:
1003                         jmp_cond = A64_COND_CS;
1004                         break;
1005                 case BPF_JLE:
1006                         jmp_cond = A64_COND_LS;
1007                         break;
1008                 case BPF_JSET:
1009                 case BPF_JNE:
1010                         jmp_cond = A64_COND_NE;
1011                         break;
1012                 case BPF_JSGT:
1013                         jmp_cond = A64_COND_GT;
1014                         break;
1015                 case BPF_JSLT:
1016                         jmp_cond = A64_COND_LT;
1017                         break;
1018                 case BPF_JSGE:
1019                         jmp_cond = A64_COND_GE;
1020                         break;
1021                 case BPF_JSLE:
1022                         jmp_cond = A64_COND_LE;
1023                         break;
1024                 default:
1025                         return -EFAULT;
1026                 }
1027                 emit(A64_B_(jmp_cond, jmp_offset), ctx);
1028                 break;
1029         case BPF_JMP | BPF_JSET | BPF_X:
1030         case BPF_JMP32 | BPF_JSET | BPF_X:
1031                 emit(A64_TST(is64, dst, src), ctx);
1032                 goto emit_cond_jmp;
1033         /* IF (dst COND imm) JUMP off */
1034         case BPF_JMP | BPF_JEQ | BPF_K:
1035         case BPF_JMP | BPF_JGT | BPF_K:
1036         case BPF_JMP | BPF_JLT | BPF_K:
1037         case BPF_JMP | BPF_JGE | BPF_K:
1038         case BPF_JMP | BPF_JLE | BPF_K:
1039         case BPF_JMP | BPF_JNE | BPF_K:
1040         case BPF_JMP | BPF_JSGT | BPF_K:
1041         case BPF_JMP | BPF_JSLT | BPF_K:
1042         case BPF_JMP | BPF_JSGE | BPF_K:
1043         case BPF_JMP | BPF_JSLE | BPF_K:
1044         case BPF_JMP32 | BPF_JEQ | BPF_K:
1045         case BPF_JMP32 | BPF_JGT | BPF_K:
1046         case BPF_JMP32 | BPF_JLT | BPF_K:
1047         case BPF_JMP32 | BPF_JGE | BPF_K:
1048         case BPF_JMP32 | BPF_JLE | BPF_K:
1049         case BPF_JMP32 | BPF_JNE | BPF_K:
1050         case BPF_JMP32 | BPF_JSGT | BPF_K:
1051         case BPF_JMP32 | BPF_JSLT | BPF_K:
1052         case BPF_JMP32 | BPF_JSGE | BPF_K:
1053         case BPF_JMP32 | BPF_JSLE | BPF_K:
1054                 if (is_addsub_imm(imm)) {
1055                         emit(A64_CMP_I(is64, dst, imm), ctx);
1056                 } else if (is_addsub_imm(-imm)) {
1057                         emit(A64_CMN_I(is64, dst, -imm), ctx);
1058                 } else {
1059                         emit_a64_mov_i(is64, tmp, imm, ctx);
1060                         emit(A64_CMP(is64, dst, tmp), ctx);
1061                 }
1062                 goto emit_cond_jmp;
1063         case BPF_JMP | BPF_JSET | BPF_K:
1064         case BPF_JMP32 | BPF_JSET | BPF_K:
1065                 a64_insn = A64_TST_I(is64, dst, imm);
1066                 if (a64_insn != AARCH64_BREAK_FAULT) {
1067                         emit(a64_insn, ctx);
1068                 } else {
1069                         emit_a64_mov_i(is64, tmp, imm, ctx);
1070                         emit(A64_TST(is64, dst, tmp), ctx);
1071                 }
1072                 goto emit_cond_jmp;
1073         /* function call */
1074         case BPF_JMP | BPF_CALL:
1075         {
1076                 const u8 r0 = bpf2a64[BPF_REG_0];
1077                 bool func_addr_fixed;
1078                 u64 func_addr;
1079
1080                 ret = bpf_jit_get_func_addr(ctx->prog, insn, extra_pass,
1081                                             &func_addr, &func_addr_fixed);
1082                 if (ret < 0)
1083                         return ret;
1084                 emit_call(func_addr, ctx);
1085                 emit(A64_MOV(1, r0, A64_R(0)), ctx);
1086                 break;
1087         }
1088         /* tail call */
1089         case BPF_JMP | BPF_TAIL_CALL:
1090                 if (emit_bpf_tail_call(ctx))
1091                         return -EFAULT;
1092                 break;
1093         /* function return */
1094         case BPF_JMP | BPF_EXIT:
1095                 /* Optimization: when last instruction is EXIT,
1096                    simply fallthrough to epilogue. */
1097                 if (i == ctx->prog->len - 1)
1098                         break;
1099                 jmp_offset = epilogue_offset(ctx);
1100                 check_imm26(jmp_offset);
1101                 emit(A64_B(jmp_offset), ctx);
1102                 break;
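
To illustrate point 1 outside the kernel, here is a minimal standalone sketch
of how the two branch forms are encoded. The helper names a64_b() and
a64_b_cond() are made up for this mail and are not the kernel's A64_B() or
A64_B_() macros; only the bit layouts (imm26 for B, imm19 plus a 4-bit
condition for B.cond) come from the A64 instruction set. The offset counts
4-byte instructions, and whatever its value, the result is one 32-bit word.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: encode "B <off>", where off counts 4-byte insns. */
uint32_t a64_b(int32_t off)
{
	/* The offset must fit in the signed 26-bit immediate field. */
	assert(off >= -(1 << 25) && off < (1 << 25));
	return 0x14000000u | ((uint32_t)off & 0x03ffffffu);
}

/* Hypothetical helper: encode "B.<cond> <off>", cond is the 4-bit code. */
uint32_t a64_b_cond(uint32_t cond, int32_t off)
{
	/* The offset must fit in the signed 19-bit immediate field. */
	assert(off >= -(1 << 18) && off < (1 << 18));
	return 0x54000000u | (((uint32_t)off & 0x7ffffu) << 5) | (cond & 0xfu);
}
```

Changing the offset only changes bits inside the word, so the positions of
the following instructions are not affected.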

In fact, what happens in step 3 and step 4 is almost the same as what happened in
pass 1 and pass 2 before this series, where there is no convergence either.
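
As a rough illustration only, the toy model below mimics the layout decision
made in step 3. It is not the patch code: struct insn, layout() and
in_bl_range() are names invented here, every non-call instruction is assumed
to jit to a single arm64 instruction, and an indirect call is assumed to take
four instructions (three moves plus blr, as in the example in the commit
message). The only architectural fact used is that BL reaches +/-128 MiB from
its own address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model: an insn is either a call or "anything else". */
enum kind { K_OTHER, K_CALL };

struct insn {
	enum kind kind;
	uint64_t target;	/* call target address, used for K_CALL only */
};

/* BL can reach +/-128 MiB from the address of the BL instruction itself. */
int in_bl_range(uint64_t pc, uint64_t target)
{
	int64_t d = (int64_t)(target - pc);

	return d >= -(INT64_C(1) << 27) && d < (INT64_C(1) << 27);
}

/*
 * One layout pass: store the byte offset of each insn in offs[] and
 * return the image size. The result depends only on p[], n and base.
 */
size_t layout(const struct insn *p, size_t n, uint64_t base, size_t *offs)
{
	size_t off = 0;

	for (size_t i = 0; i < n; i++) {
		offs[i] = off;
		if (p[i].kind == K_CALL && !in_bl_range(base + off, p[i].target))
			off += 4 * 4;	/* assumed: 3 moves + blr */
		else
			off += 4;	/* assumed: a single instruction */
	}
	return off;
}
```

In this model, running layout() a second time with the same program and the
same base address yields the same offsets, because no input depends on the
jump offsets that are filled in later. That is the sense in which nothing is
left to converge.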

>>> Tested with test_bpf.ko and some arm64 working selftests, nothing failed.
> 
> [...]
