[bpf-next v6 4/5] bpf, x86: Emit ENDBR for indirect jump targets
Xu Kuohai
xukuohai at huaweicloud.com
Fri Mar 6 19:56:06 PST 2026
On 3/7/2026 11:31 AM, Alexei Starovoitov wrote:
> On Fri, Mar 6, 2026 at 7:15 PM Xu Kuohai <xukuohai at huaweicloud.com> wrote:
>>
>> On 3/7/2026 9:36 AM, Eduard Zingerman wrote:
>>> On Fri, 2026-03-06 at 18:23 +0800, Xu Kuohai wrote:
>>>> From: Xu Kuohai <xukuohai at huawei.com>
>>>>
>>>> On CPUs that support CET/IBT, the indirect jump selftest triggers
>>>> a kernel panic because the indirect jump targets lack ENDBR
>>>> instructions.
>>>>
>>>> To fix it, emit an ENDBR instruction to each indirect jump target. Since
>>>> the ENDBR instruction shifts the position of original jited instructions,
>>>> fix the instruction address calculation wherever the addresses are used.
>>>>
>>>> For reference, below is a sample panic log.
>>>>
>>>> Missing ENDBR: bpf_prog_2e5f1c71c13ac3e0_big_jump_table+0x97/0xe1
>>>> ------------[ cut here ]------------
>>>> kernel BUG at arch/x86/kernel/cet.c:133!
>>>> Oops: invalid opcode: 0000 [#1] SMP NOPTI
>>>>
>>>> ...
>>>>
>>>> ? 0xffffffffc00fb258
>>>> ? bpf_prog_2e5f1c71c13ac3e0_big_jump_table+0x97/0xe1
>>>> bpf_prog_test_run_syscall+0x110/0x2f0
>>>> ? fdget+0xba/0xe0
>>>> __sys_bpf+0xe4b/0x2590
>>>> ? __kmalloc_node_track_caller_noprof+0x1c7/0x680
>>>> ? bpf_prog_test_run_syscall+0x215/0x2f0
>>>> __x64_sys_bpf+0x21/0x30
>>>> do_syscall_64+0x85/0x620
>>>> ? bpf_prog_test_run_syscall+0x1e2/0x2f0
>>>>
>>>> Fixes: 493d9e0d6083 ("bpf, x86: add support for indirect jumps")
>>>> Signed-off-by: Xu Kuohai <xukuohai at huawei.com>
>>>> ---
>>>> arch/x86/net/bpf_jit_comp.c | 23 +++++++++++++++--------
>>>> 1 file changed, 15 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
>>>> index 2c57ee446fc9..752331a64fc0 100644
>>>> --- a/arch/x86/net/bpf_jit_comp.c
>>>> +++ b/arch/x86/net/bpf_jit_comp.c
>>>> @@ -1658,8 +1658,8 @@ static int emit_spectre_bhb_barrier(u8 **pprog, u8 *ip,
>>>> return 0;
>>>> }
>>>>
>>>> -static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image,
>>>> - int oldproglen, struct jit_context *ctx, bool jmp_padding)
>>>> +static int do_jit(struct bpf_verifier_env *env, struct bpf_prog *bpf_prog, int *addrs, u8 *image,
>>>> + u8 *rw_image, int oldproglen, struct jit_context *ctx, bool jmp_padding)
>>>> {
>>>> bool tail_call_reachable = bpf_prog->aux->tail_call_reachable;
>>>> struct bpf_insn *insn = bpf_prog->insnsi;
>>>> @@ -1743,6 +1743,11 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image
>>>> dst_reg = X86_REG_R9;
>>>> }
>>>>
>>>> +#ifdef CONFIG_X86_KERNEL_IBT
>>>> + if (bpf_insn_is_indirect_target(env, bpf_prog, i - 1))
>>>> + EMIT_ENDBR();
>>>> +#endif
>>>> +
>>>> switch (insn->code) {
>>>> /* ALU */
>>>> case BPF_ALU | BPF_ADD | BPF_X:
>>>> @@ -2449,7 +2454,7 @@ st: if (is_imm8(insn->off))
>>>>
>>>> /* call */
>>>> case BPF_JMP | BPF_CALL: {
>>>> - u8 *ip = image + addrs[i - 1];
>>>> + u8 *ip = image + addrs[i - 1] + (prog - temp);
>>>
>>> Sorry, meant to reply to v5 but got distracted.
>>> It seems tedious/error prone to have this addend at each location,
>>> would it be possible to move the 'ip' variable calculation outside
>>> of the switch? It appears that at each point there would be no
>>> EMIT invocations between 'ip' computation and usage.
>>>
>>
>> Besides the changes shown in this patch, there is another line in the
>> file computing address using 'image + addrs[i - 1] + (prog - temp)'.
>>
>> It is at the call to emit_return() in the 'BPF_JMP | BPF_EXIT' case.
>> But there are indeed EMIT*() invocations before the address computation,
>> so a pre-computed address before the switch statement is stale in
>> this case.
>>
>> To fix this, how about introducing a macro for the address computation,
>> as the following diff (based on this patch) shows:
>>
>> --- a/arch/x86/net/bpf_jit_comp.c
>> +++ b/arch/x86/net/bpf_jit_comp.c
>> @@ -1606,6 +1606,8 @@ static void emit_priv_frame_ptr(u8 **pprog, void __percpu *priv_frame_ptr)
>>
>> #define INSN_SZ_DIFF (((addrs[i] - addrs[i - 1]) - (prog - temp)))
>>
>> +#define CURR_IP (image + addrs[i - 1] + (prog - temp))
>> +
>
> No. Don't obfuscate it with macro.
> I don't like INSN_SZ_DIFF either.
>
>
OK, will use Eduard's approach as is.
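
For anyone following the thread, below is a minimal user-space sketch of why
the '(prog - temp)' addend matters once ENDBR may be emitted ahead of the
switch. It is not kernel code: struct toy_jit, toy_emit(), toy_curr_ip() and
toy_ip_drift() are hypothetical names made up for illustration, and the image
base and offset are arbitrary values. It only models the arithmetic
'image + addrs[i - 1] + (prog - temp)' discussed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-instruction JIT state (not the kernel's). */
struct toy_jit {
	uint8_t temp[64];	/* scratch buffer for the current instruction */
	uint8_t *prog;		/* write cursor into temp */
};

/* Append len bytes to the scratch buffer, advancing the cursor. */
static void toy_emit(struct toy_jit *j, const uint8_t *bytes, size_t len)
{
	assert((size_t)(j->prog - j->temp) + len <= sizeof(j->temp));
	for (size_t k = 0; k < len; k++)
		*j->prog++ = bytes[k];
}

/* Models image + addrs[i - 1] + (prog - temp): address of the next byte. */
static uintptr_t toy_curr_ip(const struct toy_jit *j, uintptr_t image,
			     int insn_start)
{
	return image + (uintptr_t)insn_start + (uintptr_t)(j->prog - j->temp);
}

/*
 * Returns how many bytes the naive address (image + addrs[i - 1]) lags
 * behind the real emission point, with or without a leading ENDBR64.
 */
static long toy_ip_drift(int emit_endbr)
{
	static const uint8_t endbr64[4] = { 0xf3, 0x0f, 0x1e, 0xfa };
	struct toy_jit j;
	uintptr_t image = 0x1000;	/* arbitrary image base (assumption) */
	int insn_start = 0x97;		/* arbitrary addrs[i - 1] (assumption) */
	uintptr_t naive, actual;

	j.prog = j.temp;
	naive = image + (uintptr_t)insn_start;

	if (emit_endbr)
		toy_emit(&j, endbr64, sizeof(endbr64));

	actual = toy_curr_ip(&j, image, insn_start);
	return (long)(actual - naive);
}
```

In this model, toy_ip_drift(0) is 0 and toy_ip_drift(1) is 4, the size of
ENDBR64. Any further emission between computing the address and using it
widens the gap in the same way, which is the situation described above for the
'BPF_JMP | BPF_EXIT' case.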