[PATCH] arm64: insn: Route BTI to simulate_nop to avoid XOL/SS at function entry

Mark Rutland mark.rutland at arm.com
Tue Nov 11 02:26:44 PST 2025


On Thu, Nov 06, 2025 at 04:19:55PM +0530, Khaja Hussain Shaik Khaji wrote:
> On arm64 with branch protection, functions typically begin with a BTI
> (Branch Target Identification) landing pad. Today the decoder treats BTI
> as requiring out-of-line single-step (XOL), allocating a slot and placing
> an SS-BRK. Under SMP this leaves a small window before DAIF is masked
> where an asynchronous exception or nested probe can interleave and clear
> current_kprobe, resulting in an SS-BRK panic.

If you can take an exception here, and current_kprobe gets cleared, then
XOL stepping is broken in general, not just for BTI.

> Handle BTI like NOP in the decoder and simulate it (advance PC by one
> instruction). This avoids XOL/SS-BRK at these sites and removes the
> single-step window, while preserving correctness for kprobes since BTI’s
> branch-target enforcement has no program-visible effect in this EL1
> exception context.

One of the reasons for doing this out-of-line is that we should be able
to mark the XOL slot as a guarded page, and get the correct BTI
behaviour. It looks like we don't currently do that, which is a bug.
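
As a rough sketch of the shape of that fix (untested, and assuming the
XOL page is still allocated by the arch code in alloc_insn_page() via
__vmalloc_node_range()), something like:

void *alloc_insn_page(void)
{
	pgprot_t prot = PAGE_KERNEL_ROX;

	/* Map the XOL slot as a guarded page so HW performs the BTI check */
	if (system_supports_bti_kernel())
		prot = __pgprot(pgprot_val(prot) | PTE_GP);

	return __vmalloc_node_range(PAGE_SIZE, 1, VMALLOC_START, VMALLOC_END,
				    GFP_KERNEL, prot, VM_FLUSH_RESET_PERMS,
				    NUMA_NO_NODE, __builtin_return_address(0));
}

... i.e. OR in PTE_GP when the kernel is running with BTI, so that a BTI
stepped in the slot is checked by HW just as it would be in place.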

Just skipping the BTI isn't right; that throws away the BTI target
check.

> In practice BTI is most commonly observed at function entry, so the main
> effect of this change is to eliminate entry-site single-stepping. Other
> instructions and non-entry sites are unaffected.
> 
> Signed-off-by: Khaja Hussain Shaik Khaji <khaja.khaji at oss.qualcomm.com>
> ---
>  arch/arm64/include/asm/insn.h            | 5 -----
>  arch/arm64/kernel/probes/decode-insn.c   | 9 ++++++---
>  arch/arm64/kernel/probes/simulate-insn.c | 1 +
>  3 files changed, 7 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/insn.h b/arch/arm64/include/asm/insn.h
> index 18c7811774d3..7e80cc1f0c3d 100644
> --- a/arch/arm64/include/asm/insn.h
> +++ b/arch/arm64/include/asm/insn.h
> @@ -452,11 +452,6 @@ static __always_inline bool aarch64_insn_is_steppable_hint(u32 insn)
>  	case AARCH64_INSN_HINT_PACIASP:
>  	case AARCH64_INSN_HINT_PACIBZ:
>  	case AARCH64_INSN_HINT_PACIBSP:
> -	case AARCH64_INSN_HINT_BTI:
> -	case AARCH64_INSN_HINT_BTIC:
> -	case AARCH64_INSN_HINT_BTIJ:
> -	case AARCH64_INSN_HINT_BTIJC:
> -	case AARCH64_INSN_HINT_NOP:
>  		return true;
>  	default:
>  		return false;
> diff --git a/arch/arm64/kernel/probes/decode-insn.c b/arch/arm64/kernel/probes/decode-insn.c
> index 6438bf62e753..7ce2cf5e21d3 100644
> --- a/arch/arm64/kernel/probes/decode-insn.c
> +++ b/arch/arm64/kernel/probes/decode-insn.c
> @@ -79,10 +79,13 @@ enum probe_insn __kprobes
>  arm_probe_decode_insn(u32 insn, struct arch_probe_insn *api)
>  {
>  	/*
> -	 * While 'nop' instruction can execute in the out-of-line slot,
> -	 * simulating them in breakpoint handling offers better performance.
> +	 * NOP and BTI (Branch Target Identification) have no program-visible side
> +	 * effects for kprobes purposes. Simulate them to avoid XOL/SS-BRK and the
> +	 * small single-step window. BTI's branch-target enforcement semantics are
> +	 * irrelevant in this EL1 kprobe context, so advancing PC by one insn is
> +	 * sufficient here.
>  	 */
> -	if (aarch64_insn_is_nop(insn)) {
> +	if (aarch64_insn_is_nop(insn) || aarch64_insn_is_bti(insn)) {
>  		api->handler = simulate_nop;
>  		return INSN_GOOD_NO_SLOT;
>  	}

I'm not necessarily opposed to emulating the BTI, but:

(a) The BTI should not be emulated as a NOP. I am not keen on simulating
    the BTI exception in software (see the sketch after these points for
    what that would entail), and would strongly prefer that's handled by
    HW (e.g. in the XOL slot).

(b) As above, it sounds like this is bodging around a more general
    problem. We must solve that more general problem.
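
To give an idea of what (a) would mean in practice: a faithful software
simulation has to check PSTATE.BTYPE against the BTI variant and
reproduce the Branch Target exception on a mismatch, not just advance
the PC. Roughly (untested; the compatibility mapping below is from my
reading of the Arm ARM, so check it against the spec before relying on
it):

static bool __kprobes bti_landing_pad_ok(u32 insn, struct pt_regs *regs)
{
	u64 btype = regs->pstate & PSR_BTYPE_MASK;

	/* Not reached via an indirect branch: nothing to check */
	if (btype == PSR_BTYPE_NONE)
		return true;

	switch (insn & 0xFE0) {
	case AARCH64_INSN_HINT_BTIJC:
		return true;
	case AARCH64_INSN_HINT_BTIC:
		return btype == PSR_BTYPE_C || btype == PSR_BTYPE_JC;
	case AARCH64_INSN_HINT_BTIJ:
		return btype == PSR_BTYPE_J || btype == PSR_BTYPE_JC;
	default:
		/* Plain BTI is not a valid landing pad for indirect branches */
		return false;
	}
}

... and the mismatch case then needs the Branch Target exception
semantics reproduced in software, which is exactly what I'd rather
leave to HW.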

> diff --git a/arch/arm64/kernel/probes/simulate-insn.c b/arch/arm64/kernel/probes/simulate-insn.c
> index 4c6d2d712fbd..b83312cb70ba 100644
> --- a/arch/arm64/kernel/probes/simulate-insn.c
> +++ b/arch/arm64/kernel/probes/simulate-insn.c
> @@ -200,5 +200,6 @@ simulate_ldrsw_literal(u32 opcode, long addr, struct pt_regs *regs)
>  void __kprobes
>  simulate_nop(u32 opcode, long addr, struct pt_regs *regs)
>  {
> +	/* Also used as BTI simulator: both just advance PC by one insn. */
>  	arm64_skip_faulting_instruction(regs, AARCH64_INSN_SIZE);
>  }

This comment should go.

Mark.


