[PATCH] ARM: mm: Disregard user space addresses in BUG() address check

Thu Jan 18 12:15:15 PST 2024

On Wed, Jan 17, 2024 at 04:07:34PM +0100, Ard Biesheuvel wrote:
> From: Ard Biesheuvel <ardb at kernel.org>
>
> is_valid_bugaddr() dereferences the faulting PC to fetch the instruction
> that triggered the fault, to decide whether it is a BRK instruction used
> to force an exception. This is used by the BUG() infrastructure to keep
> the handling logic (which should never execute) separate from the code
> that normally runs.
>
> This dereference may attempt to access user memory if the faulting PC
> happens to contain a user address. One way this might happen is when
> the kernel is tricked into executing from user space while PAN
> protections (Privileged Access Never) are in effect: the instruction
> fetch will trigger a prefetch abort, the handling of which involves a
> check whether the instruction that caused it is a BRK, requiring a
> load from the same address. This load is privileged too, and so it will
> trigger another exception, which we fail to recover from.
>
> Given that BRK instructions tied to BUG() handling can only appear in
> kernel code, let's check first that the PC actually points into kernel
> memory.
>
> Cc: Kees Cook <keescook at chromium.org>
> Cc: Russell King <rmk+kernel at armlinux.org.uk>
> Cc: Mark Brown <broonie at kernel.org>
> Cc: Zhen Lei <thunder.leizhen at huawei.com>
> Cc: Linus Walleij <linus.walleij at linaro.org>
> Link: https://lkml.kernel.org/r/202401111544.18EBB6AA%40keescook
> Signed-off-by: Ard Biesheuvel <ardb at kernel.org>
> ---
>  arch/arm/kernel/traps.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c
> index 3bad79db5d6e..f342bd6b2a5d 100644
> --- a/arch/arm/kernel/traps.c
> +++ b/arch/arm/kernel/traps.c
> @@ -402,6 +402,9 @@ int is_valid_bugaddr(unsigned long pc)
>  	u32 insn = __opcode_to_mem_arm(BUG_INSTR_VALUE);
>  #endif
>
> +	if (pc < TASK_SIZE)
> +		return 0;
> +
>  	if (get_kernel_nofault(bkpt, (void *)pc))
>  		return 0;
>

Okay, I've finally had a chance to test this and it doesn't fix it
for me. I still see the double domain fault, and I still lose a CPU.
I patched in a pr_info() to show that the "pc < TASK_SIZE" was hitting
(which it is) so I see:


kdtm: Performing direct entry EXEC_USERSPACE
lkdtm: attempting ok execution at 80761b4c
lkdtm: attempting bad execution at 76ff4000
Unhandled prefetch abort: page domain fault (0x01b) at 0x76ff4000
Skipping is_valid_bugaddr(76ff4000)
Internal error: : 1b [#12] SMP ARM
Modules linked in:
...
Process cat (pid: 1301, stack limit = 0x3c8ec418)
Stack: (0xf0975e58 to 0xf0976000)
...
Backtrace:
 lkdtm_EXEC_USERSPACE from lkdtm_do_action+0x2c/0x4c
...
Exception stack(0xf0975fa8 to 0xf0975ff0)
...
8<--- cut here ---
Unhandled fault: page domain fault (0x01b) at 0x76ff3ff0
[76ff3ff0] *pgd=44baa835, *pte=00000000, *ppte=00000000
Internal error: : 1b [#13] SMP ARM
Modules linked in:
...
Process cat (pid: 1301, stack limit = 0x3c8ec418)
Stack: (0xf0975ce0 to 0xf0976000)
...
Backtrace:
 copy_from_kernel_nofault from dump_instr+0x1cc/0x208
 r7:f0975d27 r6:80ddf3d8 r5:f0975e08 r4:fffffffc
 dump_instr from die+0x294/0x2f0
 r10:8110a3dc r9:8100eb60 r8:60070193 r7:82760000 r6:0000001b r5:80df6c8c
 r4:f0975e08
 die from arm_notify_die+0x54/0x58
 r10:82760000 r9:82760000 r8:810ab75c r7:81010058 r6:f0975e08 r5:76ff4000
 r4:0000001b
 arm_notify_die from do_PrefetchAbort+0x90/0x98
 do_PrefetchAbort from __pabt_svc+0x5c/0xa0
Exception stack(0xf0975e08 to 0xf0975e50)
...
Exception stack(0xf0975fa8 to 0xf0975ff0)
...
---[ end trace 0000000000000000 ]---
note: cat[1301] exited with irqs disabled

But then that CPU is gone.

For example, on a 2 CPU arm32 instance, if I run EXEC_USERSPACE twice
the whole system dies. If I run it once, it will eventually die. Along
the way I'll see:

rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
rcu:    1-...0: (1 ticks this GP) idle=199c/1/0x40000000 softirq=3056/3056 fqs=4702
rcu:    (detected by 0, t=10407 jiffies, g=4113, q=51 ncpus=2)
Sending NMI from CPU 0 to CPUs 1:

But I never get the system back.

Compared to ACCESS_USERSPACE, I see:

lkdtm: Performing direct entry ACCESS_USERSPACE
lkdtm: attempting bad read at 76f2b000
8<--- cut here ---
Unhandled fault: page domain fault (0x01b) at 0x76f2b000
[76f2b000] *pgd=42591835, *pte=4691055f, *ppte=46910c7e
Internal error: : 1b [#1] SMP ARM
Modules linked in:
...
Process cat (pid: 1286, stack limit = 0x9ff640f6)
Stack: (0xf0965e50 to 0xf0966000)
...
Backtrace:
 lkdtm_ACCESS_USERSPACE from lkdtm_do_action+0x2c/0x4c
...
Exception stack(0xf0965fa8 to 0xf0965ff0)
...
---[ end trace 0000000000000000 ]---

So it looks like everything between the "lkdtm: " lines and the
"8<--- cut here ---" line is unexpected in the EXEC_USERSPACE case?

I tried combinations of:

CONFIG_UNWINDER_FRAME_POINTER
CONFIG_UNWINDER_ARM
CONFIG_DEBUG_USER

but nothing changed the behavior. Further pr_info() debugging shows me:


lkdtm: attempting bad execution at 76fc2000
Unhandled prefetch abort: page domain fault (0x01b) at 0x76fc2000
user_mode(regs) is false
Skipping is_valid_bugaddr(76fc2000)
Internal error: : 1b [#1] SMP ARM
...
Exception stack(0xf0a99fa8 to 0xf0a99ff0)
9fa0:                   00000074 7ff00000 00000001 76a08000 0000000f 00000000
9fc0: 00000074 7ff00000 00000000 00000004 00000001 00000000 00020000 00000000
9fe0: 00000004 7ea81a88 76f315b3 76eba746
8<--- cut here (do_DataAbort) ---
Unhandled fault: page domain fault (0x01b) at 0x76fc1ff0
[76fc1ff0] *pgd=45da2835, *pte=00000000, *ppte=00000000
user_mode(regs) is false
Already called die!

If I skip the second die, I end up in a loop:

8<--- cut here (do_DataAbort) ---
Unhandled fault: page domain fault (0x01b) at 0x76fcdff0
[76fcdff0] *pgd=45dd6835, *pte=00000000, *ppte=00000000
user_mode(regs) is false
Already called die -- skipping this one.
8<--- cut here (do_DataAbort) ---
...


But if I comment out dump_instr(), I don't get the double fault, and the
CPU survives. So, borrowing from Ard's patch, this fixes it for me:

diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c
index f342bd6b2a5d..3874afccb574 100644
--- a/arch/arm/kernel/traps.c
+++ b/arch/arm/kernel/traps.c
@@ -300,7 +300,8 @@ static int __die(const char *str, int err, struct pt_regs *regs)
 			 ALIGN(regs->ARM_sp - THREAD_SIZE, THREAD_ALIGN)
 			 + THREAD_SIZE);
 		dump_backtrace(regs, tsk, KERN_EMERG);
-		dump_instr(KERN_EMERG, regs);
+		if (instruction_pointer(regs) >= TASK_SIZE)
+			dump_instr(KERN_EMERG, regs);
 	}
 
 	return 0;


Is this the right approach?

Also, sticking with arm32 tradition, it seems a "cut here" is missing
for the prefetch case:

diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
index e96fb40b9cc3..df44f83c9c41 100644
--- a/arch/arm/mm/fault.c
+++ b/arch/arm/mm/fault.c
@@ -586,6 +586,7 @@ do_PrefetchAbort(unsigned long addr, unsigned int ifsr, struct pt_regs *regs)
 	if (!inf->fn(addr, ifsr | FSR_LNX_PF, regs))
 		return;
 
+	pr_alert("8<--- cut here ---\n");
 	pr_alert("Unhandled prefetch abort: %s (0x%03x) at 0x%08lx\n",
 		inf->name, ifsr, addr);
 

--
Kees Cook