[PATCH] ARM: mm: Disregard user space addresses in BUG() address check

Fri Jan 19 04:24:25 PST 2024

On Fri, 19 Jan 2024 at 13:14, Russell King (Oracle)
<linux at armlinux.org.uk> wrote:
>
> On Fri, Jan 19, 2024 at 12:52:21PM +0100, Ard Biesheuvel wrote:
> > On Thu, 18 Jan 2024 at 21:15, Kees Cook <keescook at chromium.org> wrote:
> > >
> > > On Wed, Jan 17, 2024 at 04:07:34PM +0100, Ard Biesheuvel wrote:
> > > > From: Ard Biesheuvel <ardb at kernel.org>
> > > >
> > > > is_valid_bugaddr() dereferences the faulting PC to fetch the instruction
> > > > that triggered the fault, to decide whether it is a BRK instruction used
> > > > to force an exception. This is used by the BUG() infrastructure to keep
> > > > the handling logic (which should never execute) separate from the code
> > > > that normally runs.
> > > >
> > > > This dereference may attempt to access user memory if the faulting PC
> > > > happens to contain a user address. One way this might happen is when
> > > > the kernel is tricked into executing from user space while PAN
> > > > protections (Privileged Access Never) are in effect: the instruction
> > > > fetch will trigger a prefetch abort, the handling of which involves a
> > > > check whether the instruction that caused it is a BRK, requiring a
> > > > load from the same address. This load is privileged too, and so it will
> > > > trigger another exception, which we fail to recover from.
> > > >
> > > > Given that BRK instructions tied to BUG() handling can only appear in
> > > > kernel code, let's check first that the PC actually points into kernel
> > > > memory.
> > > >
> > > > Cc: Kees Cook <keescook at chromium.org>
> > > > Cc: Russell King <rmk+kernel at armlinux.org.uk>
> > > > Cc: Mark Brown <broonie at kernel.org>
> > > > Cc: Zhen Lei <thunder.leizhen at huawei.com>
> > > > Cc: Linus Walleij <linus.walleij at linaro.org>
> > > > Link: https://lkml.kernel.org/r/202401111544.18EBB6AA%40keescook
> > > > Signed-off-by: Ard Biesheuvel <ardb at kernel.org>
> > > > ---
> > > >  arch/arm/kernel/traps.c | 3 +++
> > > >  1 file changed, 3 insertions(+)
> > > >
> > > > diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c
> > > > index 3bad79db5d6e..f342bd6b2a5d 100644
> > > > --- a/arch/arm/kernel/traps.c
> > > > +++ b/arch/arm/kernel/traps.c
> > > > @@ -402,6 +402,9 @@ int is_valid_bugaddr(unsigned long pc)
> > > >       u32 insn = __opcode_to_mem_arm(BUG_INSTR_VALUE);
> > > >  #endif
> > > >
> > > > +     if (pc < TASK_SIZE)
> > > > +             return 0;
> > > > +
> > > >       if (get_kernel_nofault(bkpt, (void *)pc))
> > > >               return 0;
> > > >
> > >
> > > Okay, I've finally had a chance to test this and it doesn't fix it
> > > for me. I still see the double domain fault, and I still lose a CPU.
> > > I patched in a pr_info() to show that the "pc < TASK_SIZE" was hitting
> > > (which it is) so I see:
> > >
> > >
> > > kdtm: Performing direct entry EXEC_USERSPACE
> > > lkdtm: attempting ok execution at 80761b4c
> > > lkdtm: attempting bad execution at 76ff4000
> > > Unhandled prefetch abort: page domain fault (0x01b) at 0x76ff4000
> > > Skipping is_valid_bugaddr(76ff4000)
> > > Internal error: : 1b [#12] SMP ARM
> > > Modules linked in:
> > > ...
> > > Process cat (pid: 1301, stack limit = 0x3c8ec418)
> > > Stack: (0xf0975e58 to 0xf0976000)
> > > ...
> > > Backtrace:
> > >  lkdtm_EXEC_USERSPACE from lkdtm_do_action+0x2c/0x4c
> > > ...
> > > Exception stack(0xf0975fa8 to 0xf0975ff0)
> > > ...
> > > 8<--- cut here ---
> > > Unhandled fault: page domain fault (0x01b) at 0x76ff3ff0
> > > [76ff3ff0] *pgd=44baa835, *pte=00000000, *ppte=00000000
> > > Internal error: : 1b [#13] SMP ARM
> > > Modules linked in:
> > > ...
> > > Process cat (pid: 1301, stack limit = 0x3c8ec418)
> > > Stack: (0xf0975ce0 to 0xf0976000)
> > > ...
> > > Backtrace:
> > >  copy_from_kernel_nofault from dump_instr+0x1cc/0x208
> > >  r7:f0975d27 r6:80ddf3d8 r5:f0975e08 r4:fffffffc
> > >  dump_instr from die+0x294/0x2f0
> >
> > This is indeed a similar but different issue (as you conclude below).
> > dump_instr() attempts to dereference PC to generate the 'Code: xxxx
> > xxxx' line with the opcodes leading up to the fault, and this triggers
> > a PAN fault for the same reason.
>
> Both of these have faulted in copy_from_kernel_nofault(), so rather than
> adding these tests all over the place, wouldn't it be better to
> implement copy_from_kernel_nofault_allowed() so that we verify that the
> source is actually something we can copy from?
>
> E.g.
>
> bool copy_from_kernel_nofault_allowed(const void *src, size_t size)
> {
>         unsigned long srcv = (unsigned long)src;
>
>         return src >= TASK_SIZE && src + size - 1 >= TASK_SIZE;
> }
>

Indeed. I had no idea this existed, but obviously,
copy_from_kernel_nofault() should never access user memory to begin
with.