[PATCH v3 4/4] ARM: Implement PAN for LPAE by TTBR0 page table walks disablement

Ard Biesheuvel ardb at kernel.org
Wed May 15 08:41:45 PDT 2024


On Wed, 15 May 2024 at 14:27, Russell King (Oracle)
<linux at armlinux.org.uk> wrote:
>
> On Wed, May 15, 2024 at 10:53:47AM +0200, Ard Biesheuvel wrote:
> > On Wed, 15 May 2024 at 10:48, Russell King (Oracle)
> > <linux at armlinux.org.uk> wrote:
> > >
> > > On Wed, May 15, 2024 at 10:36:48AM +0200, Ard Biesheuvel wrote:
> > > > On Tue, 14 May 2024 at 22:34, Florian Fainelli <f.fainelli at gmail.com> wrote:
> > > > >
> > > > > On 5/14/24 13:33, Linus Walleij wrote:
> > > > > > On Tue, May 14, 2024 at 8:26 PM Florian Fainelli <f.fainelli at gmail.com> wrote:
> > > > > >> On 5/14/24 10:03, Russell King (Oracle) wrote:
> > > > > >
> > > > > >>> I would imagine that the problem is cpu_set_ttbcr(). Please try adding
> > > > > >>> a "memory" clobber to the asm() instruction in there.
> > > > > >>>
> > > > > >>
> > > > > >> I can confirm that with CONFIG_CC_OPTIMIZE_FOR_SIZE=y and the hunk below:
> > > > > >>
> > > > > >> diff --git a/arch/arm/include/asm/proc-fns.h
> > > > > >> b/arch/arm/include/asm/proc-fns.h
> > > > > >> index 9b3105a2a5e0..1087bd2af433 100644
> > > > > >> --- a/arch/arm/include/asm/proc-fns.h
> > > > > >> +++ b/arch/arm/include/asm/proc-fns.h
> > > > > >> @@ -187,7 +187,7 @@ static inline unsigned int cpu_get_ttbcr(void)
> > > > > >>
> > > > > >>    static inline void cpu_set_ttbcr(unsigned int ttbcr)
> > > > > >>    {
> > > > > >> -       asm("mcr p15, 0, %0, c2, c0, 2" : : "r" (ttbcr));
> > > > > >> +       asm("mcr p15, 0, %0, c2, c0, 2" : : "r" (ttbcr) : "memory");
> > > > > >>    }
> > > > > >>
> > > > > >>    #else  /*!CONFIG_MMU */
> > > > > >>
> > > > > >> my Raspberry Pi 4B in AArch32 mode boots and runs user-space properly.
> > > > > >>
> > > > > >> Thanks a lot Russell!
> > > > > >
> > > > > > Second that, very nicely pinpointed Russell!
> > > > > >
> > > > > > Florian, do you want to send a patch or should I?
> > > > >
> > > > > I was wondering if Russell was able to fold this directly into patch #2
> > > > > where cpu_set_ttbr() is added, so as to not break functionality across
> > > > > bisection.
> > > >
> > > > Sadly, I can still reproduce this with the above fix.
> > > >
> > > > I included TTBCR in the DEBUG_USER output, and (as expected), it has
> > > > A1, EPD0 and T0SZ set to the 'uaccess disabled' values.
> > > >
> > > > Using __always_inline on uaccess_save_and_enable() and
> > > > uaccess_restore() (as the CONFIG_CPU_SW_DOMAIN_PAN does) seems to work
> > > > around it. The "memory" clobber seems unnecessary in my case, but it
> > > > is needed for correctness in any case.
> > > >
> > > > It is unclear to me why placing these helpers out of line should make
> > > > any difference, and I am not convinced it is a problem in the code
> > > > (IIRC we've had other issues with -Os in the past)
> > >
> > > Time to start comparing compilers / compiler versions?
> > >
> >
> > I guess.
> >
> > I'll try to dig a bit deeper this afternoon - AIUI, there is never a
> > valid reason to return to user space with uaccess disabled, right? So
> > we should be able to catch that case using a BUG() or similar, and
> > crash in the kernel rather than in user space when this occurs. That
> > should make it a bit easier to deal with in the debugger etc.
>
> That _should_ be handled on the kernel exit to userspace path. The
> kernel itself should be running with user access disabled except for
> the explicit accesses.
>

This definitely looks like some pathological compiler behavior:

The faulting instruction in question is in busybox

   15f450:       eef11a10        vmrs    r1, fpscr

which [as expected] triggers an UNDEF exception as FP is disabled
after a context switch.

The get_user() call in do_undefinstr() [arch/arm/kernel/traps.c:482]

    if (get_user(instr, (u32 __user *)pc))

gets compiled to the below, where the thing to note is that the
out-of-line version of uaccess_save_and_enable() returns the old ttbcr
value in R0. This value gets recorded in R3, but it also gets happily
passed on to __get_user_4(), which expects the user address in R0, not
the old value of TTBCR.

│    0xc040a5e4 <do_undefinstr+228>  bl      0xc0409fe4
<uaccess_save_and_enable>
│    0xc040a5e8 <do_undefinstr+232>  mov     r3, r0
│  > 0xc040a5ec <do_undefinstr+236>  bl      0xc10443a8 <__get_user_4>
│    0xc040a5f0 <do_undefinstr+240>  mcr     15, 0, r3, cr2, cr0, {2}
│    0xc040a5f4 <do_undefinstr+244>  isb     sy

With __always_inline, this part is emitted as

 5fc:   ee124f50        mrc     15, 0, r4, cr2, cr0, {2}
 600:   e0033004        and     r3, r3, r4
 604:   ee023f50        mcr     15, 0, r3, cr2, cr0, {2}
 608:   f57ff06f        isb     sy
 60c:   ebfffffe        bl      0 <__get_user_4>
                        60c: R_ARM_CALL __get_user_4
 610:   ee024f50        mcr     15, 0, r4, cr2, cr0, {2}
 614:   f57ff06f        isb     sy

and so R0 is preserved, and the issue does not happen.

Not sure how to reduce this to a reproducer that can be used to report
the issue to the GCC folks, but it is most definitely a compiler
problem, as far as I can tell.



More information about the linux-arm-kernel mailing list