[PATCH] ARM: decompressor: cover BSS in cache clean and reorder with MMU disable on v7

Ard Biesheuvel ardb at kernel.org
Sun Jan 24 08:35:31 EST 2021


On Fri, 22 Jan 2021 at 17:32, Ard Biesheuvel <ardb at kernel.org> wrote:
>
> On Fri, 22 Jan 2021 at 17:13, Russell King - ARM Linux admin
> <linux at armlinux.org.uk> wrote:
> >
> > On Fri, Jan 22, 2021 at 04:20:12PM +0100, Ard Biesheuvel wrote:
> > > To ensure that no cache lines cover any of the data that is accessed by
> > > the booting kernel with the MMU off, cover the uncompressed kernel's BSS
> > > region in the cache clean operation.
> > >
> > > Also, to ensure that no cachelines are allocated while the cache is being
> > > cleaned, perform the cache clean operation *after* disabling the MMU and
> > > caches when running on v7 or later, by making a tail call to the clean
> > > routine from the cache_off routine. This requires passing the VA range
> > > to cache_off(), which means some care needs to be taken to preserve
> > > R0 and R1 across the call to cache_off().
> > >
> > > Since this makes the first cache clean redundant, call it with the
> > > range reduced to zero. This only affects v7, as all other versions
> > > ignore R0/R1 entirely.
> > >
> > > Signed-off-by: Ard Biesheuvel <ardb at kernel.org>
> >
> > Seems to work, thanks! I'd suggest we follow up with this patch which
> > gets rid of all the register shuffling:
> >
>
> Agreed
>

So what I think is happening is the following:

In v5.7 and before, the set/way operations trap into KVM, which sets
another trap bit to ensure that a second trap occurs the next time the
MMU is disabled. So if any cachelines are allocated after the call to
cache_clean_flush(), they will be invalidated again when KVM
invalidates the VM's entire IPA space.
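
For reference, the instruction in question is a set/way clean+invalidate,
roughly like this (a sketch of a single iteration, not the exact
pre-v5.8 __armv7_mmu_cache_flush loop):

        @ Sketch: one set/way clean+invalidate, as issued per set/way/level
        @ by the pre-v5.8 decompressor. Under KVM, HCR.TSW makes this trap
        @ to the hypervisor, which then also arms a trap on SCTLR writes so
        @ that the guest turning its MMU/caches off is caught and the whole
        @ IPA space is cleaned+invalidated on its behalf.
        @ r0 = set/way/level encoding
        mcr     p15, 0, r0, c7, c14, 2  @ DCCISW: D-cache clean+invalidate by set/way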

According to DDI0406C.d paragraph B3.2.1, it is implementation defined
whether non-cacheable accesses that occur with MMU/caches disabled may
hit in the data cache.

So after v5.7, no set/way instructions are issued, the second trap is
never armed, and the only cache clean+invalidate that occurs is the one
that the decompressor performs itself; the one that KVM does on the
guest's behalf at cache_off() time is omitted. This allows clean
cachelines that shadow the mini-stack to be allocated, and those lines
are then hit by the non-cacheable accesses that occur before the kernel
proper enables the MMU again.
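
To make the window concrete, the pre-patch flow looks roughly like this
(simplified sketch, labels as in head.S):

        bl      cache_clean_flush       @ clean+invalidate by VA, MMU/caches still on
        @ ...until the caches are actually turned off, speculative linefills
        @ can repopulate the D-cache, including clean lines that cover the
        @ mini-stack...
        bl      cache_off               @ MMU/caches off; those clean lines survive
        @ the kernel proper then accesses the mini-stack with the MMU off,
        @ and those non-cacheable accesses may (IMPLEMENTATION DEFINED) hit
        @ the stale clean lines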

Reordering the clean+invalidate with the MMU/cache disabling prevents
the issue: disabling the MMU and caches first removes any mappings from
which the cache could perform speculative linefills, so the mini-stack
accesses can no longer be served from the cache.
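
With the patch applied, the flow on v7 becomes, again roughly:

        bl      cache_clean_flush       @ v7: called with a zero-length range,
                                        @ effectively a no-op
        bl      cache_off               @ v7: turns MMU and caches off first,
                                        @ then tail-calls __armv7_mmu_cache_flush
                                        @ to clean+invalidate [r0, r1) by VA,
                                        @ with no mapping left to speculate from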


> > 8<===
> > From: Russell King <rmk+kernel at armlinux.org.uk>
> > Subject: [PATCH] ARM: decompressor: tidy up register usage
> >
> > Tidy up the registers so we don't end up having to shuffle values
> > between registers to work around other register usages.
> >
> > Signed-off-by: Russell King <rmk+kernel at armlinux.org.uk>
>
> Acked-by: Ard Biesheuvel <ardb at kernel.org>
>
> > ---
> >  arch/arm/boot/compressed/head.S | 41 +++++++++++++++------------------
> >  1 file changed, 19 insertions(+), 22 deletions(-)
> >
> > diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S
> > index b44738110095..c0a13004c5d4 100644
> > --- a/arch/arm/boot/compressed/head.S
> > +++ b/arch/arm/boot/compressed/head.S
> > @@ -930,6 +930,7 @@ ENDPROC(__setup_mmu)
>
> Can we drop the r1 = corrupted here now?
>
> >   *  r2  = corrupted
> >   *  r3  = block offset
> >   *  r9  = corrupted
> > + *  r10 = corrupted
> >   *  r12 = corrupted
> >   */
> >
> > @@ -949,10 +950,10 @@ call_cache_fn:    adr     r12, proc_types
> >  #else
> >                 ldr     r9, =CONFIG_PROCESSOR_ID
> >  #endif
> > -1:             ldr     r1, [r12, #0]           @ get value
> > +1:             ldr     r10, [r12, #0]          @ get value
> >                 ldr     r2, [r12, #4]           @ get mask
> > -               eor     r1, r1, r9              @ (real ^ match)
> > -               tst     r1, r2                  @       & mask
> > +               eor     r10, r10, r9            @ (real ^ match)
> > +               tst     r10, r2                 @       & mask
> >   ARM(          addeq   pc, r12, r3             ) @ call cache function
> >   THUMB(                addeq   r12, r3                 )
> >   THUMB(                moveq   pc, r12                 ) @ call cache function
> > @@ -1139,8 +1140,6 @@ call_cache_fn:    adr     r12, proc_types
> >   */
> >                 .align  5
> >  cache_off:     mov     r3, #12                 @ cache_off function
> > -               mov     r10, r0
> > -               mov     r11, r1
> >                 b       call_cache_fn
> >
> >  __armv4_mpu_cache_off:
> > @@ -1173,22 +1172,21 @@ cache_off:      mov     r3, #12                 @ cache_off function
> >                 mov     pc, lr
> >
> >  __armv7_mmu_cache_off:
> > -               mrc     p15, 0, r0, c1, c0
> > +               mrc     p15, 0, r3, c1, c0
> >  #ifdef CONFIG_MMU
> > -               bic     r0, r0, #0x000d
> > +               bic     r3, r3, #0x000d
> >  #else
> > -               bic     r0, r0, #0x000c
> > +               bic     r3, r3, #0x000c
> >  #endif
> > -               mcr     p15, 0, r0, c1, c0      @ turn MMU and cache off
> > -               mov     r0, #0
> > +               mcr     p15, 0, r3, c1, c0      @ turn MMU and cache off
> > +               mov     r3, #0
> >  #ifdef CONFIG_MMU
> > -               mcr     p15, 0, r0, c8, c7, 0   @ invalidate whole TLB
> > +               mcr     p15, 0, r3, c8, c7, 0   @ invalidate whole TLB
> >  #endif
> > -               mcr     p15, 0, r0, c7, c5, 6   @ invalidate BTC
> > -               mcr     p15, 0, r0, c7, c10, 4  @ DSB
> > -               mcr     p15, 0, r0, c7, c5, 4   @ ISB
> > +               mcr     p15, 0, r3, c7, c5, 6   @ invalidate BTC
> > +               mcr     p15, 0, r3, c7, c10, 4  @ DSB
> > +               mcr     p15, 0, r3, c7, c5, 4   @ ISB
> >
> > -               mov     r0, r10
> >                 b       __armv7_mmu_cache_flush
> >
> >  /*
> > @@ -1205,7 +1203,6 @@ cache_off:        mov     r3, #12                 @ cache_off function
> >                 .align  5
> >  cache_clean_flush:
> >                 mov     r3, #16
> > -               mov     r11, r1
> >                 b       call_cache_fn
> >
> >  __armv4_mpu_cache_flush:
> > @@ -1256,15 +1253,15 @@ cache_off:      mov     r3, #12                 @ cache_off function
> >                 mcr     p15, 0, r10, c7, c14, 0 @ clean+invalidate D
> >                 b       iflush
> >  hierarchical:
> > -               dcache_line_size r1, r2         @ r1 := dcache min line size
> > -               sub     r2, r1, #1              @ r2 := line size mask
> > +               dcache_line_size r11, r2        @ r11 := dcache min line size
> > +               sub     r2, r11, #1             @ r2 := line size mask
> >                 bic     r0, r0, r2              @ round down start to line size
> > -               sub     r11, r11, #1            @ end address is exclusive
> > -               bic     r11, r11, r2            @ round down end to line size
> > -0:             cmp     r0, r11                 @ finished?
> > +               sub     r1, r1, #1              @ end address is exclusive
> > +               bic     r1, r1, r2              @ round down end to line size
> > +0:             cmp     r0, r1                  @ finished?
> >                 bgt     iflush
> >                 mcr     p15, 0, r0, c7, c14, 1  @ Dcache clean/invalidate by VA
> > -               add     r0, r0, r1
> > +               add     r0, r0, r11
> >                 b       0b
> >  iflush:
> >                 mcr     p15, 0, r10, c7, c10, 4 @ DSB
> > --
> > 2.20.1
> >
> > --
> > RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> > FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!


