[PATCH] ARM: decompressor: cover BSS in cache clean and reorder with MMU disable on v7
Ard Biesheuvel
ardb at kernel.org
Sun Jan 24 08:35:31 EST 2021
On Fri, 22 Jan 2021 at 17:32, Ard Biesheuvel <ardb at kernel.org> wrote:
>
> On Fri, 22 Jan 2021 at 17:13, Russell King - ARM Linux admin
> <linux at armlinux.org.uk> wrote:
> >
> > On Fri, Jan 22, 2021 at 04:20:12PM +0100, Ard Biesheuvel wrote:
> > > To ensure that no cache lines cover any of the data that is accessed by
> > > the booting kernel with the MMU off, cover the uncompressed kernel's BSS
> > > region in the cache clean operation.
> > >
> > > Also, to ensure that no cachelines are allocated while the cache is being
> > > cleaned, perform the cache clean operation *after* disabling the MMU and
> > > caches when running on v7 or later, by making a tail call to the clean
> > > routine from the cache_off routine. This requires passing the VA range
> > > to cache_off(), which means some care needs to be taken to preserve
> > > R0 and R1 across the call to cache_off().
> > >
> > > Since this makes the first cache clean redundant, call it with the
> > > range reduced to zero. This only affects v7, as all other versions
> > > ignore R0/R1 entirely.
> > >
> > > Signed-off-by: Ard Biesheuvel <ardb at kernel.org>
> >
> > Seems to work, thanks! I'd suggest we follow up with this patch which
> > gets rid of all the register shuffling:
> >
>
> Agreed
>
So what I think is happening is the following:
In v5.7 and earlier, the set/way operations trap into KVM, which sets
another trap bit to ensure that a second trap occurs the next time the
MMU is disabled. So if any cachelines are allocated after the call to
cache_clean_flush(), they will be invalidated again when KVM
invalidates the VM's entire IPA space.
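
As a rough model of that two-trap dance (a self-contained C sketch of
the mechanism as I understand it; the struct, helper names and main()
driver are hypothetical stand-ins, not KVM's actual code):

  /* Illustrative sketch only -- not KVM's actual code. */
  #include <stdbool.h>
  #include <stdio.h>

  struct vcpu {
          bool tvm_trap_armed;    /* stands in for HCR.TVM */
          bool caches_enabled;    /* guest SCTLR.C as KVM last saw it */
  };

  static void flush_ipa_space(struct vcpu *v)
  {
          /* real KVM cleans+invalidates the VM's whole IPA space here */
          printf("clean+invalidate entire IPA space\n");
  }

  /* trap #1: guest executed a set/way op (e.g. DCCISW) */
  static void handle_set_way(struct vcpu *v)
  {
          flush_ipa_space(v);             /* can't virtualize per set/way */
          v->tvm_trap_armed = true;       /* arm trap #2 */
  }

  /* trap #2: guest wrote SCTLR while the trap was armed */
  static void handle_sctlr_write(struct vcpu *v, bool new_c_bit)
  {
          bool was_enabled = v->caches_enabled;

          v->caches_enabled = new_c_bit;
          if (was_enabled != new_c_bit)
                  flush_ipa_space(v);     /* catches lines allocated
                                             since trap #1 */
          if (new_c_bit)
                  v->tvm_trap_armed = false;
  }

  int main(void)
  {
          struct vcpu v = { .caches_enabled = true };

          handle_set_way(&v);             /* cache_clean_flush() */
          /* ... cachelines may be speculatively allocated here ... */
          handle_sctlr_write(&v, false);  /* cache_off(); this write only
                                             traps because trap #2 is armed */
          return 0;
  }

Once the decompressor stops issuing the set/way op, handle_set_way()
above never runs, and neither flush happens on the guest's behalf.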
According to DDI0406C.d paragraph B3.2.1, it is implementation defined
whether non-cacheable accesses that occur with MMU/caches disabled may
hit in the data cache.
So after v5.7, with no set/way instructions being issued, the second
trap is never armed, and the only cache clean+invalidate that occurs
is the one that the decompressor performs itself; the one that KVM
does on the guest's behalf at cache_off() time is omitted. This
results in clean cachelines being allocated that shadow the
mini-stack, and those lines are hit by the non-cacheable accesses that
occur before the kernel proper enables the MMU again.
Reordering the clean+invalidate with the MMU/cache disabling prevents
the issue: disabling the MMU and caches first removes any mappings
from which the cache could perform speculative linefills, so the
mini-stack accesses can no longer be served from the cache.
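
For v7, the resulting order of operations amounts to the following
(a C-with-inline-asm rendering of the __armv7_mmu_cache_off path, for
illustration only; the function name is mine, and the authoritative
code is the head.S assembly in the diff below):

  /* Sketch of the v7 MMU-off-then-clean ordering; 32-bit ARM only. */
  static void v7_mmu_off_then_clean(unsigned long start, unsigned long end)
  {
          unsigned long sctlr, ctr, line, va;

          /* step 1: turn the MMU and caches off first */
          asm volatile("mrc p15, 0, %0, c1, c0, 0" : "=r" (sctlr));
          sctlr &= ~0x000dUL;   /* clear SCTLR.M, SCTLR.C (and legacy W) */
          asm volatile("mcr p15, 0, %0, c1, c0, 0" :: "r" (sctlr));
          asm volatile("mcr p15, 0, %0, c8, c7, 0" :: "r" (0UL)); /* TLBIALL */
          asm volatile("mcr p15, 0, %0, c7, c5, 6" :: "r" (0UL)); /* BPIALL */
          asm volatile("mcr p15, 0, %0, c7, c10, 4" :: "r" (0UL)); /* DSB */
          asm volatile("mcr p15, 0, %0, c7, c5, 4" :: "r" (0UL)); /* ISB */

          /* step 2: clean+invalidate the range by VA; with the MMU off,
           * no cacheable mappings remain to refill lines behind our back */
          asm volatile("mrc p15, 0, %0, c0, c0, 1" : "=r" (ctr)); /* CTR */
          line = 4UL << ((ctr >> 16) & 0xf);   /* DminLine, words -> bytes */
          for (va = start & ~(line - 1); va < end; va += line)
                  asm volatile("mcr p15, 0, %0, c7, c14, 1"
                               :: "r" (va));   /* DCCIMVAC */
          asm volatile("mcr p15, 0, %0, c7, c10, 4" :: "r" (0UL)); /* DSB */
  }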
> > 8<===
> > From: Russell King <rmk+kernel at armlinux.org.uk>
> > Subject: [PATCH] ARM: decompressor: tidy up register usage
> >
> > Tidy up the registers so we don't end up having to shuffle values
> > between registers to work around other register usages.
> >
> > Signed-off-by: Russell King <rmk+kernel at armlinux.org.uk>
>
> Acked-by: Ard Biesheuvel <ardb at kernel.org>
>
> > ---
> > arch/arm/boot/compressed/head.S | 41 +++++++++++++++------------------
> > 1 file changed, 19 insertions(+), 22 deletions(-)
> >
> > diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S
> > index b44738110095..c0a13004c5d4 100644
> > --- a/arch/arm/boot/compressed/head.S
> > +++ b/arch/arm/boot/compressed/head.S
> > @@ -930,6 +930,7 @@ ENDPROC(__setup_mmu)
>
> Can we drop the r1 = corrupted here now?
>
> > * r2 = corrupted
> > * r3 = block offset
> > * r9 = corrupted
> > + * r10 = corrupted
> > * r12 = corrupted
> > */
> >
> > @@ -949,10 +950,10 @@ call_cache_fn: adr r12, proc_types
> > #else
> > ldr r9, =CONFIG_PROCESSOR_ID
> > #endif
> > -1: ldr r1, [r12, #0] @ get value
> > +1: ldr r10, [r12, #0] @ get value
> > ldr r2, [r12, #4] @ get mask
> > - eor r1, r1, r9 @ (real ^ match)
> > - tst r1, r2 @ & mask
> > + eor r10, r10, r9 @ (real ^ match)
> > + tst r10, r2 @ & mask
> > ARM( addeq pc, r12, r3 ) @ call cache function
> > THUMB( addeq r12, r3 )
> > THUMB( moveq pc, r12 ) @ call cache function
> > @@ -1139,8 +1140,6 @@ call_cache_fn: adr r12, proc_types
> > */
> > .align 5
> > cache_off: mov r3, #12 @ cache_off function
> > - mov r10, r0
> > - mov r11, r1
> > b call_cache_fn
> >
> > __armv4_mpu_cache_off:
> > @@ -1173,22 +1172,21 @@ cache_off: mov r3, #12 @ cache_off function
> > mov pc, lr
> >
> > __armv7_mmu_cache_off:
> > - mrc p15, 0, r0, c1, c0
> > + mrc p15, 0, r3, c1, c0
> > #ifdef CONFIG_MMU
> > - bic r0, r0, #0x000d
> > + bic r3, r3, #0x000d
> > #else
> > - bic r0, r0, #0x000c
> > + bic r3, r3, #0x000c
> > #endif
> > - mcr p15, 0, r0, c1, c0 @ turn MMU and cache off
> > - mov r0, #0
> > + mcr p15, 0, r3, c1, c0 @ turn MMU and cache off
> > + mov r3, #0
> > #ifdef CONFIG_MMU
> > - mcr p15, 0, r0, c8, c7, 0 @ invalidate whole TLB
> > + mcr p15, 0, r3, c8, c7, 0 @ invalidate whole TLB
> > #endif
> > - mcr p15, 0, r0, c7, c5, 6 @ invalidate BTC
> > - mcr p15, 0, r0, c7, c10, 4 @ DSB
> > - mcr p15, 0, r0, c7, c5, 4 @ ISB
> > + mcr p15, 0, r3, c7, c5, 6 @ invalidate BTC
> > + mcr p15, 0, r3, c7, c10, 4 @ DSB
> > + mcr p15, 0, r3, c7, c5, 4 @ ISB
> >
> > - mov r0, r10
> > b __armv7_mmu_cache_flush
> >
> > /*
> > @@ -1205,7 +1203,6 @@ cache_off: mov r3, #12 @ cache_off function
> > .align 5
> > cache_clean_flush:
> > mov r3, #16
> > - mov r11, r1
> > b call_cache_fn
> >
> > __armv4_mpu_cache_flush:
> > @@ -1256,15 +1253,15 @@ cache_off: mov r3, #12 @ cache_off function
> > mcr p15, 0, r10, c7, c14, 0 @ clean+invalidate D
> > b iflush
> > hierarchical:
> > - dcache_line_size r1, r2 @ r1 := dcache min line size
> > - sub r2, r1, #1 @ r2 := line size mask
> > + dcache_line_size r11, r2 @ r11 := dcache min line size
> > + sub r2, r11, #1 @ r2 := line size mask
> > bic r0, r0, r2 @ round down start to line size
> > - sub r11, r11, #1 @ end address is exclusive
> > - bic r11, r11, r2 @ round down end to line size
> > -0: cmp r0, r11 @ finished?
> > + sub r1, r1, #1 @ end address is exclusive
> > + bic r1, r1, r2 @ round down end to line size
> > +0: cmp r0, r1 @ finished?
> > bgt iflush
> > mcr p15, 0, r0, c7, c14, 1 @ Dcache clean/invalidate by VA
> > - add r0, r0, r1
> > + add r0, r0, r11
> > b 0b
> > iflush:
> > mcr p15, 0, r10, c7, c10, 4 @ DSB
> > --
> > 2.20.1
> >
> > --
> > RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> > FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!