[RFC PATCH 06/20] arm64: mm: place empty_zero_page in bss

Mark Rutland mark.rutland at arm.com
Thu Dec 10 07:51:11 PST 2015


On Thu, Dec 10, 2015 at 03:40:08PM +0000, Marc Zyngier wrote:
> On 10/12/15 15:29, Mark Rutland wrote:
> > On Thu, Dec 10, 2015 at 02:11:08PM +0000, Will Deacon wrote:
> >> On Wed, Dec 09, 2015 at 12:44:41PM +0000, Mark Rutland wrote:
> >>> Currently the zero page is set up in paging_init, and thus we cannot use
> >>> the zero page earlier. We use the zero page as a reserved TTBR value
> >>> from which no TLB entries may be allocated (e.g. when uninstalling the
> >>> idmap). To enable such usage earlier (as may be required for invasive
> >>> changes to the kernel page tables), and to minimise the time that the
> >>> idmap is active, we need to be able to use the zero page before
> >>> paging_init.
> >>>
> >>> This patch follows the example set by x86, by allocating the zero page
> >>> at compile time, in .bss. This means that the zero page itself is
> >>> available immediately upon entry to start_kernel (as we zero .bss before
> >>> this), and also means that the zero page takes up no space in the raw
> >>> Image binary. The associated struct page is allocated in bootmem_init,
> >>> and remains unavailable until this time.
> >>>
> >>> Outside of arch code, the only users of empty_zero_page assume that the
> >>> empty_zero_page symbol refers to the zeroed memory itself, and that
> >>> ZERO_PAGE(x) must be used to acquire the associated struct page,
> >>> following the example of x86. This patch also brings arm64 in line with
> >>> these assumptions.
> >>>
> >>> Signed-off-by: Mark Rutland <mark.rutland at arm.com>
> >>> Cc: Ard Biesheuvel <ard.biesheuvel at linaro.org>
> >>> Cc: Catalin Marinas <catalin.marinas at arm.com>
> >>> Cc: Jeremy Linton <jeremy.linton at arm.com>
> >>> Cc: Laura Abbott <labbott at fedoraproject.org>
> >>> Cc: Will Deacon <will.deacon at arm.com>
> >>> ---
> >>>  arch/arm64/include/asm/mmu_context.h | 2 +-
> >>>  arch/arm64/include/asm/pgtable.h     | 4 ++--
> >>>  arch/arm64/mm/mmu.c                  | 9 +--------
> >>>  3 files changed, 4 insertions(+), 11 deletions(-)
> >>
> >> [...]
> >>
> >>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> >>> index 304ff23..7559c22 100644
> >>> --- a/arch/arm64/mm/mmu.c
> >>> +++ b/arch/arm64/mm/mmu.c
> >>> @@ -48,7 +48,7 @@ u64 idmap_t0sz = TCR_T0SZ(VA_BITS);
> >>>   * Empty_zero_page is a special page that is used for zero-initialized data
> >>>   * and COW.
> >>>   */
> >>> -struct page *empty_zero_page;
> >>> +unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)] __page_aligned_bss;
> >>>  EXPORT_SYMBOL(empty_zero_page);
> >>
> >> I've been looking at this, and it was making me feel uneasy because it's
> >> full of junk before the bss is zeroed. Working that through, it's no
> >> worse than what we currently have but I then realised that (a) we don't
> >> have a dsb after zeroing the zero page (which we need to make sure the
> >> zeroes are visible to the page table walker) and (b) the zero page is
> >> never explicitly cleaned to the PoC.
> > 
> > Ouch; that's scary.
> > 
> >> There may be cases where the zero-page is used to back read-only,
> >> non-cacheable mappings (something to do with KVM?), so I'd sleep better
> >> if we made sure that it was clean.
> > 
> > From a grep around for uses of ZERO_PAGE, in most places the zero page
> > is simply used as an empty buffer for I/O. In these cases it's either
> > accessed coherently or goes via the usual machinery for non-coherent
> > DMA.
> > 
> > I don't believe that we usually give userspace the ability to create
> > non-cacheable mappings, and I couldn't spot any paths it could do so via
> > some driver-specific IOCTL applied to the zero page.
> > 
> > Looking around, kvm_clear_guest_page seemed problematic, but isn't used
> > on arm64. I can imagine the zero page being mapped into guests in other
> > situations when mirroring the userspace mapping. 
> > 
> > Marc, Christoffer, I thought we cleaned pages to the PoC before mapping
> > them into a guest? Is that right? Or do we have potential issues there?
> 
> I think we're OK. Looking at __coherent_cache_guest_page (which is
> called when transitioning from an invalid to valid mapping), we do flush
> things to PoC if the vcpu has its cache disabled (or if we know that the
> IPA shouldn't be cached - the whole NOR flash emulation horror story).

So we assume the guest never disables the MMU, and always uses consistent
attributes for a given IPA (e.g. it doesn't have a Device and Normal
Cacheable mapping)?

> Does it answer your question?

I think so. If those assumptions are true then I agree we're ok. If
they aren't, we have other problems.

Thanks,
Mark.
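
For reference, the allocation in the quoted hunk (a page-sized array with
no initializer, so that it lands in .bss) can be approximated in user
space. This is only a minimal sketch under stated assumptions: a 4 KiB
page size, the GCC/Clang aligned attribute standing in for the kernel's
__page_aligned_bss, and the hypothetical names SKETCH_PAGE_SIZE,
zero_page_is_clear and zero_page_is_aligned, none of which appear in the
patch. It says nothing about the barrier or cache maintenance questions
raised above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. The kernel takes PAGE_SIZE from its own headers. */
#define SKETCH_PAGE_SIZE 4096

/*
 * No initializer and static storage duration: C guarantees the array is
 * zero-initialized, and toolchains typically place it in .bss, so it
 * takes up no space in the binary image.
 */
unsigned long empty_zero_page[SKETCH_PAGE_SIZE / sizeof(unsigned long)]
	__attribute__((aligned(SKETCH_PAGE_SIZE)));

/* Returns 1 if every word of the page reads back as zero, 0 otherwise. */
int zero_page_is_clear(void)
{
	size_t i;

	for (i = 0; i < SKETCH_PAGE_SIZE / sizeof(unsigned long); i++)
		if (empty_zero_page[i] != 0)
			return 0;
	return 1;
}

/* Returns 1 if the array starts on a page boundary, 0 otherwise. */
int zero_page_is_aligned(void)
{
	return ((uintptr_t)empty_zero_page % SKETCH_PAGE_SIZE) == 0;
}
```

Because the symbol names the zeroed memory itself, callers can take its
address directly; in the kernel, the associated struct page would still
be obtained via ZERO_PAGE(x), as the commit message notes.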


