[PATCH v7 5/7] arm64: vmemmap: Avoid base2 order of struct page size to dimension region

Mark Rutland mark.rutland at arm.com
Wed Dec 13 06:39:30 PST 2023


On Wed, Dec 13, 2023 at 03:09:54PM +0100, Ard Biesheuvel wrote:
> On Wed, 13 Dec 2023 at 14:49, Mark Rutland <mark.rutland at arm.com> wrote:
> >
> > Hi Ard,
> >
> > On Wed, Dec 13, 2023 at 09:40:30AM +0100, Ard Biesheuvel wrote:
> > > From: Ard Biesheuvel <ardb at kernel.org>
> > >
> > > The placement and size of the vmemmap region in the kernel virtual
> > > address space is currently derived from the base2 order of the size of a
> > > struct page. This makes for nicely aligned constants with lots of
> > > leading 0xf and trailing 0x0 digits, but given that the actual struct
> > > pages are indexed as an ordinary array, this resulting region is
> > > severely overdimensioned when the size of a struct page is just over a
> > > power of 2.
> > >
> > > This doesn't matter today, but once we enable 52-bit virtual addressing
> > > for 4k pages configurations, the vmemmap region may take up almost half
> > > of the upper VA region with the current struct page upper bound at 64
> > > bytes. And once we enable KMSAN or other features that push the size of
> > > a struct page over 64 bytes, we will run out of VMALLOC space entirely.
> > >
> > > So instead, let's derive the region size from the actual size of a
> > > struct page, and place the entire region 1 GB from the top of the VA
> > > space, where it still doesn't share any lower level translation table
> > > entries with the fixmap.
> > >
> > > Acked-by: Mark Rutland <mark.rutland at arm.com>
> > > Signed-off-by: Ard Biesheuvel <ardb at kernel.org>
> > > ---
> > >  arch/arm64/include/asm/memory.h | 8 ++++----
> > >  1 file changed, 4 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
> > > index 2745bed8ae5b..b49575a92afc 100644
> > > --- a/arch/arm64/include/asm/memory.h
> > > +++ b/arch/arm64/include/asm/memory.h
> > > @@ -30,8 +30,8 @@
> > >   * keep a constant PAGE_OFFSET and "fallback" to using the higher end
> > >   * of the VMEMMAP where 52-bit support is not available in hardware.
> > >   */
> > > -#define VMEMMAP_SHIFT        (PAGE_SHIFT - STRUCT_PAGE_MAX_SHIFT)
> > > -#define VMEMMAP_SIZE ((_PAGE_END(VA_BITS_MIN) - PAGE_OFFSET) >> VMEMMAP_SHIFT)
> > > +#define VMEMMAP_RANGE        (_PAGE_END(VA_BITS_MIN) - PAGE_OFFSET)
> > > +#define VMEMMAP_SIZE ((VMEMMAP_RANGE >> PAGE_SHIFT) * sizeof(struct page))
> > >
> > >  /*
> > >   * PAGE_OFFSET - the virtual address of the start of the linear map, at the
> > > @@ -47,8 +47,8 @@
> > >  #define MODULES_END          (MODULES_VADDR + MODULES_VSIZE)
> > >  #define MODULES_VADDR                (_PAGE_END(VA_BITS_MIN))
> > >  #define MODULES_VSIZE                (SZ_2G)
> > > -#define VMEMMAP_START                (-(UL(1) << (VA_BITS - VMEMMAP_SHIFT)))
> > > -#define VMEMMAP_END          (VMEMMAP_START + VMEMMAP_SIZE)
> > > +#define VMEMMAP_START                (VMEMMAP_END - VMEMMAP_SIZE)
> > > +#define VMEMMAP_END          (-UL(SZ_1G))
> > >  #define PCI_IO_START         (VMEMMAP_END + SZ_8M)
> > >  #define PCI_IO_END           (PCI_IO_START + PCI_IO_SIZE)
> > >  #define FIXADDR_TOP          (-UL(SZ_8M))
> >
> > I realise I've acked this already, but my big concern here is still that it's
> > hard to see why these don't overlap (though the assert in fixmap.c will save
> > us). Usually we try to make that clear by construction, and I think we can do
> > that here with something like:
> >
> > | #define GUARD_VA_SIZE          (UL(SZ_8M))
> > |
> > | #define FIXADDR_TOP            (-GUARD_VA_SIZE)
> > | #define FIXADDR_SIZE_MAX       SZ_8M
> > | #define FIXADDR_START_MIN      (FIXADDR_TOP - FIXADDR_SIZE_MAX)
> > |
> > | #define PCI_IO_END             (FIXADDR_START_MIN - GUARD_VA_SIZE)
> > | #define PCI_IO_START           (PCI_IO_END - PCI_IO_SIZE)
> > |
> > | #define VMEMMAP_END            (ALIGN_DOWN(PCI_IO_START - GUARD_VA_SIZE, SZ_1G))
> > | #define VMEMMAP_START          (VMEMMAP_END - VMEMMAP_SIZE)
> >
> > ... and in fixmap.h have:
> >
> > /* Ensure the estimate in memory.h was big enough */
> > static_assert(FIXADDR_SIZE_MAX > FIXADDR_SIZE);
> >
> > I might be missing some reason why we can't do that; I locally tried the above
> > atop this series with defconfig+4K and defcconfig+64K, and both build and boot
> > without sisue.
> >
> 
> I am really struggling to understand what the issue is that you are
> trying to solve here.
> 
> What I am proposing is
> 
> #define VMEMMAP_END          (-UL(SZ_1G))
> #define PCI_IO_START         (VMEMMAP_END + SZ_8M)
> #define PCI_IO_END           (PCI_IO_START + PCI_IO_SIZE)
> #define FIXADDR_TOP          (-UL(SZ_8M))
> 
> and in fixmap.c
> 
> static_assert(FIXADDR_TOT_START > PCI_IO_END);
> 
> (which sadly has to live outside of asm/memory.h for reasons of #include hell).
> 
> This leaves the top 1G for PCI I/O plus fixmap, and ensures that the
> latter does not collide with the former.

Sure, I get that (and that #include hell is why I have FIXADDR_SIZE_MAX above).

What I'm trying to do is make the relationship between those regions clear *in
one place*, so that this is easier to follow, as that was one of the things I
found painful during review. That said, it's not clear cut, and I'll happily
defer to the judgement of Will and Catalin.

For the series as-is:

Acked-by: Mark Rutland <mark.rutland at arm.com>

Will/Catalin, please let me know your preference on the above.

Thanks,
Mark.



More information about the linux-arm-kernel mailing list