[PATCH 00/16] mm: Introduce MAP_BELOW_HINT
Charlie Jenkins
charlie at rivosinc.com
Wed Aug 28 14:39:55 PDT 2024
On Wed, Aug 28, 2024 at 01:59:18PM -0700, Charlie Jenkins wrote:
> On Wed, Aug 28, 2024 at 02:31:42PM -0400, Liam R. Howlett wrote:
> > * Charlie Jenkins <charlie at rivosinc.com> [240828 01:49]:
> > > Some applications rely on placing data in the free bits of addresses
> > > allocated by mmap. Various architectures (e.g. x86, arm64, powerpc)
> > > restrict the address returned by mmap to be less than the maximum
> > > address space, unless the hint address is greater than this value.
> >
> > Wait, what arch(s) allows for greater than the max? The passed hint
> > should be where we start searching, but we go to the lower limit then
> > start at the hint and search up (or vice-versa on the directions).
> >
>
> I worded this awkwardly. On arm64 there is a page-table boundary at 48
> bits and at 52 bits. On x86 the boundaries are at 48 bits and 57 bits.
> The max value mmap is able to return on arm64 is 48 bits if the hint
> address uses 48 bits or less, even if the architecture supports 5-level
> paging and thus addresses can be 52 bits. Applications can opt-in to
> using up to 52-bits in an address by using a hint address greater than
> 48 bits. x86 has the same behavior but with 57 bits instead of 52.
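
To make the opt-in concrete, here is a minimal userspace sketch, assuming
64-bit Linux. The helper names addr_bits() and map_page_with_hint() are
made up for illustration and are not part of the series. The first helper
counts how many bits an address uses; the second maps one page with a
caller-supplied hint and no MAP_FIXED.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Number of significant bits in an address (0 for address 0). */
static inline unsigned int addr_bits(uint64_t addr)
{
	unsigned int n = 0;

	while (addr) {
		n++;
		addr >>= 1;
	}
	return n;
}

/*
 * Map one anonymous page, passing `hint` as the address hint.
 * Without MAP_FIXED the kernel is free to place the mapping elsewhere.
 * Returns NULL on failure. (Hypothetical helper, 64-bit Linux assumed.)
 */
static inline void *map_page_with_hint(uint64_t hint)
{
	void *p = mmap((void *)(uintptr_t)hint, 4096, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	assert(p != NULL);
	return p == MAP_FAILED ? NULL : p;
}
```

Calling map_page_with_hint(0) should give an address that fits in 48 bits
on x86 and arm64, while map_page_with_hint(UINT64_C(1) << 48) is the kind
of call that opts in to the larger range on hardware that supports it.
Whether the second call really returns a higher address depends on the
hardware and kernel configuration.
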
>
> The reason this exists is that some applications arbitrarily replace
> bits in virtual addresses with data with an assumption that the address
> will not be using any of the bits above bit 48 in the virtual address.
> As hardware with larger address spaces was released, x86 decided to
> build safety guards into the kernel to allow the applications that made
> these assumptions to continue to work on this different hardware.
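
As an illustration of the kind of application meant here, below is a
minimal pointer-tagging sketch. It assumes a 64-bit word and that the
application treats everything above bit 47 as free; ADDR_BITS and all
function names are made up for this example. It also shows why an
address that uses more than 48 bits breaks such code: the tag would
overwrite real address bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the application expects addresses to fit in 48 bits. */
#define ADDR_BITS 48
#define ADDR_MASK ((UINT64_C(1) << ADDR_BITS) - 1)

/* True if no address bit would be clobbered by a 16-bit tag. */
static inline bool addr_is_taggable(uint64_t addr)
{
	return (addr & ~ADDR_MASK) == 0;
}

/* Store a 16-bit tag in the upper bits of a 48-bit address. */
static inline uint64_t tag_addr(uint64_t addr, uint16_t tag)
{
	assert(addr_is_taggable(addr));
	return addr | ((uint64_t)tag << ADDR_BITS);
}

/* Recover the original address. */
static inline uint64_t untag_addr(uint64_t tagged)
{
	return tagged & ADDR_MASK;
}

/* Recover the tag. */
static inline uint16_t addr_tag(uint64_t tagged)
{
	return (uint16_t)(tagged >> ADDR_BITS);
}
```

An address with bit 50 set, for example, fails addr_is_taggable(), which
is the situation the 48-bit default is meant to prevent.
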
>
> This causes all applications that use a hint address to silently be
> restricted to 48-bit addresses. The goal of this flag is to have a way
> for applications to explicitly request how many bits they want mmap to
> use.
>
> > I don't understand how unmapping works on a higher address; we would
> > fail to free it on termination of the application.
> >
> > Also, there are archs that map outside of the VMAs, which are freed by
> > freeing from the prev->vm_end to next->vm_start, so I don't understand
> > what that looks like in this reality as well.
> >
> > >
> > > On arm64 this barrier is at 52 bits and on x86 it is at 56 bits. This
> > > flag allows applications a way to specify exactly how many bits they
> > > want to be left unused by mmap. This eliminates the need for
> > > applications to know the page table hierarchy of the system to be able
> > > to reason which addresses mmap will be allowed to return.
> >
> > But, why do they need to know today? We have a limit for this don't we?
>
> The limit is different for different architectures. On x86 the limit is
> 57 bits, and on arm64 it is 52 bits. So in the theoretical case that an
> application requires 10 bits free in a virtual address, the application
> would always work on arm64 regardless of the hint address, but on x86 if
> the hint address is greater than 48 bits then the application will not
> work.
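
The arithmetic behind that example, as a small sketch. It assumes 64-bit
pointers and uses the limits quoted above (48, 52 and 57 bits); the
function name is made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * True if an address that may use up to `va_bits` bits still leaves
 * `free_bits` unused at the top of a 64-bit pointer.
 */
static inline bool leaves_bits_free(unsigned int va_bits, unsigned int free_bits)
{
	assert(va_bits <= 64 && free_bits <= 64);
	return va_bits + free_bits <= 64;
}
```

With 10 bits required: 52 + 10 = 62 fits (arm64, any hint), 48 + 10 = 58
fits (x86 default), but 57 + 10 = 67 does not (x86 with a hint above 48
bits).
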
>
> The goal of this flag is to have consistent and tunable behavior of
> mmap() when it is desired to ensure that mmap() only returns addresses
> that use some number of bits.
>
> >
> > Also, these upper limits are how some archs use the upper bits that you
> > are trying to use.
> >
>
> It does not eliminate the existing behavior of the architectures to
> place these upper limits; it instead provides a way to have consistent
> behavior across all architectures.
>
> > >
> > > ---
> > > riscv made this feature of mmap returning addresses less than the hint
> > > address the default behavior. This was in contrast to the implementation
> > > of x86/arm64 that have a single boundary at the 5-level page table
> > > region. However this restriction proved too great -- the reduced
> > > address space when using a hint address was too small.
> >
> > Yes, the hint is used to group things close together so it would
> > literally be random chance whether you have enough room or not (aslr
> > and all).
> >
> > >
> > > A patch for riscv [1] reverts the behavior that broke userspace. This
> > > series serves to make this feature available to all architectures.
> >
> > I don't fully understand this statement, you say it broke userspace so
> > now you are porting it to everyone? This reads as if you are breaking
> > the userspace on all architectures :)
>
> It was the default for mmap on riscv. The difference here is that it is now
> enabled by a flag instead. Instead of making the flag specific to riscv,
> I figured that other architectures might find it useful as well.
>
> >
> > If you fail to find room below, then your application fails as there is
> > no way to get the upper bits you need. It would be better to fix this
> > in userspace - if your application is returned too high an address, then
> > free it and exit because it's going to fail anyways.
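
For reference, the userspace-only approach suggested here could look
roughly like the sketch below. It assumes 64-bit Linux; mmap_below_bits()
is a made-up name, not an existing API. It maps without a hint, checks
where the mapping landed, and unmaps it if it reaches past the requested
number of bits, leaving the decision to exit to the caller.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/*
 * Map `len` bytes of anonymous memory and keep the mapping only if it
 * lies entirely below 2^max_bits. Returns NULL if mmap() fails or the
 * kernel picked an address that is too high. (Hypothetical helper.)
 */
static inline void *mmap_below_bits(size_t len, unsigned int max_bits)
{
	uint64_t end;
	void *p;

	assert(max_bits >= 1 && max_bits <= 63);

	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return NULL;

	/* One past the last byte of the mapping. */
	end = (uint64_t)(uintptr_t)p + len;
	if (end > (UINT64_C(1) << max_bits)) {
		/* Too high for the caller: release it and report failure. */
		munmap(p, len);
		return NULL;
	}
	return p;
}
```

This only detects a bad placement after the fact; it cannot steer the
kernel towards a lower address, which is the gap the flag is meant to
close.
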
> >
>
> This flag is trying to define an API that is more robust than the
> current behavior on x86 and arm64, which implicitly restricts mmap()
> addresses to 48 bits. A solution could be to just write in the docs that
> mmap() will always exhaust all addresses below the hint address before
> returning an address that is above the hint address. However a flag that
> defines this behavior seems more intuitive.
>
> > >
> > > I have only tested on riscv and x86.
> >
> > This should be an RFC then.
>
> Fair enough.
>
> >
> > > There is a tremendous amount of
> > > duplicated code in mmap so the implementations across architectures I
> > > believe should be mostly consistent. I added this feature to all
> > > architectures that implement either
> > > arch_get_mmap_end()/arch_get_mmap_base() or
> > > arch_get_unmapped_area_topdown()/arch_get_unmapped_area(). I also added
> > > it to the default behavior for arch_get_mmap_end()/arch_get_mmap_base().
> >
> > Way too much duplicate code. We should be figuring out how to make this
> > all work with the same code.
> >
> > This is going to make the cloned code problem worse.
>
> That would require standardizing every architecture with the generic
> mmap() framework that arm64 has developed. That is far outside the scope
> of this patch, but would be a great area to research for each of the
> architectures that do not use the generic framework.
Thinking about this again, I could drop support for all architectures
that do not implement arch_get_mmap_base()/arch_get_mmap_end().
>
> - Charlie
>
> >
> > >
> > > Link: https://lore.kernel.org/lkml/20240826-riscv_mmap-v1-2-cd8962afe47f@rivosinc.com/T/ [1]
> > >
> > > To: Arnd Bergmann <arnd at arndb.de>
> > > To: Paul Walmsley <paul.walmsley at sifive.com>
> > > To: Palmer Dabbelt <palmer at dabbelt.com>
> > > To: Albert Ou <aou at eecs.berkeley.edu>
> > > To: Catalin Marinas <catalin.marinas at arm.com>
> > > To: Will Deacon <will at kernel.org>
> > > To: Michael Ellerman <mpe at ellerman.id.au>
> > > To: Nicholas Piggin <npiggin at gmail.com>
> > > To: Christophe Leroy <christophe.leroy at csgroup.eu>
> > > To: Naveen N Rao <naveen at kernel.org>
> > > To: Muchun Song <muchun.song at linux.dev>
> > > To: Andrew Morton <akpm at linux-foundation.org>
> > > To: Liam R. Howlett <Liam.Howlett at oracle.com>
> > > To: Vlastimil Babka <vbabka at suse.cz>
> > > To: Lorenzo Stoakes <lorenzo.stoakes at oracle.com>
> > > To: Thomas Gleixner <tglx at linutronix.de>
> > > To: Ingo Molnar <mingo at redhat.com>
> > > To: Borislav Petkov <bp at alien8.de>
> > > To: Dave Hansen <dave.hansen at linux.intel.com>
> > > To: x86 at kernel.org
> > > To: H. Peter Anvin <hpa at zytor.com>
> > > To: Huacai Chen <chenhuacai at kernel.org>
> > > To: WANG Xuerui <kernel at xen0n.name>
> > > To: Russell King <linux at armlinux.org.uk>
> > > To: Thomas Bogendoerfer <tsbogend at alpha.franken.de>
> > > To: James E.J. Bottomley <James.Bottomley at HansenPartnership.com>
> > > To: Helge Deller <deller at gmx.de>
> > > To: Alexander Gordeev <agordeev at linux.ibm.com>
> > > To: Gerald Schaefer <gerald.schaefer at linux.ibm.com>
> > > To: Heiko Carstens <hca at linux.ibm.com>
> > > To: Vasily Gorbik <gor at linux.ibm.com>
> > > To: Christian Borntraeger <borntraeger at linux.ibm.com>
> > > To: Sven Schnelle <svens at linux.ibm.com>
> > > To: Yoshinori Sato <ysato at users.sourceforge.jp>
> > > To: Rich Felker <dalias at libc.org>
> > > To: John Paul Adrian Glaubitz <glaubitz at physik.fu-berlin.de>
> > > To: David S. Miller <davem at davemloft.net>
> > > To: Andreas Larsson <andreas at gaisler.com>
> > > To: Shuah Khan <shuah at kernel.org>
> > > To: Alexandre Ghiti <alexghiti at rivosinc.com>
> > > Cc: linux-arch at vger.kernel.org
> > > Cc: linux-kernel at vger.kernel.org
> > > Cc: Palmer Dabbelt <palmer at rivosinc.com>
> > > Cc: linux-riscv at lists.infradead.org
> > > Cc: linux-arm-kernel at lists.infradead.org
> > > Cc: linuxppc-dev at lists.ozlabs.org
> > > Cc: linux-mm at kvack.org
> > > Cc: loongarch at lists.linux.dev
> > > Cc: linux-mips at vger.kernel.org
> > > Cc: linux-parisc at vger.kernel.org
> > > Cc: linux-s390 at vger.kernel.org
> > > Cc: linux-sh at vger.kernel.org
> > > Cc: sparclinux at vger.kernel.org
> > > Cc: linux-kselftest at vger.kernel.org
> > > Signed-off-by: Charlie Jenkins <charlie at rivosinc.com>
> > >
> > > ---
> > > Charlie Jenkins (16):
> > > mm: Add MAP_BELOW_HINT
> > > riscv: mm: Do not restrict mmap address based on hint
> > > mm: Add flag and len param to arch_get_mmap_base()
> > > mm: Add generic MAP_BELOW_HINT
> > > riscv: mm: Support MAP_BELOW_HINT
> > > arm64: mm: Support MAP_BELOW_HINT
> > > powerpc: mm: Support MAP_BELOW_HINT
> > > x86: mm: Support MAP_BELOW_HINT
> > > loongarch: mm: Support MAP_BELOW_HINT
> > > arm: mm: Support MAP_BELOW_HINT
> > > mips: mm: Support MAP_BELOW_HINT
> > > parisc: mm: Support MAP_BELOW_HINT
> > > s390: mm: Support MAP_BELOW_HINT
> > > sh: mm: Support MAP_BELOW_HINT
> > > sparc: mm: Support MAP_BELOW_HINT
> > > selftests/mm: Create MAP_BELOW_HINT test
> > >
> > >  arch/arm/mm/mmap.c                           | 10 ++++++++
> > >  arch/arm64/include/asm/processor.h           | 34 ++++++++++++++++++++++----
> > >  arch/loongarch/mm/mmap.c                     | 11 +++++++++
> > >  arch/mips/mm/mmap.c                          |  9 +++++++
> > >  arch/parisc/include/uapi/asm/mman.h          |  1 +
> > >  arch/parisc/kernel/sys_parisc.c              |  9 +++++++
> > >  arch/powerpc/include/asm/task_size_64.h      | 36 +++++++++++++++++++++++-----
> > >  arch/riscv/include/asm/processor.h           | 32 -------------------------
> > >  arch/s390/mm/mmap.c                          | 10 ++++++++
> > >  arch/sh/mm/mmap.c                            | 10 ++++++++
> > >  arch/sparc/kernel/sys_sparc_64.c             |  8 +++++++
> > >  arch/x86/kernel/sys_x86_64.c                 | 25 ++++++++++++++++---
> > >  fs/hugetlbfs/inode.c                         |  2 +-
> > >  include/linux/sched/mm.h                     | 34 ++++++++++++++++++++++++--
> > >  include/uapi/asm-generic/mman-common.h       |  1 +
> > >  mm/mmap.c                                    |  2 +-
> > >  tools/arch/parisc/include/uapi/asm/mman.h    |  1 +
> > >  tools/include/uapi/asm-generic/mman-common.h |  1 +
> > >  tools/testing/selftests/mm/Makefile          |  1 +
> > >  tools/testing/selftests/mm/map_below_hint.c  | 29 ++++++++++++++++++++++
> > >  20 files changed, 216 insertions(+), 50 deletions(-)
> > > ---
> > > base-commit: 5be63fc19fcaa4c236b307420483578a56986a37
> > > change-id: 20240827-patches-below_hint_mmap-b13d79ae1c55
> > > --
> > > - Charlie
> > >