[PATCH] arm64: module: Widen module region to 2 GiB
Ard Biesheuvel
ardb at kernel.org
Tue Apr 4 09:07:04 PDT 2023
On Tue, 4 Apr 2023 at 17:49, Mark Rutland <mark.rutland at arm.com> wrote:
>
> On Tue, Apr 04, 2023 at 03:54:37PM +0200, Ard Biesheuvel wrote:
> > Shanker reports that, after loading a 110+ MiB kernel module, no other
> > modules can be loaded, in spite of the fact that module PLTs have been
> > enabled in the build.
> >
> > This is because, even with module PLTs enabled, the module region is
> > dimensioned as a fixed 128 MiB window; we simply never anticipated the
> > need to support modules that huge.
> >
> > So let's increase the size of the statically allocated module region to
> > 2 GiB, and update the module loading logic so that we prefer loading
> > modules in the vicinity of the kernel text, where relative branches can
> > be resolved without the need for PLTs. Only when we run out of space
> > there (or when CONFIG_RANDOMIZE_MODULE_REGION_FULL is enabled) do we
> > fall back to the larger window and allocate from there.
> >
> > While at it, let's try to simplify the logic a bit, to make it easier to
> > follow:
> > - remove the special cases for KASAN, which are no longer needed now
> > that KASAN_VMALLOC is always enabled when KASAN is configured;
> > - instead of defining a global module base address, define a global
> > module limit, and work our way down from it.
>
> I find this last change a bit confusing, and I think it'd be preferable to have
> explicit start and end variables; more on that below.
>
> >
> > Cc: Shanker Donthineni <sdonthineni at nvidia.com>
> > Signed-off-by: Ard Biesheuvel <ardb at kernel.org>
> > ---
> > Documentation/arm64/memory.rst | 8 ++---
> > arch/arm64/include/asm/memory.h | 2 +-
> > arch/arm64/include/asm/module.h | 10 ++++--
> > arch/arm64/kernel/kaslr.c | 38 +++++++++++------------
> > arch/arm64/kernel/module.c | 54 ++++++++++++++++-----------------
> > 5 files changed, 59 insertions(+), 53 deletions(-)
> >
> > diff --git a/Documentation/arm64/memory.rst b/Documentation/arm64/memory.rst
> > index 2a641ba7be3b717a..55a55f30eed8a6ce 100644
> > --- a/Documentation/arm64/memory.rst
> > +++ b/Documentation/arm64/memory.rst
> > @@ -33,8 +33,8 @@ AArch64 Linux memory layout with 4KB pages + 4 levels (48-bit)::
> > 0000000000000000 0000ffffffffffff 256TB user
> > ffff000000000000 ffff7fffffffffff 128TB kernel logical memory map
> > [ffff600000000000 ffff7fffffffffff] 32TB [kasan shadow region]
> > - ffff800000000000 ffff800007ffffff 128MB modules
> > - ffff800008000000 fffffbffefffffff 124TB vmalloc
> > + ffff800000000000 ffff80007fffffff 2GB modules
> > + ffff800080000000 fffffbffefffffff 124TB vmalloc
> > fffffbfff0000000 fffffbfffdffffff 224MB fixed mappings (top down)
> > fffffbfffe000000 fffffbfffe7fffff 8MB [guard region]
> > fffffbfffe800000 fffffbffff7fffff 16MB PCI I/O space
> > @@ -50,8 +50,8 @@ AArch64 Linux memory layout with 64KB pages + 3 levels (52-bit with HW support):
> > 0000000000000000 000fffffffffffff 4PB user
> > fff0000000000000 ffff7fffffffffff ~4PB kernel logical memory map
> > [fffd800000000000 ffff7fffffffffff] 512TB [kasan shadow region]
> > - ffff800000000000 ffff800007ffffff 128MB modules
> > - ffff800008000000 fffffbffefffffff 124TB vmalloc
> > + ffff800000000000 ffff80007fffffff 2GB modules
> > + ffff800080000000 fffffbffefffffff 124TB vmalloc
> > fffffbfff0000000 fffffbfffdffffff 224MB fixed mappings (top down)
> > fffffbfffe000000 fffffbfffe7fffff 8MB [guard region]
> > fffffbfffe800000 fffffbffff7fffff 16MB PCI I/O space
> > diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
> > index 78e5163836a0ab95..b58c3127323e16c8 100644
> > --- a/arch/arm64/include/asm/memory.h
> > +++ b/arch/arm64/include/asm/memory.h
> > @@ -46,7 +46,7 @@
> > #define KIMAGE_VADDR (MODULES_END)
> > #define MODULES_END (MODULES_VADDR + MODULES_VSIZE)
> > #define MODULES_VADDR (_PAGE_END(VA_BITS_MIN))
> > -#define MODULES_VSIZE (SZ_128M)
> > +#define MODULES_VSIZE (SZ_2G)
> > #define VMEMMAP_START (-(UL(1) << (VA_BITS - VMEMMAP_SHIFT)))
> > #define VMEMMAP_END (VMEMMAP_START + VMEMMAP_SIZE)
> > #define PCI_IO_END (VMEMMAP_START - SZ_8M)
> > diff --git a/arch/arm64/include/asm/module.h b/arch/arm64/include/asm/module.h
> > index 18734fed3bdd7609..98dae9f87b521f07 100644
> > --- a/arch/arm64/include/asm/module.h
> > +++ b/arch/arm64/include/asm/module.h
> > @@ -31,9 +31,15 @@ u64 module_emit_veneer_for_adrp(struct module *mod, Elf64_Shdr *sechdrs,
> > void *loc, u64 val);
> >
> > #ifdef CONFIG_RANDOMIZE_BASE
> > -extern u64 module_alloc_base;
> > +extern u64 module_alloc_limit;
> > #else
> > -#define module_alloc_base ((u64)_etext - MODULES_VSIZE)
> > +#define module_alloc_limit MODULE_REF_END
> > +#endif
> > +
> > +#ifdef CONFIG_ARM64_MODULE_PLTS
> > +#define MODULE_REF_END ((u64)_end)
> > +#else
> > +#define MODULE_REF_END ((u64)_etext)
> > #endif
>
> I was initially a bit confused by this. I think it's a bit misleading for the
> !CONFIG_ARM64_MODULE_PLTS case, since modules can still have data references
> between _etext and _end, it's just that we're (hopefully) unlikely to have the
> text be <128M with >2G of subsequent data.
>
So the reason for this is that otherwise, with PLTs disabled, we lose
(_end - _etext) worth of module space for no good reason, and that
could potentially be a substantial chunk of space. With PLTs enabled,
on the other hand, we cannot safely use _etext as the upper bound, as
_etext - SZ_2G may produce an address that is out of range for a PREL32
relocation. In that case, though, we can tolerate the waste, so we
can just use _end instead.
> I'd find this clearer if we could express the two constraints separately. e.g.
> have something like:
>
> #define MODULE_DATA_REF_END ((u64)_end)
> #define MODULE_TEXT_REF_END ((u64)_etext)
>
> That'd allow us to do something like the below, which I think would be clearer.
>
> 	u64 module_alloc_end;
> 	u64 module_alloc_base_noplt;
> 	u64 module_alloc_base_plt;
>
> 	/*
> 	 * Call this somewhere after choosing the KASLR limits
> 	 */
> 	void module_limits_init(void)
> 	{
> 		module_alloc_end = (u64)_stext;
>
> 		/*
> 		 * 32-bit relative data references must always fall within 2G of the
> 		 * end of the kernel image.
> 		 */
> 		module_alloc_base_plt = max(MODULES_VADDR, MODULE_DATA_REF_END - SZ_2G);
>
> 		/*
> 		 * Direct branches must be within 128M of the end of the kernel text.
> 		 */
> 		module_alloc_base_noplt = max(module_alloc_base_plt,
> 					      MODULE_TEXT_REF_END - SZ_128M);
> 	}
>
Currently, the randomization of the module region is independent of
the randomization of the kernel itself, and once we incorporate that
here, I don't think it will be any clearer tbh.
> 	void *module_alloc(unsigned long size)
> 	{
> 		gfp_t gfp_mask = GFP_KERNEL;
> 		void *p = NULL;
>
> 		if (IS_ENABLED(CONFIG_ARM64_MODULE_PLTS))
> 			gfp_mask |= __GFP_NOWARN;
>
> 		/*
> 		 * By default, prefer to place modules such that they don't
> 		 * require PLTs. With CONFIG_RANDOMIZE_MODULE_REGION_FULL=y
> 		 * we'll prefer to use the entire range possible with PLTs,
> 		 * to get as much entropy as possible.
> 		 */
> 		if (!IS_ENABLED(CONFIG_RANDOMIZE_MODULE_REGION_FULL)) {
> 			p = __vmalloc_node_range(size, MODULE_ALIGN,
> 						 module_alloc_base_noplt,
> 						 module_alloc_end,
> 						 gfp_mask, PAGE_KERNEL,
> 						 VM_DEFER_KMEMLEAK, NUMA_NO_NODE,
> 						 __builtin_return_address(0));
> 		}
>
> 		if (!p && IS_ENABLED(CONFIG_ARM64_MODULE_PLTS)) {
> 			p = __vmalloc_node_range(size, MODULE_ALIGN,
> 						 module_alloc_base_plt,
> 						 module_alloc_end,
> 						 gfp_mask, PAGE_KERNEL,
> 						 VM_DEFER_KMEMLEAK, NUMA_NO_NODE,
> 						 __builtin_return_address(0));
> 		}
>
> 		[... play with KASAN here ...]
> 	}
>
> Is that viable, or am I missing something?
>
It looks correct to me, but I'll defer to others to decide which is preferable.
More information about the linux-arm-kernel mailing list