[PATCH] arm64: module: Widen module region to 2 GiB

Ard Biesheuvel ardb at kernel.org
Tue Apr 4 09:07:04 PDT 2023


On Tue, 4 Apr 2023 at 17:49, Mark Rutland <mark.rutland at arm.com> wrote:
>
> On Tue, Apr 04, 2023 at 03:54:37PM +0200, Ard Biesheuvel wrote:
> > Shanker reports that, after loading a 110+ MiB kernel module, no other
> > modules can be loaded, in spite of the fact that module PLTs have been
> > enabled in the build.
> >
> > This is due to the fact that, even with module PLTs enabled, the
> > module region is dimensioned to be a fixed 128 MiB region, as we
> > simply never anticipated the need to support modules that huge.
> >
> > So let's increase the size of the statically allocated module region to
> > 2 GiB, and update the module loading logic so that we prefer loading
> > modules in the vicinity of the kernel text, where relative branches can
> > be resolved without the need for PLTs. Only when we run out of space
> > here (or when CONFIG_RANDOMIZE_MODULE_REGION_FULL is enabled) do we
> > fall back to the larger window and allocate from there.
> >
> > While at it, let's try to simplify the logic a bit, to make it easier to
> > follow:
> > - remove the special cases for KASAN, which are no longer needed now
> >   that KASAN_VMALLOC is always enabled when KASAN is configured;
> > - instead of defining a global module base address, define a global
> >   module limit, and work our way down from it.
>
> I find this last change a bit confusing, and I think it'd be preferable to have
> explicit start and end variables; more on that below.
>
> >
> > Cc: Shanker Donthineni <sdonthineni at nvidia.com>
> > Signed-off-by: Ard Biesheuvel <ardb at kernel.org>
> > ---
> >  Documentation/arm64/memory.rst  |  8 ++---
> >  arch/arm64/include/asm/memory.h |  2 +-
> >  arch/arm64/include/asm/module.h | 10 ++++--
> >  arch/arm64/kernel/kaslr.c       | 38 +++++++++++------------
> >  arch/arm64/kernel/module.c      | 54 ++++++++++++++++-----------------
> >  5 files changed, 59 insertions(+), 53 deletions(-)
> >
> > diff --git a/Documentation/arm64/memory.rst b/Documentation/arm64/memory.rst
> > index 2a641ba7be3b717a..55a55f30eed8a6ce 100644
> > --- a/Documentation/arm64/memory.rst
> > +++ b/Documentation/arm64/memory.rst
> > @@ -33,8 +33,8 @@ AArch64 Linux memory layout with 4KB pages + 4 levels (48-bit)::
> >    0000000000000000   0000ffffffffffff         256TB          user
> >    ffff000000000000   ffff7fffffffffff         128TB          kernel logical memory map
> >   [ffff600000000000   ffff7fffffffffff]         32TB          [kasan shadow region]
> > -  ffff800000000000   ffff800007ffffff         128MB          modules
> > -  ffff800008000000   fffffbffefffffff         124TB          vmalloc
> > +  ffff800000000000   ffff80007fffffff           2GB          modules
> > +  ffff800080000000   fffffbffefffffff         124TB          vmalloc
> >    fffffbfff0000000   fffffbfffdffffff         224MB          fixed mappings (top down)
> >    fffffbfffe000000   fffffbfffe7fffff           8MB          [guard region]
> >    fffffbfffe800000   fffffbffff7fffff          16MB          PCI I/O space
> > @@ -50,8 +50,8 @@ AArch64 Linux memory layout with 64KB pages + 3 levels (52-bit with HW support):
> >    0000000000000000   000fffffffffffff           4PB          user
> >    fff0000000000000   ffff7fffffffffff          ~4PB          kernel logical memory map
> >   [fffd800000000000   ffff7fffffffffff]        512TB          [kasan shadow region]
> > -  ffff800000000000   ffff800007ffffff         128MB          modules
> > -  ffff800008000000   fffffbffefffffff         124TB          vmalloc
> > +  ffff800000000000   ffff80007fffffff           2GB          modules
> > +  ffff800080000000   fffffbffefffffff         124TB          vmalloc
> >    fffffbfff0000000   fffffbfffdffffff         224MB          fixed mappings (top down)
> >    fffffbfffe000000   fffffbfffe7fffff           8MB          [guard region]
> >    fffffbfffe800000   fffffbffff7fffff          16MB          PCI I/O space
> > diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
> > index 78e5163836a0ab95..b58c3127323e16c8 100644
> > --- a/arch/arm64/include/asm/memory.h
> > +++ b/arch/arm64/include/asm/memory.h
> > @@ -46,7 +46,7 @@
> >  #define KIMAGE_VADDR         (MODULES_END)
> >  #define MODULES_END          (MODULES_VADDR + MODULES_VSIZE)
> >  #define MODULES_VADDR                (_PAGE_END(VA_BITS_MIN))
> > -#define MODULES_VSIZE                (SZ_128M)
> > +#define MODULES_VSIZE                (SZ_2G)
> >  #define VMEMMAP_START                (-(UL(1) << (VA_BITS - VMEMMAP_SHIFT)))
> >  #define VMEMMAP_END          (VMEMMAP_START + VMEMMAP_SIZE)
> >  #define PCI_IO_END           (VMEMMAP_START - SZ_8M)
> > diff --git a/arch/arm64/include/asm/module.h b/arch/arm64/include/asm/module.h
> > index 18734fed3bdd7609..98dae9f87b521f07 100644
> > --- a/arch/arm64/include/asm/module.h
> > +++ b/arch/arm64/include/asm/module.h
> > @@ -31,9 +31,15 @@ u64 module_emit_veneer_for_adrp(struct module *mod, Elf64_Shdr *sechdrs,
> >                               void *loc, u64 val);
> >
> >  #ifdef CONFIG_RANDOMIZE_BASE
> > -extern u64 module_alloc_base;
> > +extern u64 module_alloc_limit;
> >  #else
> > -#define module_alloc_base    ((u64)_etext - MODULES_VSIZE)
> > +#define module_alloc_limit   MODULE_REF_END
> > +#endif
> > +
> > +#ifdef CONFIG_ARM64_MODULE_PLTS
> > +#define MODULE_REF_END               ((u64)_end)
> > +#else
> > +#define MODULE_REF_END               ((u64)_etext)
> >  #endif
>
> I was initially a bit confused by this. I think it's a bit misleading for the
> !CONFIG_ARM64_MODULE_PLTS case, since modules can still have data references
> between _etext and _end, it's just that we're (hopefully) unlikely to have the
> text be <128M with >2G of subsequent data.
>

So the reason for this is that otherwise, with PLTs disabled, we lose
(_end - _etext) worth of module space for no good reason, and this
could potentially be a substantial chunk of space. When PLTs are
enabled, on the other hand, we cannot safely use _etext as the upper
bound: _etext - SZ_2G may produce a module address from which a PREL32
relocation against data near _end would be out of range. In that case,
though, we can tolerate the waste, so we can just use _end instead.
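
To put some code behind that (purely illustrative, not taken from the
patch, and the helper name is made up), this is roughly how the two
reference points translate into the lowest module address that the
constraints above allow:

#include <linux/kconfig.h>
#include <linux/minmax.h>
#include <linux/sizes.h>
#include <linux/types.h>
#include <asm/memory.h>
#include <asm/sections.h>

static u64 module_region_floor_sketch(void)
{
	if (IS_ENABLED(CONFIG_ARM64_MODULE_PLTS)) {
		/*
		 * PREL32 data references from a module may target
		 * anything up to _end, so the 2 GiB window has to be
		 * measured from _end; the (_end - _etext) bytes we give
		 * up are noise compared to the 2 GiB window size.
		 */
		return max((u64)MODULES_VADDR, (u64)_end - SZ_2G);
	}

	/*
	 * Without PLTs, direct branches cap us at 128 MiB below the end
	 * of the kernel text anyway, so measuring from _etext instead of
	 * _end avoids giving up (_end - _etext) bytes of an already
	 * small window.
	 */
	return max((u64)MODULES_VADDR, (u64)_etext - SZ_128M);
}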

> I'd find this clearer if we could express the two constraints separately. e.g.
> have something like:
>
> #define MODULE_DATA_REF_END             ((u64)_end)
> #define MODULE_TEXT_REF_END             ((u64)_etext)
>
> That'd allow us to do something like the below, which I think would be clearer.
>
> u64 module_alloc_end;
> u64 module_alloc_base_noplt;
> u64 module_alloc_base_plt;
>
> /*
>  * Call this somewhere after choosing the KASLR limits
>  */
> void module_limits_init(void)
> {
>         module_alloc_end = (u64)_stext;
>
>         /*
>          * 32-bit relative data references must always fall within 2G of the
>          * end of the kernel image.
>          */
>         module_alloc_base_plt = max(MODULES_VADDR, MODULE_DATA_REF_END - SZ_2G);
>
>         /*
>          * Direct branches must be within 128M of the end of the kernel text.
>          */
>         module_alloc_base_noplt = max(module_alloc_base_plt,
>                                       MODULE_TEXT_REF_END - SZ_128M);
> }
>

Currently, the randomization of the module region is independent of
the randomization of the kernel itself, and once we incorporate that
here, I don't think it will be any clearer tbh.
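
(For reference, the kind of thing I mean -- a rough sketch, not the
actual kaslr.c logic: the module region gets its own random placement
inside whatever window was chosen, with a seed that has nothing to do
with the offset applied to the kernel image itself.)

#include <linux/types.h>

/*
 * Illustrative only: scale an independent per-boot seed into the slack
 * between the start of the window and the highest base that still
 * leaves room for the module region.
 */
static u64 pick_module_base_sketch(u64 window_start, u64 window_end,
				   u64 region_size, u64 seed)
{
	u64 slack = window_end - window_start - region_size;

	/* use 21 bits of the seed to choose an offset in [0, slack) */
	return window_start + ((slack * (seed & ((1 << 21) - 1))) >> 21);
}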


> void *module_alloc(unsigned long size)
> {
>         gfp_t gfp_mask = GFP_KERNEL;
>         void *p = NULL;
>
>         if (IS_ENABLED(CONFIG_ARM64_MODULE_PLTS))
>                 gfp_mask |= __GFP_NOWARN;
>
>         /*
>          * By default, prefer to place modules such that they don't require
>          * PLTs. With CONFIG_RANDOMIZE_MODULE_REGION_FULL=y we'll prefer to use
>          * the entire range possible with PLTs, to get as much entropy as possible.
>          */
>         if (!IS_ENABLED(CONFIG_RANDOMIZE_MODULE_REGION_FULL)) {
>                 p = __vmalloc_node_range(size, MODULE_ALIGN,
>                                          module_alloc_base_noplt, module_alloc_end,
>                                          gfp_mask, PAGE_KERNEL,
>                                          VM_DEFER_KMEMLEAK, NUMA_NO_NODE,
>                                          __builtin_return_address(0));
>         }
>
>         if (!p && IS_ENABLED(CONFIG_ARM64_MODULE_PLTS)) {
>                 p = __vmalloc_node_range(size, MODULE_ALIGN,
>                                          module_alloc_base_plt, module_alloc_end,
>                                          gfp_mask, PAGE_KERNEL,
>                                          VM_DEFER_KMEMLEAK, NUMA_NO_NODE,
>                                          __builtin_return_address(0));
>         }
>
>         [... play with KASAN here ...]
> }
>
> Is that viable, or am I missing something?
>

It looks correct to me, but I'll defer to others to decide which is preferable.


