[PATCH] arm64: module: Widen module region to 2 GiB

Wed Apr 5 06:16:25 PDT 2023

On Wed, 5 Apr 2023 at 13:38, Mark Rutland <mark.rutland at arm.com> wrote:
>
> On Tue, Apr 04, 2023 at 06:07:04PM +0200, Ard Biesheuvel wrote:
> > On Tue, 4 Apr 2023 at 17:49, Mark Rutland <mark.rutland at arm.com> wrote:
> > >
> > > On Tue, Apr 04, 2023 at 03:54:37PM +0200, Ard Biesheuvel wrote:
...
> > > >  #ifdef CONFIG_RANDOMIZE_BASE
> > > > -extern u64 module_alloc_base;
> > > > +extern u64 module_alloc_limit;
> > > >  #else
> > > > -#define module_alloc_base    ((u64)_etext - MODULES_VSIZE)
> > > > +#define module_alloc_limit   MODULE_REF_END
> > > > +#endif
> > > > +
> > > > +#ifdef CONFIG_ARM64_MODULE_PLTS
> > > > +#define MODULE_REF_END               ((u64)_end)
> > > > +#else
> > > > +#define MODULE_REF_END               ((u64)_etext)
> > > >  #endif
> > >
> > > I was initially a bit confused by this. I think it's a bit misleading for the
> > > !CONFIG_ARM64_MODULE_PLTS case, since modules can still have data references
> > > between _etext and _end, it's just that we're (hopefully) unlikely to have the
> > > text be <128M with >2G of subsequent data.
> > >
> >
> > So the reason for this is that otherwise, with PLTs disabled, we lose
> > (_end - _etext) worth of module space for no good reason, and this
> > could potentially be a substantial chunk of space. However, when PLTs
> > are enabled, we cannot safely use _etext as the upper bound, as _etext
> > - SZ_2G may produce an address that is out of range for a PREL32
> > relocation.
>
> I understood those two constraints, which are:
>
> (1) For PREL32 references, the module must be within 2GiB of _end regardless of
>     whether we use PLTs.
>
> (2) For direct branches without PLTs, the module must be within 128MiB of
>     _etext.
>

To be pedantic, let's define it as

(1) for PREL32 references, the module region must be at most 2 GiB in
size and include the kernel range [_stext, _end), so that PREL32
references from any module to any other module or the kernel are
always within -/+ 2 GiB

(2) for CALL26 references, the module region must be at most 128 MiB
in size and include the kernel range [_stext, _etext) so that CALL26
references from any module to any other module or the kernel are
always within -/+ 128 MiB
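
As a rough illustration of the two definitions above, here is a userspace
sketch (not kernel code). The struct range type and the helper names
region_covers(), prel32_ok() and call26_ok() are hypothetical and exist only
for this example; the symbol names in the comments refer to the linker
symbols discussed in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SZ_128M (UINT64_C(128) << 20)
#define SZ_2G   (UINT64_C(2) << 30)

/* Hypothetical helper type: a half-open address range [start, end). */
struct range {
	uint64_t start;
	uint64_t end;
};

/*
 * True if 'region' is at most 'max_size' bytes and fully contains 'target'.
 * Any two addresses inside such a region are then less than 'max_size'
 * apart, which is the property both constraints rely on.
 */
static bool region_covers(struct range region, struct range target,
			  uint64_t max_size)
{
	assert(region.start <= region.end && target.start <= target.end);

	return region.end - region.start <= max_size &&
	       region.start <= target.start &&
	       target.end <= region.end;
}

/* (1): PREL32 needs a region of at most 2 GiB containing [_stext, _end). */
static bool prel32_ok(struct range region, uint64_t stext, uint64_t end)
{
	return region_covers(region, (struct range){ stext, end }, SZ_2G);
}

/* (2): CALL26 needs a region of at most 128 MiB containing [_stext, _etext). */
static bool call26_ok(struct range region, uint64_t stext, uint64_t etext)
{
	return region_covers(region, (struct range){ stext, etext }, SZ_128M);
}
```

Phrasing both constraints as "bounded size plus containment" means neither
check needs to know where inside the region a given module ends up.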

> What I find confusing is having a single conditional MODULE_REF_END definition,
> as it implicitly relies on other properties being maintained elsewhere, when we
> try to allocate modules relative to this, and I don't think those always align
> as we'd prefer.
>
> Consider a config:
>
>         CONFIG_ARM64_MODULE_PLTS=y
>         CONFIG_RANDOMIZE_BASE=n
>         CONFIG_RANDOMIZE_MODULE_REGION_FULL=n
>
> In that config, with your patch we'd have:
>
>         #define module_alloc_limit      MODULE_REF_END
>         #define MODULE_REF_END          ((u64)_end)
>
> In module_alloc(), our first attempt at allocating the module would be:
>
>         p = __vmalloc_node_range(size, MODULE_ALIGN,
>                                  module_alloc_limit - SZ_128M,
>                                  module_alloc_limit, gfp_mask, PAGE_KERNEL,
>                                  VM_DEFER_KMEMLEAK, NUMA_NO_NODE,
>                                  __builtin_return_address(0));
>
> In this case, module_alloc_limit is '(u64)_end', so if there's a significant
> quantity of data between _etext and _end we will fail to allocate in the
> preferred 128M region that avoids PLTs more often than necessary, before
> falling back to the 2G region that may require PLTs.
>
> That's not a functional problem since we'll fall back to using PLTs, but I
> don't think that's as intended.
>

The allocations occur bottom up, so we will fall back earlier than
strictly necessary, but only after exhausting a significant chunk of
the module region. I don't see that as a problem.
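
To put a number on "earlier than strictly necessary", here is a
back-of-the-envelope model in userspace C. It assumes that the kernel
image's own mapping [_stext, _end) is the only thing occupying the 128 MiB
window before the first module is loaded; the helper name
preferred_window_free() is made up for this example.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_128M (UINT64_C(128) << 20)

/*
 * Bytes of the window [limit - SZ_128M, limit) that are not occupied by the
 * kernel image [stext, end). Assumes limit >= SZ_128M and stext <= end.
 */
static uint64_t preferred_window_free(uint64_t limit, uint64_t stext,
				      uint64_t end)
{
	assert(limit >= SZ_128M && stext <= end);

	uint64_t win_start = limit - SZ_128M;
	/* Intersect the window with the kernel image. */
	uint64_t lo = stext > win_start ? stext : win_start;
	uint64_t hi = end < limit ? end : limit;
	uint64_t overlap = hi > lo ? hi - lo : 0;

	return SZ_128M - overlap;
}
```

Under that assumption, anchoring the window at _end instead of _etext
shrinks the space available before the fallback by (_end - _etext), which
is the amount in question here.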

> Consider another config with:
>
>         CONFIG_ARM64_MODULE_PLTS=n
>         CONFIG_RANDOMIZE_BASE=n
>         CONFIG_RANDOMIZE_MODULE_REGION_FULL=n
>
> In that config we'd have:
>
>         #define module_alloc_limit      MODULE_REF_END
>         #define MODULE_REF_END          ((u64)_etext)
>
> In module_alloc(), our only attempt at allocating the module would be:
>
>         p = __vmalloc_node_range(size, MODULE_ALIGN,
>                                  module_alloc_limit - SZ_128M,
>                                  module_alloc_limit, gfp_mask, PAGE_KERNEL,
>                                  VM_DEFER_KMEMLEAK, NUMA_NO_NODE,
>                                  __builtin_return_address(0));
>
> Say I've built an incredibly unlikely kernel with 64M of text and 2G-96M of
> data between _etext and _end. In this case, 'module_alloc_limit - SZ_128M'
> would be 32M below '_end - SZ_2G', so PREL32 relocations could be out-of-range.
>

I think that ~2 GiB kernel images have their own special set of
challenges, and this is probably not the nastiest one.
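
For reference, the arithmetic in the quoted example can be checked with a
small userspace sketch. The helper name prel32_shortfall() is hypothetical,
and the inputs stand in for the _etext and _end linker symbols.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_128M (UINT64_C(128) << 20)
#define SZ_2G   (UINT64_C(2) << 30)

/*
 * How far the lowest address tried by the allocator (etext - SZ_128M) lies
 * below the lowest address from which a PREL32 reference can still reach
 * the end of the image (end - SZ_2G). Returns 0 if it is not below at all.
 */
static uint64_t prel32_shortfall(uint64_t etext, uint64_t end)
{
	assert(etext >= SZ_128M && end >= SZ_2G && etext <= end);

	uint64_t lowest_alloc = etext - SZ_128M;
	uint64_t lowest_reachable = end - SZ_2G;

	return lowest_reachable > lowest_alloc ?
	       lowest_reachable - lowest_alloc : 0;
}
```

With 64 MiB of text followed by 2 GiB - 96 MiB of data, this yields 32 MiB,
matching the figure above; with a more ordinary image size the result is 0.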

> > However, in that case, we can tolerate the waste, so we can just use _end
> > instead.
>
> I get the idea, but as above, I don't think that's always correct.
>
> > > I'd find this clearer if we could express the two constraints separately. e.g.
> > > have something like:
> > >
> > > #define MODULE_DATA_REF_END             ((u64)_end)
> > > #define MODULE_TEXT_REF_END             ((u64)_etext)
> > >
> > > That'd allow us to do something like the below, which I think would be clearer.
> > >
> > > u64 module_alloc_end;
> > > u64 module_alloc_base_noplt;
> > > u64 module_alloc_base_plt;
> > >
> > > /*
> > >  * Call this somewhere after choosing the KASLR limits
> > >  */
> > > void module_limits_init(void)
> > > {
> > >         module_alloc_end = (u64)_stext;
> > >
> > >         /*
> > >          * 32-bit relative data references must always fall within 2G of the
> > >          * end of the kernel image.
> > >          */
> > >         module_alloc_base_plt = max(MODULE_VADDR, MODULE_DATA_REF_END - SZ_2G);
> > >
> > >         /*
> > >          * Direct branches must be within 128M of the end of the kernel text.
> > >          */
> > >         module_alloc_base_noplt = max(module_alloc_base_plt,
> > >                                       MODULE_TEXT_REF_END - SZ_128M);
> > > }
> > >
> >
> > Currently, the randomization of the module region is independent from
> > the randomization of the kernel itself, and once we incorporate that
> > here, I don't think it will be any clearer tbh.
>
> Ok; I've clearly missed that aspect, and I'll have to go page that in.
>

ok