[PATCH 6/6] arm64: module: rework module VA range selection
Mark Rutland
mark.rutland at arm.com
Tue May 9 07:00:19 PDT 2023
On Tue, May 09, 2023 at 01:40:12PM +0200, Ard Biesheuvel wrote:
> Hi Mark,
>
> Thanks for cleaning this up.
>
> On Tue, 9 May 2023 at 13:15, Mark Rutland <mark.rutland at arm.com> wrote:
> >
> > Currently, the modules region is 128M in size, which is a problem for
> > some large modules. Shanker reports [1] that the NVIDIA GPU driver alone
> > can consume 110M of module space in some configurations. We'd like to
> > make the modules region a full 2G such that we can always make use of a
> > 2G range.
> >
> > It's possible to build kernel images which are larger than 128M in some
> > configurations, such as when many debug options are selected and many
> > drivers are built in. In these configurations, we can't legitimately
> > select a base for a 128M module region, though we currently select a
> > value for which allocation will fail. It would be nicer to have a
> > diagnostic message in this case.
> >
> > Similarly, in theory it's possible to build a kernel image which is
> > larger than 2G and which cannot support modules. While this isn't likely
> > to be the case for any realistic kernel deployed in the field, it would
> > be nice if we could print a diagnostic in this case.
> >
> > This patch reworks the module VA range selection to use a 2G range, and
> > improves handling of cases where we cannot select legitimate module
> > regions. We now attempt to select a 128M region and a 2G region:
> >
> > * The 128M region is selected such that modules can use direct branches
> > (with JUMP26/CALL26 relocations) to branch to kernel code and other
> > modules, and so that modules can use direct data references (with
> > PREL32 relocations) to access data in the kernel image and other
> > modules.
> >
> > This region covers the entire kernel image (rather than just the text)
> > to ensure that all PREL32 relocations are in range even if the kernel
> > data section is absurdly large. Where we cannot allocate from this
> > region, we'll fall back to the full 2G region.
> >
> > * The 2G region is selected such that modules can use direct branches
> > with PLTs to branch to kernel code and other modules, and so that
> > modules can use direct data references (with PREL32 relocations) to
> > access data in the kernel image and other modules.
> >
> > This region covers the entire kernel image, and the 128M region (if
> > one is selected).
> >
> > The two module regions are randomized independently while ensuring the
> > constraints described above.
> >
> > [1] https://lore.kernel.org/linux-arm-kernel/159ceeab-09af-3174-5058-445bc8dcf85b@nvidia.com/
[...]
> > +/*
> > + * Modules may directly reference data anywhere within the kernel image and
> > + * other modules. These data references will use PREL32 relocations with a
> > + * +/-2G range, and so we need to ensure that the entire kernel image and all
> > + * modules fall within a 2G window such that these are always within range.
> > + *
>
> 'Data references' is slightly inaccurate here - data references from
> code use ADRP/LDR with a -/+ 4G range, whereas the PREL32 references
> in question are references *from* data to both data and code symbols.
>
> The conclusion is the same of course, PREL32 having the smaller range
> and needing to cover the entire kernel image, including code symbols
> living in .text
Indeed; I'll replace the above with:
/*
* Modules may directly reference data and text anywhere within the kernel image
* and other modules. References using PREL32 relocations have a +/-2G range,
* and so we need to ensure that the entire kernel image and all modules fall
* within a 2G window such that these are always within range.
*/
*/
... and I'll update the commit message similarly where it refers to PREL32
relocations.
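To illustrate the range constraints being discussed, here is a minimal
user-space sketch (not kernel code, and not part of the patch). It assumes a
PREL32 offset is checked as a signed 32-bit value (+/-2G) and that
JUMP26/CALL26 branches reach +/-128M, i.e. a signed 28-bit byte offset. The
helper names offset_fits(), prel32_in_range() and call26_in_range() are
hypothetical and exist only for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: does the offset from 'place' to 'sym' fit in a
 * signed field of 'bits' bits, i.e. -2^(bits-1) <= offset < 2^(bits-1)?
 */
static bool offset_fits(uint64_t place, uint64_t sym, unsigned int bits)
{
	int64_t off = (int64_t)(sym - place);
	int64_t lim = INT64_C(1) << (bits - 1);

	return off >= -lim && off < lim;
}

/* PREL32: signed 32-bit offset, so +/-2G (assumed signed check). */
static bool prel32_in_range(uint64_t place, uint64_t sym)
{
	return offset_fits(place, sym, 32);
}

/* JUMP26/CALL26: 26-bit immediate scaled by 4, so a 28-bit byte offset. */
static bool call26_in_range(uint64_t place, uint64_t sym)
{
	return offset_fits(place, sym, 28);
}
```

If the kernel image and every module sit inside a single 2G window, any two
addresses in that window are less than 2G apart, so prel32_in_range() holds
for every pair. The same argument applies to call26_in_range() and a 128M
window.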
[...]
> > + if (kernel_size >= SZ_2G) {
> > + pr_warn("Kernel is too large to support modules (%llu bytes)\n",
> > + kernel_size);
> > + return 0;
> > + }
> >
> > if (IS_ENABLED(CONFIG_RANDOMIZE_MODULE_REGION_FULL)) {
> > - /*
> > - * Randomize the module region over a 2 GB window covering the
> > - * kernel. This reduces the risk of modules leaking information
> > - * about the address of the kernel itself, but results in
> > - * branches between modules and the core kernel that are
> > - * resolved via PLTs. (Branches between modules will be
> > - * resolved normally.)
> > - */
> > - module_range = SZ_2G - (u64)(_end - _stext);
> > - module_alloc_base = max((u64)_end - SZ_2G, (u64)MODULES_VADDR);
> > + pr_info("2G module region forced by RANDOMIZE_MODULE_REGION_FULL\n");
> > + } else if (kernel_size >= SZ_128M) {
>
> I suppose this bound is somewhat arbitrary? I mean, if kernel_size
> were SZ_128M-SZ_4K, we'd have the exact same problem, and end up using
> the 2G region all the same, just with a different diagnostic message?
That's a fair point, and that's also true for the 2G boundary.
Since the useful bound is arbitrary, it's probably better to log how many pages
we could potentially use.
I'll have a go at doing that instead.
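As a rough idea of what "how many pages we could potentially use" might mean,
here is a user-space sketch (untested against the kernel, and not necessarily
what the next version of the patch will do). It assumes 4K pages and that the
window must cover the whole kernel image, so whatever remains is available to
modules. The SKETCH_* constants and pages_in_range() are hypothetical names
for this example only.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE (UINT64_C(1) << 12)	/* assumes 4K pages */
#define SKETCH_SZ_128M   (UINT64_C(1) << 27)
#define SKETCH_SZ_2G     (UINT64_C(1) << 31)

/*
 * Hypothetical helper: number of whole pages left for modules in a window
 * of 'window' bytes that must also contain a kernel image of 'kernel_size'
 * bytes. Returns 0 when the kernel alone fills or exceeds the window.
 */
static uint64_t pages_in_range(uint64_t window, uint64_t kernel_size)
{
	if (kernel_size >= window)
		return 0;

	return (window - kernel_size) / SKETCH_PAGE_SIZE;
}
```

With this, a kernel of SZ_128M - SZ_4K would report a single page in the 128M
window rather than silently passing a size check, which makes the near-limit
case Ard describes visible in the log.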
> > + pr_info("2G module region forced by kernel size (%llu bytes)\n",
> > + kernel_size);
> > + } else if (IS_ENABLED(CONFIG_RANOMIZE_BASE)) {
>
> Typo here ^^^
Thanks; I've fixed that now and I'll go re-test...
Mark.