[PATCH v2 10/10] ARM: p2v: reduce p2v alignment requirement to 2 MiB

Tue Sep 22 06:23:23 EDT 2020

On Tue, 22 Sep 2020 at 11:11, Linus Walleij <linus.walleij at linaro.org> wrote:
>
> On Mon, Sep 21, 2020 at 5:41 PM Ard Biesheuvel <ardb at kernel.org> wrote:
>
> > Update the p2v patching code so we can deal with displacements that are
> > not a multiple of 16 MiB but of 2 MiB, to prevent wasting of up to 14 MiB
> > of physical RAM when running on a platform where the start of memory is
> > not correctly aligned.
> >
> > For the ARM code path, this simply comes down to using two add/sub
> > instructions instead of one for the carryless version, and patching
> > each of them with the correct immediate depending on the rotation
> > field. For the LPAE calculation, it patches the MOVW instruction with
> > up to 12 bits of offset.
> >
> > For the Thumb2 code path, patching more than 11 bits of displacement
> > is somewhat cumbersome, and given that 11 bits produce a minimum
> > alignment of 2 MiB, which is also the granularity for LPAE block
> > mappings, it makes sense to stick to 2 MiB for the new p2v requirement.
> >
> > Suggested-by: Zhen Lei <thunder.leizhen at huawei.com>
> > Signed-off-by: Ard Biesheuvel <ardb at kernel.org>
>
> My understanding of what is going on is limited to the high
> level of things, and being able to do this is just a great thing
> so FWIW:
> Acked-by: Linus Walleij <linus.walleij at linaro.org>
>
> If you or Russell need more thorough review I can sit down
> and try to understand at the bit granularity what is going on
> but it requires a bunch of time. Just tell me if you need this.
>

Just to summarize the intent of this code: the ARM kernel's linear map
starts at PAGE_OFFSET, which maps to a physical address (PHYS_OFFSET)
that is platform-specific and is discovered at boot. Since we don't
want to slow down translations between physical and virtual addresses
by keeping the offset in a variable in memory, we instead patch the
code performing the translation, putting the offset between
PAGE_OFFSET and the start of physical RAM directly into the
instruction opcodes.
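
As a rough user-space model of the difference (not the kernel's actual
code; the ex_ names and the 0xC0000000 PAGE_OFFSET value are assumptions
made up for illustration), the variable-based translation needs a load
on every call, whereas the patched form reduces to an add with a
constant that is fixed once at boot:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical value for illustration; a common 3G/1G split. */
#define EX_PAGE_OFFSET 0xC0000000u

/* Variable-based approach: the offset lives in memory and must be
 * loaded on every translation. */
static uint32_t ex_phys_offset; /* would be discovered at boot */

static inline uint32_t ex_virt_to_phys_slow(uint32_t va)
{
	return va - EX_PAGE_OFFSET + ex_phys_offset;
}

/* Delta that p2v patching would fold into the instruction immediates. */
static inline uint32_t ex_p2v_delta(uint32_t phys_offset)
{
	return phys_offset - EX_PAGE_OFFSET; /* wraps modulo 2^32 */
}

/* Patched approach, modeled with 'delta' as a parameter: in the real
 * kernel it would be an immediate in the add/sub opcode, not an
 * argument. */
static inline uint32_t ex_virt_to_phys_patched(uint32_t va, uint32_t delta)
{
	return va + delta;
}
```

Both helpers return the same physical address for a given PHYS_OFFSET;
the point is only that the second one has no memory operand.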

Currently, we only patch up to 8 bits of offset, which gives us a
granularity of 4 GiB >> 8 == 16 MiB, and so if the start of physical
RAM is not a multiple of 16 MiB, we have to round it up to the next
multiple. This wastes some physical RAM, since the skipped memory will
now live below PAGE_OFFSET, making it inaccessible to the kernel.

By changing the patchable sequences and the patching logic to carry
more bits of offset, we can improve this: 11 bits give us 4 GiB >> 11
== 2 MiB granularity, and so you never waste more than that amount by
rounding up the physical start of DRAM to the next multiple of 2 MiB.
(Note that 2 MiB granularity guarantees that the linear mapping can be
created efficiently, whereas a granularity finer than 2 MiB may result
in the linear mapping needing another level of page tables.)
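
To make the arithmetic concrete, here is a small sketch (the ex_ helper
names and the example DRAM start address are hypothetical, not taken
from the patch) that computes the granularity for a given number of
patchable bits and the RAM lost to rounding up:

```c
#include <assert.h>
#include <stdint.h>

/* Granularity when the top 'bits' bits of a 32-bit offset can be
 * patched: 4 GiB >> bits. */
static uint32_t ex_granularity(unsigned int bits)
{
	assert(bits > 0 && bits < 32);
	return (uint32_t)((UINT64_C(1) << 32) >> bits);
}

/* Round 'addr' up to the next multiple of 'gran' (a power of two). */
static uint32_t ex_round_up(uint32_t addr, uint32_t gran)
{
	return (addr + gran - 1) & ~(gran - 1);
}

/* Bytes of RAM skipped when the start of DRAM is rounded up. */
static uint32_t ex_wasted(uint32_t dram_start, unsigned int bits)
{
	return ex_round_up(dram_start, ex_granularity(bits)) - dram_start;
}
```

For a made-up DRAM start of 0x80200000, ex_wasted() with 8 bits yields
0x00e00000 (14 MiB), while with 11 bits it yields 0.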

This helps Zhen Lei's scenario, where the start of DRAM is known to be
occupied. It also helps EFI boot, which relies on the firmware's page
allocator to allocate space for the decompressed kernel as low as
possible. And if the KASLR patches ever land for 32-bit, it will give
us 3 more bits of randomization of the placement of the kernel inside
the linear region.