[PATCH 2/2] arm64: Extend early page table code to allow for larger kernels

Steve Capper steve.capper at arm.com
Tue Nov 21 08:14:17 PST 2017


On Tue, Nov 21, 2017 at 01:24:28PM +0000, Ard Biesheuvel wrote:
> On 21 November 2017 at 13:14, Steve Capper <steve.capper at arm.com> wrote:
> > On Tue, Nov 21, 2017 at 11:13:06AM +0000, Steve Capper wrote:
> >> On Mon, Nov 20, 2017 at 05:00:10PM +0000, Mark Rutland wrote:
> >> > Hi,
> >>
> >> Hi Mark,
> >>
> >> >
> >> > On Fri, Nov 17, 2017 at 11:41:43AM +0000, Steve Capper wrote:
> >> > > -#define SWAPPER_DIR_SIZE (SWAPPER_PGTABLE_LEVELS * PAGE_SIZE)
> >> > > +#define EARLY_PGDS(vstart, vend) ((vend >> PGDIR_SHIFT) - (vstart >> PGDIR_SHIFT) + 1)
> >> > > +
> >> > > +#if SWAPPER_PGTABLE_LEVELS > 3
> >> > > +#define EARLY_PUDS(vstart, vend) ((vend >> PUD_SHIFT) - (vstart >> PUD_SHIFT) + 1)
> >> > > +#else
> >> > > +#define EARLY_PUDS(vstart, vend) (0)
> >> > > +#endif
> >> > > +
> >> > > +#if SWAPPER_PGTABLE_LEVELS > 2
> >> > > +#define EARLY_PMDS(vstart, vend) ((vend >> PMD_SHIFT) - (vstart >> PMD_SHIFT) + 1)
> >> > > +#else
> >> > > +#define EARLY_PMDS(vstart, vend) (0)
> >> > > +#endif
> >> > > +
> >> > > +#define EARLY_PAGES(vstart, vend) ( 1                    /* PGDIR page */                                \
> >> > > +                 + EARLY_PGDS((vstart), (vend))  /* each PGDIR needs a next level page table */  \
> >> > > +                 + EARLY_PUDS((vstart), (vend))  /* each PUD needs a next level page table */    \
> >> > > +                 + EARLY_PMDS((vstart), (vend))) /* each PMD needs a next level page table */
> >> > > +#define SWAPPER_DIR_SIZE (PAGE_SIZE * EARLY_PAGES(KIMAGE_VADDR + TEXT_OFFSET, _end + SZ_2M))
> >> > >  #define IDMAP_DIR_SIZE           (IDMAP_PGTABLE_LEVELS * PAGE_SIZE)
> >> >
> >> > I'm currently struggling to convince myself as to whether 2M is
> >> > necessary/sufficient slack space for all configurations.
> >>
> >> Agreed; if the possible address range changes beyond these bounds,
> >> then we need to extend them.
> >>
> >> >
> >> > At least w.r.t. Ard's comment, we'd need a bit more slack to allow KASLR
> >> > to cross PGD/PUD/PMD boundaries.
> >> >
> >> > For example with 3 levels of 64K pages, and a huge kernel that takes up
> >> > SZ_512M - SZ_2M - TEXT_OFFSET. That perfectly fits into 1 PGDIR, 1 PGD,
> >> > and 1 PMD. IIUC, the above would allocate us 3 pages for this case. If
> >> > the kernel were to be relocated such that it straddled two PGDs, we'd
> >> > need 2 PGDs and 2 PMDs, needing 5 pages total.
> >> >
> >> > I'm not sure that we have a problem if we don't relax KASLR.
> >> >
> >>
> >> The approach I've adopted is to compute which indices are required for
> >> PGDs, PUDs, PMDs to map the supplied address range, then count them.
> >> If we require two PGD entries then that means we need two pages
> >> containing PMD entries to be allocated.
> >>
> >> If we consider just PGDs, for example, the only way I am aware of for
> >> the kernel to straddle more PGDs than previously computed is for the
> >> mapping to begin before vstart or end after vend (or both).
> >>
> >> Should I refine the range specified in the SWAPPER_DIR_SIZE for the
> >> current KASLR? (I thought the random offset was < SZ_2M?)
> >>
> >
> > Ahh, I see the KASLR offset has the bottom 21 bits masked out, rather
> > than being confined to the bottom 21 bits :-).
> >
> > It may be possible to do something with (vend - vstart); I will have a
> > think about this.
> >
> 
> Hi Steve,
> 
> Please be aware that it is slightly more complicated than that.
> 
> The VA randomization offset chosen by the KASLR code is made up of two
> separate values:
> - the VA offset modulo 2 MB, which equals the PA offset modulo 2 MB,
> and is set by the EFI stub when it loads the kernel image into 1:1
> mapped memory
> - the VA offset in 2 MB increments, which is set by the kaslr_init
> call in head.S
> 
> The reason for this approach is that it allows randomization at 64 KB
> granularity without losing the ability to map the kernel using 2 MB
> block mappings or contiguous page mappings. On a 48-bit VA kernel,
> this gives us 30 bits of randomization.


Thanks Ard!
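To restate that composition concretely, here is a minimal illustrative sketch (plain Python for the arithmetic; the function and parameter names are mine, not the kernel's):

```python
SZ_64K = 1 << 16
SZ_2M = 1 << 21

def compose_kaslr_offset(sub_2m, n_2m_units):
    """Total VA offset built from the two parts described above:
    sub_2m      -- offset modulo 2 MB, chosen by the EFI stub
                   (kept 64 KB aligned so the kernel can still be
                   mapped with 2 MB blocks / contiguous mappings)
    n_2m_units  -- whole 2 MB increments, chosen by the KASLR code
                   called from head.S
    """
    assert sub_2m % SZ_64K == 0 and sub_2m < SZ_2M
    return n_2m_units * SZ_2M + sub_2m

# 32 sub-2MB slots (5 bits) combined with the 2 MB-granular slots
# available in a 48-bit VA space is where the quoted "30 bits of
# randomization" figure comes from.
```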

So if I've understood correctly, it is valid for there to exist a VA
KASLR offset K at runtime s.t.

 K mod 2^SHIFT != 0

for SHIFT = PGDIR_SHIFT, PUD_SHIFT, or SWAPPER_TABLE_SHIFT.
(I need to correct my EARLY_PMDS macro to use SWAPPER_TABLE_SHIFT instead
of PMD_SHIFT.)

I've managed to convince myself that this means at most one extra page
is needed for each strideable level to cover the case where we get
unlucky with KASLR. This makes the kernel image up to 3 pages larger
with KASLR enabled (but shouldn't affect runtime memory, as these pages
will be given back).
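As a quick sanity check of the "one extra page per level" point, here is an illustrative sketch (plain Python, not the actual EARLY_P*DS macros) of the per-level index count, showing a sub-stride KASLR shift costing one extra entry:

```python
# Entries touched at a level of stride 2**shift by the range
# [vstart, vend], counted the same way as the EARLY_P*DS macros.
def early_entries(vstart, vend, shift):
    return (vend >> shift) - (vstart >> shift) + 1

PGDIR_SHIFT = 42   # e.g. 64 KB pages with 3 levels of table

# A range that fits one PGD entry when suitably aligned...
aligned = early_entries(0, (1 << PGDIR_SHIFT) - 1, PGDIR_SHIFT)
# ...straddles a second entry after an unlucky 2 MB KASLR shift.
shifted = early_entries(1 << 21, (1 << 21) + (1 << PGDIR_SHIFT) - 1,
                        PGDIR_SHIFT)
print(aligned, shifted)  # 1 2
```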

If (vend - vstart) mod 2^SHIFT == 0, then KASLR cannot affect the entry
count at that particular level, but we would need to map a much larger
kernel for that identity to hold.
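That identity can be checked directly; in this sketch (again, not the kernel macros) a range whose size is an exact multiple of the stride yields the same count for any alignment of vstart:

```python
# When vend - vstart is an exact multiple of 2**SHIFT, the count
#   (vend >> SHIFT) - (vstart >> SHIFT) + 1
# is the same for any alignment of vstart, so a KASLR shift cannot
# change the number of entries needed at that level.
SHIFT = 42                # e.g. PGDIR_SHIFT for 64 KB pages, 3 levels
size = 2 * (1 << SHIFT)   # exact multiple of the stride

for vstart in (0, 1 << 21, 3 << 16):   # aligned and misaligned starts
    vend = vstart + size
    assert (vend >> SHIFT) - (vstart >> SHIFT) + 1 == 3
```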

So I'll remove the 2MB end offset from SWAPPER_DIR_SIZE and add in an
extra page to EARLY_P[GUM]DS when KASLR is enabled.

If I've understood things correctly, this should be safe with an updated
KASLR that can cross PMD/PUD boundaries.

Cheers,
-- 
Steve


