[PATCH RFC 02/17] arm64: place kernel in its own L0 page table entry
Mark Rutland
mark.rutland at arm.com
Mon Jun 12 04:14:54 PDT 2023
Hi Russell,
On Tue, May 30, 2023 at 03:04:40PM +0100, Russell King (Oracle) wrote:
> Kernel text replication needs to maintain separate per-node page
> tables for the kernel text. In order to do this without affecting
> other kernel memory mappings, placing the kernel such that it does
> not share a L0 page table entry with any other mapping is desirable.
>
> Prior to this commit, the layout without KASLR was:
>
> +----------+
> | vmalloc |
> +----------+
> | Kernel |
> +----------+ MODULES_END, VMALLOC_START, KIMAGE_VADDR =
> | Modules | MODULES_VADDR + MODULES_VSIZE
> +----------+ MODULES_VADDR = _PAGE_END(VA_BITS_MIN)
> | VA space |
> +----------+ 0
>
> This becomes:
>
> +----------+
> | vmalloc |
> +----------+ VMALLOC_START = MODULES_END + PGDIR_SIZE
> | Kernel |
> +----------+ MODULES_END, KIMAGE_VADDR = _PAGE_END(VA_BITS_MIN) + PGDIR_SIZE
> | Modules |
> +----------+ MODULES_VADDR = MODULES_END - MODULES_VSIZE
> | VA space |
> +----------+ 0
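(For concreteness, if I've followed the new definitions correctly, with 4K
pages and 48-bit VAs this works out as:

	MODULES_VADDR = 0xffff807ff8000000
	KIMAGE_VADDR  = 0xffff808000000000	/* MODULES_END */
	VMALLOC_START = 0xffff810000000000

... i.e. the kernel image gets a 512G pgd entry to itself, with modules at the
top of the entry below.)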
With KASLR we may randomize the kernel and module space over a substantial
portion of the vmalloc range. Are you expecting that text replication is going
to restrict that range, or that we'd make it mutually exclusive with KASLR?
I also note that the L0 table could have as few as two entries (with 16K pages
and 4 levels). So either we'd need to also mess with an L1 table, or make text
replication mutually exclusive with such configurations.
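For reference, the arithmetic for that case (taking 16K pages with 48-bit
VAs): a 16K table holds 2048 8-byte entries, so each level resolves 11 bits of
VA, and with 4 levels:

	PGDIR_SHIFT = 14 + 3 * 11 = 47

... leaving the L0 index to cover VA bit 47 alone, i.e. 2^(48 - 47) = 2
entries.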
> This assumes MODULES_VSIZE (128M) <= PGDIR_SIZE.
As a heads-up, we've just changed MODULES_VSIZE to be 2G in
https://lore.kernel.org/linux-arm-kernel/20230530110328.2213762-1-mark.rutland@arm.com/
... which is queued in the arm64 for-next/module-alloc branch:
https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/log/?h=for-next/module-alloc
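Note that with a 2G MODULES_VSIZE, the `MODULES_VSIZE <= PGDIR_SIZE'
assumption above no longer holds for all configurations. If I have the
arithmetic right (each level resolves PAGE_SHIFT - 3 bits of VA):

	/* 4K pages,  3 levels (VA_BITS=39): PGDIR_SHIFT = 12 + 2 * 9  = 30 => PGDIR_SIZE = 1G   */
	/* 64K pages, 2 levels (VA_BITS=42): PGDIR_SHIFT = 16 + 1 * 13 = 29 => PGDIR_SIZE = 512M */

... so those configurations would also need special handling (or to be
excluded).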
> One side effect of this change is that KIMAGE_VADDR's definition now
> includes PGDIR_SIZE (to leave room for the modules) but this is not
> defined when asm/memory.h is included. This means KIMAGE_VADDR can
> not be used in inline functions within this file, so we convert
> kaslr_offset() and kaslr_enabled() to be macros instead.
That series above also decoupled kaslr_enabled() from kaslr_offset(),
so we'd only need to change kaslr_offset().
>
> Signed-off-by: Russell King (Oracle) <rmk+kernel at armlinux.org.uk>
> ---
> arch/arm64/include/asm/memory.h | 26 ++++++++++----------------
> arch/arm64/include/asm/pgtable.h | 2 +-
> arch/arm64/mm/mmu.c | 3 ++-
> 3 files changed, 13 insertions(+), 18 deletions(-)
>
> diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
> index c735afdf639b..089f556b7387 100644
> --- a/arch/arm64/include/asm/memory.h
> +++ b/arch/arm64/include/asm/memory.h
> @@ -43,9 +43,9 @@
> #define VA_BITS (CONFIG_ARM64_VA_BITS)
> #define _PAGE_OFFSET(va) (-(UL(1) << (va)))
> #define PAGE_OFFSET (_PAGE_OFFSET(VA_BITS))
> -#define KIMAGE_VADDR (MODULES_END)
> -#define MODULES_END (MODULES_VADDR + MODULES_VSIZE)
> -#define MODULES_VADDR (_PAGE_END(VA_BITS_MIN))
> +#define KIMAGE_VADDR (_PAGE_END(VA_BITS_MIN) + PGDIR_SIZE)
> +#define MODULES_END (KIMAGE_VADDR)
> +#define MODULES_VADDR (MODULES_END - MODULES_VSIZE)
> #define MODULES_VSIZE (SZ_128M)
> #define VMEMMAP_START (-(UL(1) << (VA_BITS - VMEMMAP_SHIFT)))
> #define VMEMMAP_END (VMEMMAP_START + VMEMMAP_SIZE)
> @@ -199,20 +199,14 @@ extern u64 kimage_vaddr;
> /* the offset between the kernel virtual and physical mappings */
> extern u64 kimage_voffset;
>
> -static inline unsigned long kaslr_offset(void)
> -{
> - return kimage_vaddr - KIMAGE_VADDR;
> -}
> +#define kaslr_offset() ((unsigned long)(kimage_vaddr - KIMAGE_VADDR))
>
> -static inline bool kaslr_enabled(void)
> -{
> - /*
> - * The KASLR offset modulo MIN_KIMG_ALIGN is taken from the physical
> - * placement of the image rather than from the seed, so a displacement
> - * of less than MIN_KIMG_ALIGN means that no seed was provided.
> - */
> - return kaslr_offset() >= MIN_KIMG_ALIGN;
> -}
> +/*
> + * The KASLR offset modulo MIN_KIMG_ALIGN is taken from the physical
> + * placement of the image rather than from the seed, so a displacement
> + * of less than MIN_KIMG_ALIGN means that no seed was provided.
> + */
> +#define kaslr_enabled() (kaslr_offset() >= MIN_KIMG_ALIGN)
>
> /*
> * Allow all memory at the discovery stage. We will clip it later.
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index 0bd18de9fd97..cb526e69299d 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -21,7 +21,7 @@
> * VMALLOC_END: extends to the available space below vmemmap, PCI I/O space
> * and fixed mappings
> */
> -#define VMALLOC_START (MODULES_END)
> +#define VMALLOC_START (MODULES_END + PGDIR_SIZE)
> #define VMALLOC_END (VMEMMAP_START - SZ_256M)
>
> #define vmemmap ((struct page *)VMEMMAP_START - (memstart_addr >> PAGE_SHIFT))
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 4829abe017e9..baf74d0c43c9 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -478,7 +478,8 @@ void __init create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
> static void update_mapping_prot(phys_addr_t phys, unsigned long virt,
> phys_addr_t size, pgprot_t prot)
> {
> - if ((virt >= PAGE_END) && (virt < VMALLOC_START)) {
> + if ((virt >= PAGE_END) && (virt < VMALLOC_START) &&
> + !is_kernel(virt)) {
> pr_warn("BUG: not updating mapping for %pa at 0x%016lx - outside kernel range\n",
> &phys, virt);
> return;
I think the existing conditions here aren't quite right (they've become bogus
over time), and I don't think the is_kernel() check is necessary here.
Originally, back in commit:
c1cc1552616d0f35 ("arm64: MMU initialisation")
We had:
if (virt < VMALLOC_START) {
pr_warning("BUG: not creating mapping for 0x%016llx at 0x%016lx - outside kernel range\n",
phys, virt);
return;
}
... which checked that the VA range we were manipulating was in the TTBR1 VA
range, as at the time, VMALLOC_START happened to be the lowest TTBR1 address.
That didn't substantially change until commit:
14c127c957c1c607 ("arm64: mm: Flip kernel VA space")
... when the test was changed to:
if ((virt >= VA_START) && (virt < VMALLOC_START)) {
pr_warn("BUG: not creating mapping for %pa at 0x%016lx - outside kernel range\n",
&phys, virt);
return;
}
Note: in that commit, VA_START was actually the end of the linear map (which
was itself at the start of the TTBR1 address space), so this is just checking if
we're poking a small portion of the TTBR1 address space, rather than if we're
poking *outside* of the TTBR1 address space.
That doesn't make much sense, and I'm pretty sure it was a thinko rather than
an intentional change of semantics.
I "fixed" that without thinking in commit:
77ad4ce69321abbe ("arm64: memory: rename VA_START to PAGE_END")
... making that:
if ((virt >= PAGE_END) && (virt < VMALLOC_START)) {
pr_warn("BUG: not creating mapping for %pa at 0x%016lx - outside kernel range\n",
&phys, virt);
return;
}
... but clearly it has lost the original semantics and doesn't make much sense.
I think the test should actually be something like:
/* Must be a TTBR1 address */
if (virt < PAGE_OFFSET) {
...
}
... and then we won't randomly trip for kernel mappings if those fall between
the linear map and vmalloc range.
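In full, that'd make the function look like the below (completely untested),
with the is_kernel() check dropped:

	static void update_mapping_prot(phys_addr_t phys, unsigned long virt,
					phys_addr_t size, pgprot_t prot)
	{
		/* Must be a TTBR1 address */
		if (virt < PAGE_OFFSET) {
			pr_warn("BUG: not updating mapping for %pa at 0x%016lx - outside kernel range\n",
				&phys, virt);
			return;
		}

		/* ... rest of the function unchanged ... */
	}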
Thanks,
Mark.