[RFC PATCH] arm64: mm: increase VA range of identity map

Catalin Marinas catalin.marinas at arm.com
Tue Feb 24 04:03:41 PST 2015


On Tue, Feb 24, 2015 at 10:55:42AM +0000, Ard Biesheuvel wrote:
> This patch fixes the annoying issue that AMD Seattle cannot boot the
> arm64 defconfig build, crashing so early that even earlycon is completely
> silent. It does so by allowing an ID map to have more translation levels
> than the default configured value.

We could have changed the defconfig to 4 levels of page tables, but
since the larger range is currently only needed for the idmap, I like
your approach.

> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index 3e6859bc3e11..c2da529bb7bd 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -1532,6 +1532,12 @@ int kvm_mmu_init(void)
>  			 (unsigned long)phys_base);
>  	}
>  
> +	if ((hyp_idmap_start >> PGDIR_SHIFT) >= PTRS_PER_PGD) {
> +		kvm_err("Couldn't identity map HYP init page (PA exceeds VA range)\n");
> +		err = -ENOMEM;
> +		goto out;
> +	}
> +
>  	hyp_pgd = (pgd_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, hyp_pgd_order);
>  	boot_hyp_pgd = (pgd_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, hyp_pgd_order);

I wonder whether Hyp could share the same idmap as the rest of the
kernel (extending the latter to cover the hyp text). The Hyp one
requires the PTE_USER attributes, but we could use the same attributes
on the kernel idmap since it is never active at the same time as a user
process.

> diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
> index f800d45ea226..19f6297a472b 100644
> --- a/arch/arm64/include/asm/memory.h
> +++ b/arch/arm64/include/asm/memory.h
> @@ -42,12 +42,14 @@
>   * PAGE_OFFSET - the virtual address of the start of the kernel image (top
>   *		 (VA_BITS - 1))
>   * VA_BITS - the maximum number of bits for virtual addresses.
> + * MAX_VA_BITS - architectural max value for VA_BITS
>   * TASK_SIZE - the maximum size of a user space task.
>   * TASK_UNMAPPED_BASE - the lower boundary of the mmap VM area.
>   * The module space lives between the addresses given by TASK_SIZE
>   * and PAGE_OFFSET - it must be within 128MB of the kernel text.
>   */
>  #define VA_BITS			(CONFIG_ARM64_VA_BITS)
> +#define MAX_VA_BITS		48

Can we determine this dynamically based on the physical address? That
way, hardware that doesn't need such a large idmap would stick to the
configured number of translation levels.

> diff --git a/arch/arm64/include/asm/mmu_context.h b/arch/arm64/include/asm/mmu_context.h
> index a9eee33dfa62..641ce0574999 100644
> --- a/arch/arm64/include/asm/mmu_context.h
> +++ b/arch/arm64/include/asm/mmu_context.h
> @@ -64,6 +64,44 @@ static inline void cpu_set_reserved_ttbr0(void)
>  	: "r" (ttbr));
>  }
>  
> +/*
> + * TCR.T0SZ value to use when the ID map is active. Usually equals
> + * TCR_T0SZ(VA_BITS), unless system RAM is positioned very high in
> + * physical memory, in which case it will be smaller.
> + */
> +extern u64 idmap_t0sz;
> +
> +static inline void __cpu_set_tcr_t0sz(u64 t0sz)
> +{
> +	unsigned long tcr;
> +
> +	if (!IS_ENABLED(CONFIG_ARM64_VA_BITS_48)
> +	    && unlikely(idmap_t0sz != TCR_T0SZ(VA_BITS)))
> +		asm volatile(
> +		"	mrs	%0, tcr_el1		;"
> +		"	bfi	%0, %1, #%2, #%3	;"
> +		"	msr	tcr_el1, %0		;"
> +		"	isb"
> +		: "=&r" (tcr)
> +		: "r"(t0sz), "I"(TCR_T0SZ_OFFSET), "I"(TCR_TxSZ_WIDTH));
> +}

We need some TLB invalidation here (or in the functions below) as the
TCR bits are allowed to be cached in the TLB. I think in general it
should be:

	cpu_set_reserved_ttbr0();
	flush_tlb_all();	/* any trace of previous T0SZ */
	cpu_set_(idmap|default)_tcr_t0sz();

> +/*
> + * Set TCR.T0SZ to the value appropriate for activating the identity map.
> + */
> +static inline void cpu_set_idmap_tcr_t0sz(void)
> +{
> +	__cpu_set_tcr_t0sz(idmap_t0sz);
> +}
> +
> +/*
> + * Set TCR.T0SZ to its default value (based on VA_BITS)
> + */
> +static inline void cpu_set_default_tcr_t0sz(void)
> +{
> +	__cpu_set_tcr_t0sz(TCR_T0SZ(VA_BITS));
> +}
> +

So flush_tlb_all() could be at the beginning of these functions.
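For example, just a sketch of what I mean (keeping your
__cpu_set_tcr_t0sz() as is, and with the callers still doing
cpu_set_reserved_ttbr0() first):

	static inline void cpu_set_idmap_tcr_t0sz(void)
	{
		flush_tlb_all();	/* remove any trace of the previous T0SZ */
		__cpu_set_tcr_t0sz(idmap_t0sz);
	}

	static inline void cpu_set_default_tcr_t0sz(void)
	{
		flush_tlb_all();
		__cpu_set_tcr_t0sz(TCR_T0SZ(VA_BITS));
	}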

> diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h
> index 22b16232bd60..3d02b1869eb8 100644
> --- a/arch/arm64/include/asm/page.h
> +++ b/arch/arm64/include/asm/page.h
> @@ -33,7 +33,9 @@
>   * image. Both require pgd, pud (4 levels only) and pmd tables to (section)
>   * map the kernel. With the 64K page configuration, swapper and idmap need to
>   * map to pte level. The swapper also maps the FDT (see __create_page_tables
> - * for more information).
> + * for more information). Note that the number of ID map translation levels
> + * could be increased on the fly if system RAM is out of reach for the default
> + * VA range, so 3 pages are reserved in all cases.
>   */
>  #ifdef CONFIG_ARM64_64K_PAGES
>  #define SWAPPER_PGTABLE_LEVELS	(CONFIG_ARM64_PGTABLE_LEVELS)
> @@ -42,7 +44,7 @@
>  #endif
>  
>  #define SWAPPER_DIR_SIZE	(SWAPPER_PGTABLE_LEVELS * PAGE_SIZE)
> -#define IDMAP_DIR_SIZE		(SWAPPER_DIR_SIZE)
> +#define IDMAP_DIR_SIZE		(3 * PAGE_SIZE)

That's fine, we make this 3-page reservation permanent.

> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
> index 8ce88e08c030..8e1778e7638e 100644
> --- a/arch/arm64/kernel/head.S
> +++ b/arch/arm64/kernel/head.S
> @@ -387,6 +387,26 @@ __create_page_tables:
>  	mov	x0, x25				// idmap_pg_dir
>  	ldr	x3, =KERNEL_START
>  	add	x3, x3, x28			// __pa(KERNEL_START)
> +
> +#ifndef CONFIG_ARM64_VA_BITS_48
> +#define EXTRA_SHIFT	(PGDIR_SHIFT + PAGE_SHIFT - 3)
> +	/*
> +	 * If VA_BITS < 48, it may be too small to allow for an ID mapping to be
> +	 * created that covers system RAM if that is located sufficiently high
> +	 * in the physical address space. So for the ID map, use the entire
> +	 * available virtual range in that case.
> +	 */
> +	lsr	x5, x3, #VA_BITS
> +	cbz	x5, 1f
> +
> +	adrp	x6, idmap_t0sz
> +	mov	x5, #TCR_T0SZ(MAX_VA_BITS)
> +	str	x5, [x6, #:lo12:idmap_t0sz]

Could we use memstart_addr (it's probably available in x24 here) to
calculate idmap_t0sz? We could get ilog2(memstart_addr + KERNEL_END -
PAGE_OFFSET) using clz.
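Roughly, in C terms (just illustrative, the real thing would use clz in
head.S and I haven't double-checked the macro names):

	/* physical address of the end of the kernel image */
	phys_addr_t top = memstart_addr + KERNEL_END - PAGE_OFFSET;
	unsigned int bits = ilog2(top) + 1;	/* i.e. 64 - clz(top) */

	if (bits > VA_BITS)
		idmap_t0sz = TCR_T0SZ(bits);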

> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index 328b8ce4b007..606005101020 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -150,6 +150,7 @@ asmlinkage void secondary_start_kernel(void)
>  	 * point to zero page to avoid speculatively fetching new entries.
>  	 */
>  	cpu_set_reserved_ttbr0();
> +	cpu_set_default_tcr_t0sz();
>  	flush_tlb_all();

If you avoid the flush_tlb_all() in cpu_set_default_tcr_t0sz(), you
should place the latter after the TLB invalidation.
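I.e.:

	cpu_set_reserved_ttbr0();
	flush_tlb_all();	/* any trace of previous T0SZ */
	cpu_set_default_tcr_t0sz();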

> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index c6daaf6c6f97..dffa1d05a101 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -40,6 +40,8 @@
>  
>  #include "mm.h"
>  
> +u64 idmap_t0sz = TCR_T0SZ(VA_BITS);
> +
>  /*
>   * Empty_zero_page is a special page that is used for zero-initialized data
>   * and COW.
> @@ -453,6 +455,7 @@ void __init paging_init(void)
>  	 * point to zero page to avoid speculatively fetching new entries.
>  	 */
>  	cpu_set_reserved_ttbr0();
> +	cpu_set_default_tcr_t0sz();
>  	flush_tlb_all();
>  }

Same here.

> @@ -461,6 +464,8 @@ void __init paging_init(void)
>   */
>  void setup_mm_for_reboot(void)
>  {
> +	cpu_set_reserved_ttbr0();
> +	cpu_set_idmap_tcr_t0sz();
>  	cpu_switch_mm(idmap_pg_dir, &init_mm);
>  	flush_tlb_all();
>  }

And I think here we should move the flush_tlb_all() to just after
cpu_set_reserved_ttbr0(). Since TTBR0 would then point to the zero page,
we won't get any new allocations, so I don't think we need another TLB
invalidation after cpu_switch_mm().
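Something like:

	void setup_mm_for_reboot(void)
	{
		cpu_set_reserved_ttbr0();
		flush_tlb_all();	/* TTBR0 now points to the zero page */
		cpu_set_idmap_tcr_t0sz();
		cpu_switch_mm(idmap_pg_dir, &init_mm);
	}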

-- 
Catalin


