[PATCH 2/2] arm64: mm: always map fixmap at page granularity

Ryan Roberts ryan.roberts at arm.com
Wed Mar 15 03:23:31 PDT 2023


On 14/03/2023 14:21, Mark Rutland wrote:
> Today the fixmap code largely maps elements at PAGE_SIZE granularity,
> but we special-case the FDT mapping such that it can be mapped with 2M
> block mappings when 4K pages are in use. The original rationale for this
> was simplicity, but it has some unfortunate side-effects, and
> complicates portions of the fixmap code (i.e. is not so simple after
> all).
> 
> The FDT can be up to 2M in size but is only required to have 8-byte
> alignment, and so it may straddle a 2M boundary. Thus when using 2M
> block mappings we may map up to 4M of memory surrounding the FDT. This
> is unfortunate as most of that memory will be unrelated to the FDT, and
> any pages which happen to share a 2M block with the FDT will be mapped
> with Normal Write-Back Cacheable attributes, which might not be what we
> want elsewhere (e.g. for carve-outs using Non-Cacheable attributes).
> 
> The logic to handle mapping the FDT with 2M blocks requires some special
> cases in the fixmap code, and ties it to the early page table
> configuration by virtue of the SWAPPER_TABLE_SHIFT and
> SWAPPER_BLOCK_SIZE constants used to determine the granularity used to
> map the FDT.
> 
> This patch simplifies the FDT logic and removes the unnecessary mappings
> of surrounding pages by always mapping the FDT at page granularity as
> with all other fixmap mappings. To do so we statically reserve multiple
> PTE tables to cover the fixmap VA range. Since the FDT can be at most
> 2M, for 4K pages we only need to allocate a single additional PTE table,
> and for 16K and 64K pages the existing single PTE table is sufficient.
> 
> The PTE table allocation scales with the number of slots reserved in the
> fixmap, and so this also makes it easier to add more fixmap entries if
> we require those in future.
> 
> Our VA layout means that the fixmap will always fall within a single PMD
> table (and consequently, within a single PUD/P4D/PGD entry), which we
> can verify at compile time with a static_assert(). With that assert a
> number of runtime warnings become impossible, and are removed.
> 
> I've boot-tested this patch with both 4K and 64K pages.

I recently got caught up in the fixmap's tentacles, so your effort to improve
it is certainly welcomed by me! Review comments below.

Thanks,
Ryan


> 
> Signed-off-by: Mark Rutland <mark.rutland at arm.com>
> Cc: Anshuman Khandual <anshuman.khandual at arm.com>
> Cc: Ard Biesheuvel <ardb at kernel.org>
> Cc: Catalin Marinas <catalin.marinas at arm.com>
> Cc: Ryan Roberts <ryan.roberts at arm.com>
> Cc: Will Deacon <will at kernel.org>
> ---
>  arch/arm64/include/asm/fixmap.h         |  14 +--
>  arch/arm64/include/asm/kernel-pgtable.h |   5 +-
>  arch/arm64/mm/fixmap.c                  | 148 +++++++++++-------------
>  3 files changed, 77 insertions(+), 90 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/fixmap.h b/arch/arm64/include/asm/fixmap.h
> index c153f069e9c9..c59d433c1801 100644
> --- a/arch/arm64/include/asm/fixmap.h
> +++ b/arch/arm64/include/asm/fixmap.h
> @@ -36,17 +36,13 @@ enum fixed_addresses {
>  	FIX_HOLE,
>  
>  	/*
> -	 * Reserve a virtual window for the FDT that is 2 MB larger than the
> -	 * maximum supported size, and put it at the top of the fixmap region.
> -	 * The additional space ensures that any FDT that does not exceed
> -	 * MAX_FDT_SIZE can be mapped regardless of whether it crosses any
> -	 * 2 MB alignment boundaries.
> -	 *
> -	 * Keep this at the top so it remains 2 MB aligned.
> +	 * Reserve a virtual window for the FDT that is a page bigger than the
> +	 * maximum supported size. The additional space ensures that any FDT
> +	 * that does not exceed MAX_FDT_SIZE can be mapped regardless of
> +	 * whether it crosses any page boundary.
>  	 */
> -#define FIX_FDT_SIZE		(MAX_FDT_SIZE + SZ_2M)
>  	FIX_FDT_END,
> -	FIX_FDT = FIX_FDT_END + FIX_FDT_SIZE / PAGE_SIZE - 1,
> +	FIX_FDT = FIX_FDT_END + MAX_FDT_SIZE / PAGE_SIZE - 1,

I don't think this is consistent with your comment - needs a +1 for the extra
page? On a 4K system, FIX_FDT will be calculated as 512, but we need it to be
513 to deal with the case where the FDT is 2MB and not aligned to a page boundary.

>  
>  	FIX_EARLYCON_MEM_BASE,
>  	FIX_TEXT_POKE0,
> diff --git a/arch/arm64/include/asm/kernel-pgtable.h b/arch/arm64/include/asm/kernel-pgtable.h
> index fcd14197756f..186dd7f85b14 100644
> --- a/arch/arm64/include/asm/kernel-pgtable.h
> +++ b/arch/arm64/include/asm/kernel-pgtable.h
> @@ -59,8 +59,11 @@
>  #define EARLY_KASLR	(0)
>  #endif
>  
> +#define SPAN_NR_ENTRIES(vstart, vend, shift) \
> +	((((vend) - 1) >> (shift)) - ((vstart) >> (shift)) + 1)
> +
>  #define EARLY_ENTRIES(vstart, vend, shift, add) \
> -	((((vend) - 1) >> (shift)) - ((vstart) >> (shift)) + 1 + add)
> +	(SPAN_NR_ENTRIES(vstart, vend, shift) + (add))
>  
>  #define EARLY_PGDS(vstart, vend, add) (EARLY_ENTRIES(vstart, vend, PGDIR_SHIFT, add))
>  
> diff --git a/arch/arm64/mm/fixmap.c b/arch/arm64/mm/fixmap.c
> index 54e50552bfe3..d7a6a485f361 100644
> --- a/arch/arm64/mm/fixmap.c
> +++ b/arch/arm64/mm/fixmap.c
> @@ -16,34 +16,77 @@
>  #include <asm/pgalloc.h>
>  #include <asm/tlbflush.h>
>  
> -static pte_t bm_pte[PTRS_PER_PTE] __page_aligned_bss;
> +#define NR_BM_PTE_TABLES \
> +	SPAN_NR_ENTRIES(FIXADDR_START, FIXADDR_TOP, PMD_SHIFT)
> +#define NR_BM_PMD_TABLES \
> +	SPAN_NR_ENTRIES(FIXADDR_START, FIXADDR_TOP, PUD_SHIFT)

One of the bear traps I stepped in last week when I was wrestling with this is
that FIXADDR_START isn't actually the address of the first slot! It is
__end_of_permanent_fixed_addresses. But you then have all the FIX_BTMAP slots,
FIX_PTE/PMD/PUD/PGD before FIXADDR_START. So I think some refactoring is in
order to properly cover all the slots.

Honestly, the fact that the fixed_addresses enum is backwards is mind-bending
too. Any chance we can just define FIXADDR_START and FIXADDR_TOP to cover the
whole lot, then calculate the slot address as (FIXADDR_START + (<ENUM> <<
PAGE_SHIFT))? (plus reorder of start/end enum labels for FDT and BTMAP). Or if
we really care about reclaiming the virtual address space between
__end_of_permanent_fixed_addresses and __end_of_fixed_addresses, create a new
FIXADDR_BOOT_START or whatever.

> +
> +static_assert(NR_BM_PMD_TABLES == 1);
> +
> +#define __BM_TABLE_IDX(addr, shift) \
> +	(((addr) >> (shift)) - (FIXADDR_START >> (shift)))

Doesn't this run the risk of being negative for one of the slots below
FIXADDR_START? (e.g. FIX_PTE). I think with the standard configs today, those
boot-time-only slots all _happen_ to fall in the same PMD. But that wasn't the
case with my issue last week.

> +
> +#define BM_PTE_TABLE_IDX(addr)	__BM_TABLE_IDX(addr, PMD_SHIFT)
> +
> +static pte_t bm_pte[NR_BM_PTE_TABLES][PTRS_PER_PTE] __page_aligned_bss;
>  static pmd_t bm_pmd[PTRS_PER_PMD] __page_aligned_bss __maybe_unused;
>  static pud_t bm_pud[PTRS_PER_PUD] __page_aligned_bss __maybe_unused;
>  
> -static inline pud_t *fixmap_pud(unsigned long addr)
> +static inline pte_t *fixmap_pte(unsigned long addr)
>  {
> -	pgd_t *pgdp = pgd_offset_k(addr);
> -	p4d_t *p4dp = p4d_offset(pgdp, addr);
> -	p4d_t p4d = READ_ONCE(*p4dp);
> +	return &bm_pte[BM_PTE_TABLE_IDX(addr)][pte_index(addr)];
> +}
>  
> -	BUG_ON(p4d_none(p4d) || p4d_bad(p4d));
> +void __init early_fixmap_init_pte(pmd_t *pmdp, unsigned long addr)

static? - same comment for pmd/pud below.

> +{
> +	pmd_t pmd = READ_ONCE(*pmdp);
> +	pte_t *ptep;
>  
> -	return pud_offset_kimg(p4dp, addr);
> +	if (pmd_none(pmd)) {
> +		ptep = bm_pte[BM_PTE_TABLE_IDX(addr)];;

nit: remove one of the semi-colons.

> +		__pmd_populate(pmdp, __pa_symbol(ptep), PMD_TYPE_TABLE);
> +	}
>  }
>  
> -static inline pmd_t *fixmap_pmd(unsigned long addr)
> +void __init early_fixmap_init_pmd(pud_t *pudp, unsigned long addr,
> +				  unsigned long end)
>  {
> -	pud_t *pudp = fixmap_pud(addr);
> +	unsigned long next;
>  	pud_t pud = READ_ONCE(*pudp);
> +	pmd_t *pmdp;
>  
> -	BUG_ON(pud_none(pud) || pud_bad(pud));
> +	if (pud_none(pud))
> +		__pud_populate(pudp, __pa_symbol(bm_pmd), PUD_TYPE_TABLE);
>  
> -	return pmd_offset_kimg(pudp, addr);
> +	pmdp = pmd_offset_kimg(pudp, addr);
> +	do {
> +		next = pmd_addr_end(addr, end);
> +		early_fixmap_init_pte(pmdp, addr);
> +	} while (pmdp++, addr = next, addr != end);
>  }
>  
> -static inline pte_t *fixmap_pte(unsigned long addr)
> +
> +void __init early_fixmap_init_pud(p4d_t *p4dp, unsigned long addr,
> +				  unsigned long end)
>  {
> -	return &bm_pte[pte_index(addr)];
> +	p4d_t p4d = READ_ONCE(*p4dp);
> +	pud_t *pudp;
> +
> +	if (CONFIG_PGTABLE_LEVELS > 3 && !p4d_none(p4d) &&
> +	    p4d_page_paddr(p4d) != __pa_symbol(bm_pud)) {
> +		/*
> +		 * We only end up here if the kernel mapping and the fixmap
> +		 * share the top level pgd entry, which should only happen on
> +		 * 16k/4 levels configurations.
> +		 */
> +		BUG_ON(!IS_ENABLED(CONFIG_ARM64_16K_PAGES));
> +	}
> +
> +	if (p4d_none(p4d))
> +		__p4d_populate(p4dp, __pa_symbol(bm_pud), P4D_TYPE_TABLE);
> +
> +	pudp = pud_offset_kimg(p4dp, addr);
> +	early_fixmap_init_pmd(pudp, addr, end);
>  }
>  
>  /*
> @@ -54,55 +97,13 @@ static inline pte_t *fixmap_pte(unsigned long addr)
>   */
>  void __init early_fixmap_init(void)
>  {
> -	pgd_t *pgdp;
> -	p4d_t *p4dp, p4d;
> -	pud_t *pudp;
> -	pmd_t *pmdp;
>  	unsigned long addr = FIXADDR_START;
> +	unsigned long end = FIXADDR_TOP;
>  
> -	pgdp = pgd_offset_k(addr);
> -	p4dp = p4d_offset(pgdp, addr);
> -	p4d = READ_ONCE(*p4dp);
> -	if (CONFIG_PGTABLE_LEVELS > 3 &&
> -	    !(p4d_none(p4d) || p4d_page_paddr(p4d) == __pa_symbol(bm_pud))) {
> -		/*
> -		 * We only end up here if the kernel mapping and the fixmap
> -		 * share the top level pgd entry, which should only happen on
> -		 * 16k/4 levels configurations.
> -		 */
> -		BUG_ON(!IS_ENABLED(CONFIG_ARM64_16K_PAGES));
> -		pudp = pud_offset_kimg(p4dp, addr);
> -	} else {
> -		if (p4d_none(p4d))
> -			__p4d_populate(p4dp, __pa_symbol(bm_pud), P4D_TYPE_TABLE);
> -		pudp = fixmap_pud(addr);
> -	}
> -	if (pud_none(READ_ONCE(*pudp)))
> -		__pud_populate(pudp, __pa_symbol(bm_pmd), PUD_TYPE_TABLE);
> -	pmdp = fixmap_pmd(addr);
> -	__pmd_populate(pmdp, __pa_symbol(bm_pte), PMD_TYPE_TABLE);
> +	pgd_t *pgdp = pgd_offset_k(addr);
> +	p4d_t *p4dp = p4d_offset(pgdp, addr);
>  
> -	/*
> -	 * The boot-ioremap range spans multiple pmds, for which
> -	 * we are not prepared:
> -	 */
> -	BUILD_BUG_ON((__fix_to_virt(FIX_BTMAP_BEGIN) >> PMD_SHIFT)
> -		     != (__fix_to_virt(FIX_BTMAP_END) >> PMD_SHIFT));
> -
> -	if ((pmdp != fixmap_pmd(fix_to_virt(FIX_BTMAP_BEGIN)))
> -	     || pmdp != fixmap_pmd(fix_to_virt(FIX_BTMAP_END))) {
> -		WARN_ON(1);
> -		pr_warn("pmdp %p != %p, %p\n",
> -			pmdp, fixmap_pmd(fix_to_virt(FIX_BTMAP_BEGIN)),
> -			fixmap_pmd(fix_to_virt(FIX_BTMAP_END)));
> -		pr_warn("fix_to_virt(FIX_BTMAP_BEGIN): %08lx\n",
> -			fix_to_virt(FIX_BTMAP_BEGIN));
> -		pr_warn("fix_to_virt(FIX_BTMAP_END):   %08lx\n",
> -			fix_to_virt(FIX_BTMAP_END));
> -
> -		pr_warn("FIX_BTMAP_END:       %d\n", FIX_BTMAP_END);
> -		pr_warn("FIX_BTMAP_BEGIN:     %d\n", FIX_BTMAP_BEGIN);
> -	}
> +	early_fixmap_init_pud(p4dp, addr, end);
>  }
>  
>  /*
> @@ -130,6 +131,7 @@ void __set_fixmap(enum fixed_addresses idx,
>  void *__init fixmap_remap_fdt(phys_addr_t dt_phys, int *size, pgprot_t prot)
>  {
>  	const u64 dt_virt_base = __fix_to_virt(FIX_FDT);
> +	phys_addr_t dt_phys_base;
>  	int offset;
>  	void *dt_virt;
>  
> @@ -144,27 +146,12 @@ void *__init fixmap_remap_fdt(phys_addr_t dt_phys, int *size, pgprot_t prot)
>  	if (!dt_phys || dt_phys % MIN_FDT_ALIGN)
>  		return NULL;
>  
> -	/*
> -	 * Make sure that the FDT region can be mapped without the need to
> -	 * allocate additional translation table pages, so that it is safe
> -	 * to call create_mapping_noalloc() this early.
> -	 *
> -	 * On 64k pages, the FDT will be mapped using PTEs, so we need to
> -	 * be in the same PMD as the rest of the fixmap.
> -	 * On 4k pages, we'll use section mappings for the FDT so we only
> -	 * have to be in the same PUD.
> -	 */
> -	BUILD_BUG_ON(dt_virt_base % SZ_2M);
> -
> -	BUILD_BUG_ON(__fix_to_virt(FIX_FDT_END) >> SWAPPER_TABLE_SHIFT !=
> -		     __fix_to_virt(FIX_BTMAP_BEGIN) >> SWAPPER_TABLE_SHIFT);
> -
> -	offset = dt_phys % SWAPPER_BLOCK_SIZE;
> +	dt_phys_base = round_down(dt_phys, PAGE_SIZE);
> +	offset = dt_phys % PAGE_SIZE;
>  	dt_virt = (void *)dt_virt_base + offset;
>  
>  	/* map the first chunk so we can read the size from the header */
> -	create_mapping_noalloc(round_down(dt_phys, SWAPPER_BLOCK_SIZE),
> -			dt_virt_base, SWAPPER_BLOCK_SIZE, prot);
> +	create_mapping_noalloc(dt_phys_base, dt_virt_base, PAGE_SIZE, prot);
>  
>  	if (fdt_magic(dt_virt) != FDT_MAGIC)
>  		return NULL;
> @@ -173,9 +160,10 @@ void *__init fixmap_remap_fdt(phys_addr_t dt_phys, int *size, pgprot_t prot)
>  	if (*size > MAX_FDT_SIZE)
>  		return NULL;
>  
> -	if (offset + *size > SWAPPER_BLOCK_SIZE)
> -		create_mapping_noalloc(round_down(dt_phys, SWAPPER_BLOCK_SIZE), dt_virt_base,
> -			       round_up(offset + *size, SWAPPER_BLOCK_SIZE), prot);
> +	if (offset + *size > PAGE_SIZE) {
> +		create_mapping_noalloc(dt_phys_base, dt_virt_base,
> +				       offset + *size, prot);
> +	}
>  
>  	return dt_virt;
>  }



