[PATCH 2/2] arm64: mm: always map fixmap at page granularity
Ryan Roberts
ryan.roberts at arm.com
Wed Mar 15 03:23:31 PDT 2023
On 14/03/2023 14:21, Mark Rutland wrote:
> Today the fixmap code largely maps elements at PAGE_SIZE granularity,
> but we special-case the FDT mapping such that it can be mapped with 2M
> block mappings when 4K pages are in use. The original rationale for this
> was simplicity, but it has some unfortunate side-effects, and
> complicates portions of the fixmap code (i.e. is not so simple after
> all).
>
> The FDT can be up to 2M in size but is only required to have 8-byte
> alignment, and so it may straddle a 2M boundary. Thus when using 2M
> block mappings we may map up to 4M of memory surrounding the FDT. This
> is unfortunate as most of that memory will be unrelated to the FDT, and
> any pages which happen to share a 2M block with the FDT will be mapped
> with Normal Write-Back Cacheable attributes, which might not be what we
> want elsewhere (e.g. for carve-outs using Non-Cacheable attributes).
>
> The logic to handle mapping the FDT with 2M blocks requires some special
> cases in the fixmap code, and ties it to the early page table
> configuration by virtue of the SWAPPER_TABLE_SHIFT and
> SWAPPER_BLOCK_SIZE constants used to determine the granularity used to
> map the FDT.
>
> This patch simplifies the FDT logic and removes the unnecessary mappings
> of surrounding pages by always mapping the FDT at page granularity as
> with all other fixmap mappings. To do so we statically reserve multiple
> PTE tables to cover the fixmap VA range. Since the FDT can be at most
> 2M, for 4K pages we only need to allocate a single additional PTE table,
> and for 16K and 64K pages the existing single PTE table is sufficient.
>
> The PTE table allocation scales with the number of slots reserved in the
> fixmap, and so this also makes it easier to add more fixmap entries if
> we require those in future.
>
> Our VA layout means that the fixmap will always fall within a single PMD
> table (and consequently, within a single PUD/P4D/PGD entry), which we
> can verify at compile time with a static_assert(). With that assert a
> number of runtime warnings become impossible, and are removed.
>
> I've boot-tested this patch with both 4K and 64K pages.
I recently got caught up in the fixmap's tentacles, so your effort to improve it
is certainly welcome! Review comments below.
Thanks,
Ryan
>
> Signed-off-by: Mark Rutland <mark.rutland at arm.com>
> Cc: Anshuman Khandual <anshuman.khandual at arm.com>
> Cc: Ard Biesheuvel <ardb at kernel.org>
> Cc: Catalin Marinas <catalin.marinas at arm.com>
> Cc: Ryan Roberts <ryan.roberts at arm.com>
> Cc: Will Deacon <will at kernel.org>
> ---
> arch/arm64/include/asm/fixmap.h | 14 +--
> arch/arm64/include/asm/kernel-pgtable.h | 5 +-
> arch/arm64/mm/fixmap.c | 148 +++++++++++-------------
> 3 files changed, 77 insertions(+), 90 deletions(-)
>
> diff --git a/arch/arm64/include/asm/fixmap.h b/arch/arm64/include/asm/fixmap.h
> index c153f069e9c9..c59d433c1801 100644
> --- a/arch/arm64/include/asm/fixmap.h
> +++ b/arch/arm64/include/asm/fixmap.h
> @@ -36,17 +36,13 @@ enum fixed_addresses {
> FIX_HOLE,
>
> /*
> - * Reserve a virtual window for the FDT that is 2 MB larger than the
> - * maximum supported size, and put it at the top of the fixmap region.
> - * The additional space ensures that any FDT that does not exceed
> - * MAX_FDT_SIZE can be mapped regardless of whether it crosses any
> - * 2 MB alignment boundaries.
> - *
> - * Keep this at the top so it remains 2 MB aligned.
> + * Reserve a virtual window for the FDT that is a page bigger than the
> + * maximum supported size. The additional space ensures that any FDT
> + * that does not exceed MAX_FDT_SIZE can be mapped regardless of
> + * whether it crosses any page boundary.
> */
> -#define FIX_FDT_SIZE (MAX_FDT_SIZE + SZ_2M)
> FIX_FDT_END,
> - FIX_FDT = FIX_FDT_END + FIX_FDT_SIZE / PAGE_SIZE - 1,
> + FIX_FDT = FIX_FDT_END + MAX_FDT_SIZE / PAGE_SIZE - 1,
I don't think this is consistent with the comment above - doesn't it need a +1
for the extra page? On a 4K system, FIX_FDT is calculated as 512, but we need it
to be 513 to deal with the case where the FDT is 2M in size and not aligned to a
page boundary.
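Perhaps something like this would do it (untested, just to illustrate what I
mean - the extra page comes from the "+ PAGE_SIZE"):

	FIX_FDT_END,
	FIX_FDT = FIX_FDT_END + (MAX_FDT_SIZE + PAGE_SIZE) / PAGE_SIZE - 1,

With 4K pages that gives 513 FDT slots, so a MAX_FDT_SIZE FDT that straddles a
page boundary still fits.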
>
> FIX_EARLYCON_MEM_BASE,
> FIX_TEXT_POKE0,
> diff --git a/arch/arm64/include/asm/kernel-pgtable.h b/arch/arm64/include/asm/kernel-pgtable.h
> index fcd14197756f..186dd7f85b14 100644
> --- a/arch/arm64/include/asm/kernel-pgtable.h
> +++ b/arch/arm64/include/asm/kernel-pgtable.h
> @@ -59,8 +59,11 @@
> #define EARLY_KASLR (0)
> #endif
>
> +#define SPAN_NR_ENTRIES(vstart, vend, shift) \
> + ((((vend) - 1) >> (shift)) - ((vstart) >> (shift)) + 1)
> +
> #define EARLY_ENTRIES(vstart, vend, shift, add) \
> - ((((vend) - 1) >> (shift)) - ((vstart) >> (shift)) + 1 + add)
> + (SPAN_NR_ENTRIES(vstart, vend, shift) + (add))
>
> #define EARLY_PGDS(vstart, vend, add) (EARLY_ENTRIES(vstart, vend, PGDIR_SHIFT, add))
>
> diff --git a/arch/arm64/mm/fixmap.c b/arch/arm64/mm/fixmap.c
> index 54e50552bfe3..d7a6a485f361 100644
> --- a/arch/arm64/mm/fixmap.c
> +++ b/arch/arm64/mm/fixmap.c
> @@ -16,34 +16,77 @@
> #include <asm/pgalloc.h>
> #include <asm/tlbflush.h>
>
> -static pte_t bm_pte[PTRS_PER_PTE] __page_aligned_bss;
> +#define NR_BM_PTE_TABLES \
> + SPAN_NR_ENTRIES(FIXADDR_START, FIXADDR_TOP, PMD_SHIFT)
> +#define NR_BM_PMD_TABLES \
> + SPAN_NR_ENTRIES(FIXADDR_START, FIXADDR_TOP, PUD_SHIFT)
One of the bear traps I stepped in last week when I was wrestling with this is
that FIXADDR_START isn't actually the address of the first slot! It corresponds
to __end_of_permanent_fixed_addresses, and all the FIX_BTMAP slots plus
FIX_PTE/PMD/PUD/PGD then sit below FIXADDR_START. So I think some refactoring is
in order to properly cover all the slots.
Honestly, the fact that the fixed_addresses enum is backwards is mind-bending
too. Any chance we can just define FIXADDR_START and FIXADDR_TOP to cover the
whole lot, then calculate each slot's address as
(FIXADDR_START + (<ENUM> << PAGE_SHIFT))? (Plus a reorder of the start/end enum
labels for FDT and BTMAP - rough sketch below.) Or, if we really care about
reclaiming the virtual address space between __end_of_permanent_fixed_addresses
and __end_of_fixed_addresses, create a new FIXADDR_BOOT_START or whatever.
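Something like this is what I have in mind (purely a sketch of the idea, not
tested; I think the generic __fix_to_virt()/__virt_to_fix() from
include/asm-generic/fixmap.h would need replacing, plus the FDT/BTMAP begin/end
label reorder mentioned above):

	/* Cover *all* slots, including the boot-time-only ones. */
	#define FIXADDR_SIZE	(__end_of_fixed_addresses << PAGE_SHIFT)
	#define FIXADDR_START	(FIXADDR_TOP - FIXADDR_SIZE)

	/* Slot 0 at the bottom, addresses increasing with the enum value. */
	#define __fix_to_virt(x)	(FIXADDR_START + ((x) << PAGE_SHIFT))
	#define __virt_to_fix(x)	(((x) - FIXADDR_START) >> PAGE_SHIFT)

Then the bm_pte[] sizing and __BM_TABLE_IDX() below could be based on
[FIXADDR_START, FIXADDR_TOP) and would naturally cover the boot-time slots too.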
> +
> +static_assert(NR_BM_PMD_TABLES == 1);
> +
> +#define __BM_TABLE_IDX(addr, shift) \
> + (((addr) >> (shift)) - (FIXADDR_START >> (shift)))
Doesn't this run the risk of being negative for one of the slots below
FIXADDR_START? (e.g. FIX_PTE). I think with the standard configs today, those
boot-time-only slots all _happen_ to fall in the same PMD. But that wasn't the
case with my issue last week.
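Just to illustrate the sort of thing I'm worried about (a hypothetical layout,
not something I've hit with this patch on today's configs):

	/*
	 * If fix_to_virt(FIX_PTE) lands in the PMD below the one containing
	 * FIXADDR_START, then:
	 *
	 *   __BM_TABLE_IDX(fix_to_virt(FIX_PTE), PMD_SHIFT)
	 *     = (addr >> PMD_SHIFT) - (FIXADDR_START >> PMD_SHIFT)
	 *     = -1
	 *
	 * and fixmap_pte() would index before the start of bm_pte[].
	 */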
> +
> +#define BM_PTE_TABLE_IDX(addr) __BM_TABLE_IDX(addr, PMD_SHIFT)
> +
> +static pte_t bm_pte[NR_BM_PTE_TABLES][PTRS_PER_PTE] __page_aligned_bss;
> static pmd_t bm_pmd[PTRS_PER_PMD] __page_aligned_bss __maybe_unused;
> static pud_t bm_pud[PTRS_PER_PUD] __page_aligned_bss __maybe_unused;
>
> -static inline pud_t *fixmap_pud(unsigned long addr)
> +static inline pte_t *fixmap_pte(unsigned long addr)
> {
> - pgd_t *pgdp = pgd_offset_k(addr);
> - p4d_t *p4dp = p4d_offset(pgdp, addr);
> - p4d_t p4d = READ_ONCE(*p4dp);
> + return &bm_pte[BM_PTE_TABLE_IDX(addr)][pte_index(addr)];
> +}
>
> - BUG_ON(p4d_none(p4d) || p4d_bad(p4d));
> +void __init early_fixmap_init_pte(pmd_t *pmdp, unsigned long addr)
static? - same comment for pmd/pud below.
> +{
> + pmd_t pmd = READ_ONCE(*pmdp);
> + pte_t *ptep;
>
> - return pud_offset_kimg(p4dp, addr);
> + if (pmd_none(pmd)) {
> + ptep = bm_pte[BM_PTE_TABLE_IDX(addr)];;
nit: remove one of the semi-colons.
> + __pmd_populate(pmdp, __pa_symbol(ptep), PMD_TYPE_TABLE);
> + }
> }
>
> -static inline pmd_t *fixmap_pmd(unsigned long addr)
> +void __init early_fixmap_init_pmd(pud_t *pudp, unsigned long addr,
> + unsigned long end)
> {
> - pud_t *pudp = fixmap_pud(addr);
> + unsigned long next;
> pud_t pud = READ_ONCE(*pudp);
> + pmd_t *pmdp;
>
> - BUG_ON(pud_none(pud) || pud_bad(pud));
> + if (pud_none(pud))
> + __pud_populate(pudp, __pa_symbol(bm_pmd), PUD_TYPE_TABLE);
>
> - return pmd_offset_kimg(pudp, addr);
> + pmdp = pmd_offset_kimg(pudp, addr);
> + do {
> + next = pmd_addr_end(addr, end);
> + early_fixmap_init_pte(pmdp, addr);
> + } while (pmdp++, addr = next, addr != end);
> }
>
> -static inline pte_t *fixmap_pte(unsigned long addr)
> +
> +void __init early_fixmap_init_pud(p4d_t *p4dp, unsigned long addr,
> + unsigned long end)
> {
> - return &bm_pte[pte_index(addr)];
> + p4d_t p4d = READ_ONCE(*p4dp);
> + pud_t *pudp;
> +
> + if (CONFIG_PGTABLE_LEVELS > 3 && !p4d_none(p4d) &&
> + p4d_page_paddr(p4d) != __pa_symbol(bm_pud)) {
> + /*
> + * We only end up here if the kernel mapping and the fixmap
> + * share the top level pgd entry, which should only happen on
> + * 16k/4 levels configurations.
> + */
> + BUG_ON(!IS_ENABLED(CONFIG_ARM64_16K_PAGES));
> + }
> +
> + if (p4d_none(p4d))
> + __p4d_populate(p4dp, __pa_symbol(bm_pud), P4D_TYPE_TABLE);
> +
> + pudp = pud_offset_kimg(p4dp, addr);
> + early_fixmap_init_pmd(pudp, addr, end);
> }
>
> /*
> @@ -54,55 +97,13 @@ static inline pte_t *fixmap_pte(unsigned long addr)
> */
> void __init early_fixmap_init(void)
> {
> - pgd_t *pgdp;
> - p4d_t *p4dp, p4d;
> - pud_t *pudp;
> - pmd_t *pmdp;
> unsigned long addr = FIXADDR_START;
> + unsigned long end = FIXADDR_TOP;
>
> - pgdp = pgd_offset_k(addr);
> - p4dp = p4d_offset(pgdp, addr);
> - p4d = READ_ONCE(*p4dp);
> - if (CONFIG_PGTABLE_LEVELS > 3 &&
> - !(p4d_none(p4d) || p4d_page_paddr(p4d) == __pa_symbol(bm_pud))) {
> - /*
> - * We only end up here if the kernel mapping and the fixmap
> - * share the top level pgd entry, which should only happen on
> - * 16k/4 levels configurations.
> - */
> - BUG_ON(!IS_ENABLED(CONFIG_ARM64_16K_PAGES));
> - pudp = pud_offset_kimg(p4dp, addr);
> - } else {
> - if (p4d_none(p4d))
> - __p4d_populate(p4dp, __pa_symbol(bm_pud), P4D_TYPE_TABLE);
> - pudp = fixmap_pud(addr);
> - }
> - if (pud_none(READ_ONCE(*pudp)))
> - __pud_populate(pudp, __pa_symbol(bm_pmd), PUD_TYPE_TABLE);
> - pmdp = fixmap_pmd(addr);
> - __pmd_populate(pmdp, __pa_symbol(bm_pte), PMD_TYPE_TABLE);
> + pgd_t *pgdp = pgd_offset_k(addr);
> + p4d_t *p4dp = p4d_offset(pgdp, addr);
>
> - /*
> - * The boot-ioremap range spans multiple pmds, for which
> - * we are not prepared:
> - */
> - BUILD_BUG_ON((__fix_to_virt(FIX_BTMAP_BEGIN) >> PMD_SHIFT)
> - != (__fix_to_virt(FIX_BTMAP_END) >> PMD_SHIFT));
> -
> - if ((pmdp != fixmap_pmd(fix_to_virt(FIX_BTMAP_BEGIN)))
> - || pmdp != fixmap_pmd(fix_to_virt(FIX_BTMAP_END))) {
> - WARN_ON(1);
> - pr_warn("pmdp %p != %p, %p\n",
> - pmdp, fixmap_pmd(fix_to_virt(FIX_BTMAP_BEGIN)),
> - fixmap_pmd(fix_to_virt(FIX_BTMAP_END)));
> - pr_warn("fix_to_virt(FIX_BTMAP_BEGIN): %08lx\n",
> - fix_to_virt(FIX_BTMAP_BEGIN));
> - pr_warn("fix_to_virt(FIX_BTMAP_END): %08lx\n",
> - fix_to_virt(FIX_BTMAP_END));
> -
> - pr_warn("FIX_BTMAP_END: %d\n", FIX_BTMAP_END);
> - pr_warn("FIX_BTMAP_BEGIN: %d\n", FIX_BTMAP_BEGIN);
> - }
> + early_fixmap_init_pud(p4dp, addr, end);
> }
>
> /*
> @@ -130,6 +131,7 @@ void __set_fixmap(enum fixed_addresses idx,
> void *__init fixmap_remap_fdt(phys_addr_t dt_phys, int *size, pgprot_t prot)
> {
> const u64 dt_virt_base = __fix_to_virt(FIX_FDT);
> + phys_addr_t dt_phys_base;
> int offset;
> void *dt_virt;
>
> @@ -144,27 +146,12 @@ void *__init fixmap_remap_fdt(phys_addr_t dt_phys, int *size, pgprot_t prot)
> if (!dt_phys || dt_phys % MIN_FDT_ALIGN)
> return NULL;
>
> - /*
> - * Make sure that the FDT region can be mapped without the need to
> - * allocate additional translation table pages, so that it is safe
> - * to call create_mapping_noalloc() this early.
> - *
> - * On 64k pages, the FDT will be mapped using PTEs, so we need to
> - * be in the same PMD as the rest of the fixmap.
> - * On 4k pages, we'll use section mappings for the FDT so we only
> - * have to be in the same PUD.
> - */
> - BUILD_BUG_ON(dt_virt_base % SZ_2M);
> -
> - BUILD_BUG_ON(__fix_to_virt(FIX_FDT_END) >> SWAPPER_TABLE_SHIFT !=
> - __fix_to_virt(FIX_BTMAP_BEGIN) >> SWAPPER_TABLE_SHIFT);
> -
> - offset = dt_phys % SWAPPER_BLOCK_SIZE;
> + dt_phys_base = round_down(dt_phys, PAGE_SIZE);
> + offset = dt_phys % PAGE_SIZE;
> dt_virt = (void *)dt_virt_base + offset;
>
> /* map the first chunk so we can read the size from the header */
> - create_mapping_noalloc(round_down(dt_phys, SWAPPER_BLOCK_SIZE),
> - dt_virt_base, SWAPPER_BLOCK_SIZE, prot);
> + create_mapping_noalloc(dt_phys_base, dt_virt_base, PAGE_SIZE, prot);
>
> if (fdt_magic(dt_virt) != FDT_MAGIC)
> return NULL;
> @@ -173,9 +160,10 @@ void *__init fixmap_remap_fdt(phys_addr_t dt_phys, int *size, pgprot_t prot)
> if (*size > MAX_FDT_SIZE)
> return NULL;
>
> - if (offset + *size > SWAPPER_BLOCK_SIZE)
> - create_mapping_noalloc(round_down(dt_phys, SWAPPER_BLOCK_SIZE), dt_virt_base,
> - round_up(offset + *size, SWAPPER_BLOCK_SIZE), prot);
> + if (offset + *size > PAGE_SIZE) {
> + create_mapping_noalloc(dt_phys_base, dt_virt_base,
> + offset + *size, prot);
> + }
>
> return dt_virt;
> }