[PATCH v3 07/11] arm64: fixmap: allow init before linear mapping is set up
Ard Biesheuvel
ard.biesheuvel at linaro.org
Tue Apr 14 04:02:13 PDT 2015
On 14 April 2015 at 12:47, Mark Rutland <mark.rutland at arm.com> wrote:
> Hi Ard,
>
> On Fri, Apr 10, 2015 at 02:53:51PM +0100, Ard Biesheuvel wrote:
>> This reworks early_ioremap_init() so it populates the various levels
>> of translation tables while taking the following into account:
>> - be prepared for any of the levels to have been populated already, as
>> this may be the case once we move the kernel text mapping out of the
>> linear mapping;
>> - don't rely on __va() to translate the physical address in a page table
>> entry to a virtual address, since this produces linear mapping addresses;
>> instead, use the fact that at any level, we know exactly which page in
>> swapper_pg_dir an entry could be pointing to if it points anywhere.
>
> Can we not use Catalin's PHYS_OFFSET swizzling trick instead? i.e.
>
> * Set PHYS_OFFSET so __va hits in the text mapping.
>
> * Create the fixmap entries.
>
> * Parse the DTB or UEFI memory map, figure out the real PHYS_OFFSET.
>
> * Create linear mapping for the initial tables.
>
> * Swap PHYS_OFFSET for the real version, and update init_mm->pgd to
> point at the linear map alias of the swapper pgd.
>
> It seemed like that would require less open-coding of table manipulation
> code, as we could use __va() early.
>
> Is there a limitation with that approach that I'm missing?
>
I didn't quite take Catalin's suggestion to mean the above, but the
way you put it seems viable to me. I'll have a go and see how far I
get with it.
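
For my own notes, roughly the shape I'd expect that sequence to take
(just a sketch, not working code: parse_boot_memory_map() and
map_initial_tables() are stand-ins for the DTB/UEFI parsing and table
mapping steps, and I'm assuming memstart_addr keeps backing
PHYS_OFFSET as it does today):

static void __init phys_offset_swizzle(phys_addr_t image_phys)
{
	phys_addr_t swapper_phys, real_memstart;

	/* 1. Point PHYS_OFFSET at the image so __va(image_phys) == _text */
	memstart_addr = image_phys - ((u64)_text - PAGE_OFFSET);

	/* __pa() now works for text addresses; record the pgd's phys address */
	swapper_phys = __pa(swapper_pg_dir);

	/* 2. Create the fixmap entries using the ordinary table accessors */
	early_fixmap_init();

	/* 3. Parse the DTB or UEFI memory map to find the real start of RAM */
	real_memstart = parse_boot_memory_map();

	/* 4. Create a linear mapping covering the initial page tables */
	map_initial_tables(swapper_phys, real_memstart);

	/* 5. Swap in the real PHYS_OFFSET; __va() now yields linear addresses,
	 *    so switch init_mm to the linear alias of the swapper pgd */
	memstart_addr = real_memstart;
	init_mm.pgd = __va(swapper_phys);
}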
> Otherwise, comments below.
>
>> This allows us to defer the initialization of the linear mapping until
>> after we have figured out where our RAM resides in the physical address
>> space.
>>
>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel at linaro.org>
>> ---
>> arch/arm64/include/asm/compiler.h | 2 +
>> arch/arm64/kernel/vmlinux.lds.S | 14 +++--
>> arch/arm64/mm/mmu.c | 117 +++++++++++++++++++++++++-------------
>> 3 files changed, 90 insertions(+), 43 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/compiler.h b/arch/arm64/include/asm/compiler.h
>> index ee35fd0f2236..dd342af63673 100644
>> --- a/arch/arm64/include/asm/compiler.h
>> +++ b/arch/arm64/include/asm/compiler.h
>> @@ -27,4 +27,6 @@
>> */
>> #define __asmeq(x, y) ".ifnc " x "," y " ; .err ; .endif\n\t"
>>
>> +#define __pgdir __attribute__((section(".pgdir"),aligned(PAGE_SIZE)))
>> +
>> #endif /* __ASM_COMPILER_H */
>> diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
>> index 98073332e2d0..604f285d3832 100644
>> --- a/arch/arm64/kernel/vmlinux.lds.S
>> +++ b/arch/arm64/kernel/vmlinux.lds.S
>> @@ -160,11 +160,15 @@ SECTIONS
>>
>> BSS_SECTION(0, 0, 0)
>>
>> - . = ALIGN(PAGE_SIZE);
>> - idmap_pg_dir = .;
>> - . += IDMAP_DIR_SIZE;
>> - swapper_pg_dir = .;
>> - . += SWAPPER_DIR_SIZE;
>> + .pgdir (NOLOAD) : {
>> + . = ALIGN(PAGE_SIZE);
>> + idmap_pg_dir = .;
>> + . += IDMAP_DIR_SIZE;
>> + swapper_pg_dir = .;
>> + __swapper_bs_region = . + PAGE_SIZE;
>> + . += SWAPPER_DIR_SIZE;
>> + *(.pgdir)
>> + }
>>
>> _end = .;
>>
>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>> index 60be58a160a2..c0427b5c90c7 100644
>> --- a/arch/arm64/mm/mmu.c
>> +++ b/arch/arm64/mm/mmu.c
>> @@ -341,6 +341,70 @@ static void __init __map_memblock(phys_addr_t start, phys_addr_t end)
>> }
>> #endif
>>
>> +struct mem_bootstrap_region {
>
> The "region" naming is a little confusing IMO. To me it sounds like
> something akin to a VMA rather than a set of tables.
>
ok
>> +#if CONFIG_ARM64_PGTABLE_LEVELS > 3
>> + pud_t pud[PTRS_PER_PUD];
>> +#endif
>> +#if CONFIG_ARM64_PGTABLE_LEVELS > 2
>> + pmd_t pmd[PTRS_PER_PMD];
>> +#endif
>> + pte_t pte[PTRS_PER_PTE];
>> +};
>> +
>> +static void __init bootstrap_mem_region(unsigned long addr,
>> + struct mem_bootstrap_region *reg,
>> + pmd_t **ppmd, pte_t **ppte)
>> +{
>> + /*
>> + * Avoid using the linear phys-to-virt translation __va() so that we
>> + * can use this code before the linear mapping is set up. Note that
>> + * any populated entries at any level can only point into swapper_pg_dir
>> + * since no other translation table pages have been allocated yet.
>> + * So at each level, we either need to populate it, or it has already
>> + * been populated by a swapper_pg_dir table at the same level, in which
>> + * case we can figure out its virtual address without applying __va()
>> + * on the contents of the entry, using the following struct.
>> + */
>> + extern struct mem_bootstrap_region __swapper_bs_region;
>> +
>> + pgd_t *pgd = pgd_offset_k(addr);
>> + pud_t *pud = (pud_t *)pgd;
>> + pmd_t *pmd = (pmd_t *)pud;
>> +
>> +#if CONFIG_ARM64_PGTABLE_LEVELS > 3
>> + if (pgd_none(*pgd)) {
>> + clear_page(reg->pud);
>> + pgd_populate(&init_mm, pgd, reg->pud);
>
> What's PHYS_OFFSET expected to be at this point (for the purposes of
> __pa() in the *_populate*() macros)?
>
> If we're relying on __pa() to convert text->phys, won't __va() convert
> phys->text at this point?
>
At this point, yes. But later on, when the kernel text moves out of
the linear region, __pa() takes into account whether the input VA is
above or below PAGE_OFFSET, and adds the kernel image offset in the
latter case.
Changing __va() so it implements the inverse would be a can of worms
I'd rather keep closed.
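
I.e., something along these lines (sketch only; kimage_offset is a
placeholder for however we end up recording the image displacement):

static inline phys_addr_t __virt_to_phys(unsigned long x)
{
	if (x >= PAGE_OFFSET)
		/* linear mapping alias: the usual translation */
		return (x - PAGE_OFFSET) + PHYS_OFFSET;

	/* kernel text/data mapping below PAGE_OFFSET */
	return x + kimage_offset;
}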
>> + pud = reg->pud;
>> + } else {
>> + pud = __swapper_bs_region.pud;
>> + }
>> + pud += pud_index(addr);
>> +#endif
>
> Can we free the unused reg tables in the else cases? If __pa() works we
> should be able to hand them to memblock, no?
>
Only if we put the memblock_reserve() of the kernel image before
early_fixmap_init(); otherwise we would be freeing the pages only to
reserve them again later.
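
With that ordering, something like the below would do it in the else
branch (sketch only; memblock_free() takes the physical address, which
works here since __pa() is valid for image addresses at this point):

	if (pgd_none(*pgd)) {
		clear_page(reg->pud);
		pgd_populate(&init_mm, pgd, reg->pud);
		pud = reg->pud;
	} else {
		pud = __swapper_bs_region.pud;
		/* the statically allocated pud page is unused: hand it back */
		memblock_free(__pa(reg->pud), PAGE_SIZE);
	}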
> [...]
>
>> /*
>> * The boot-ioremap range spans multiple pmds, for which
>> - * we are not preparted:
>> + * we are not prepared:
>
> This typo has been bugging me for ages. I'll be glad to see it go!
>
:-)