[PATCH v8 36/43] arm64: mm: Add support for folding PUDs at runtime

Ryan Roberts ryan.roberts at arm.com
Mon Sep 30 08:12:01 PDT 2024


On 30/09/2024 15:53, Ard Biesheuvel wrote:
> On Mon, 30 Sept 2024 at 16:36, Ryan Roberts <ryan.roberts at arm.com> wrote:
>>
>> Hi Ard,
>>
>> On 14/02/2024 12:29, Ard Biesheuvel wrote:
>>> From: Ard Biesheuvel <ardb at kernel.org>
>>>
>>> In order to support LPA2 on 16k pages in a way that permits non-LPA2
>>> systems to run the same kernel image, we have to be able to fall back to
>>> at most 48 bits of virtual addressing.
>>>
>>> Falling back to 48 bits would result in a level 0 with only 2 entries,
>>> which is suboptimal in terms of TLB utilization. So instead, let's fall
>>> back to 47 bits in that case. This means we need to be able to fold PUDs
>>> dynamically, similar to how we fold P4Ds for 48 bit virtual addressing
>>> on LPA2 with 4k pages.
>>>
>>> Signed-off-by: Ard Biesheuvel <ardb at kernel.org>
>>
>> [...]
>>
>>>
>>> +#define pud_index(addr)              (((addr) >> PUD_SHIFT) & (PTRS_PER_PUD - 1))
>>> +
>>> +static inline pud_t *p4d_to_folded_pud(p4d_t *p4dp, unsigned long addr)
>>> +{
>>> +     return (pud_t *)PTR_ALIGN_DOWN(p4dp, PAGE_SIZE) + pud_index(addr);
>>> +}
>>> +
>>
>> I wonder if you could explain what this function (and its equivalents at other
>> levels) is doing? Why isn't it just returning p4dp cast to a (pud_t *)?
>>
> 
> Because the p4dp index is derived from a different part of the VA, so
> it points into the right page but at the wrong entry.

OK, yeah I think that's sunk in. TBH, the folding stuff melts my brain. Thanks
for the quick response. So the code is definitely correct, and it's needed
because the PxD_SHIFT and PTRS_PER_PxD are "wrong" for the real geometry.

> 
>> I'm working on a prototype for boot-time page size selection. For this, I'm
>> compile-time enabling all levels, then run-time folding the ones I don't need,
>> based on the selected page size and VA size.
>>
> 
> Nice!

Certainly in principle. Hoping to get an RFC out during October.

> 
>> I'm trying to reuse your run-time folding code, but I have a case where this
>> function is broken as written. Replacing with "return (pud_t *)p4dp;" resolves
>> the problem; If VA_BITS=48 and pagesize=64K, the pgd has 64 entries. p4dp is
>> pointing to the correct entry in the pgd already, but this code aligns back to
>> the start of the page, then adds pud_index(), which is wrong because
>> PTRS_PER_PUD != PTRS_PER_PGDIR. (In my case, these 2 macros are actually
>> boot-time selected values rather than compile-time constants).
>>
>> I think your code is probably correct and working around PTRS_PER_PXD being
>> compile-time constants for the non-folded case, but I can't quite convince myself.
>>
> 
> Is your p4dp pointing to the correct descriptor because you changed
> the runtime behavior of p4d_index() perhaps?

Yes; in my prototype, P4D_SHIFT and PTRS_PER_P4D are set at runtime and end up
the same as they would have been if pgtable-nop4d.h was included for
compile-time folding.




More information about the linux-arm-kernel mailing list