[PATCH v5 07/12] KVM: arm64: Use LPA2 page-tables for stage2 and hyp stage1

Fri Nov 24 03:49:57 PST 2023

On 22/11/2023 15:21, Marc Zyngier wrote:
> On Wed, 22 Nov 2023 13:41:33 +0000,
> Ryan Roberts <ryan.roberts at arm.com> wrote:
>>
>> On 21/11/2023 20:34, Oliver Upton wrote:
>>> On Thu, Nov 16, 2023 at 02:29:26PM +0000, Ryan Roberts wrote:
>>>> Implement a simple policy whereby if the HW supports FEAT_LPA2 for the
>>>> page size we are using, always use LPA2-style page-tables for stage 2
>>>> and hyp stage 1 (assuming an nvhe hyp), regardless of the VMM-requested
>>>> IPA size or HW-implemented PA size. When in use we can now support up to
>>>> 52-bit IPA and PA sizes.
>>>>
>>>> We use the previously created cpu feature to track whether LPA2 is
>>>> supported for deciding whether to use the LPA2 or classic pte format.
>>>>
>>>> Note that FEAT_LPA2 brings support for bigger block mappings (512GB with
>>>> 4KB, 64GB with 16KB). We explicitly don't enable these in the library
>>>> because stage2_apply_range() works on batch sizes of the largest used
>>>> block mapping, and increasing the size of the batch would lead to soft
>>>> lockups. See commit 5994bc9e05c2 ("KVM: arm64: Limit
>>>> stage2_apply_range() batch size to largest block").
>>>>
>>>> With the addition of LPA2 support in the hypervisor, the PA size
>>>> supported by the HW must be capped with a runtime decision, rather than
>>>> simply using a compile-time decision based on PA_BITS. For example, on a
>>>> system that advertises 52 bit PA but does not support FEAT_LPA2, A 4KB
>>>> or 16KB kernel compiled with LPA2 support must still limit the PA size
>>>> to 48 bits.
>>>>
>>>> Therefore, move the insertion of the PS field into TCR_EL2 out of
>>>> __kvm_hyp_init assembly code and instead do it in cpu_prepare_hyp_mode()
>>>> where the rest of TCR_EL2 is prepared. This allows us to figure out PS
>>>> with kvm_get_parange(), which has the appropriate logic to ensure the
>>>> above requirement. (and the PS field of VTCR_EL2 is already populated
>>>> this way).
>>>>
>>>> Signed-off-by: Ryan Roberts <ryan.roberts at arm.com>
>>>> ---
>>>>  arch/arm64/include/asm/kvm_mmu.h     |  2 +-
>>>>  arch/arm64/include/asm/kvm_pgtable.h | 47 +++++++++++++++++++++-------
>>>>  arch/arm64/kvm/arm.c                 |  5 +++
>>>>  arch/arm64/kvm/hyp/nvhe/hyp-init.S   |  4 ---
>>>>  arch/arm64/kvm/hyp/pgtable.c         | 15 +++++++--
>>>>  5 files changed, 54 insertions(+), 19 deletions(-)
>>>>
>>>> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
>>>> index 31e8d7faed65..f4e4fcb35afc 100644
>>>> --- a/arch/arm64/include/asm/kvm_mmu.h
>>>> +++ b/arch/arm64/include/asm/kvm_mmu.h
>>>> @@ -340,7 +340,7 @@ static inline struct kvm *kvm_s2_mmu_to_kvm(struct kvm_s2_mmu *mmu)
>>>>  	return container_of(mmu->arch, struct kvm, arch);
>>>>  }
>>>>  
>>>> -#define kvm_lpa2_is_enabled()		false
>>>> +#define kvm_lpa2_is_enabled()		system_supports_lpa2()
>>>
>>> Can we use this predicate consistently throughout the KVM code? Looks
>>> like the rest of this diff is using system_supports_lpa2() directly.
>>
>> My thinking was that system_supports_lpa2() is an input to KVM's policy to
>> decide if it is going to use LPA2 (currently that policy is very simple - if the
>> system supports it, then KVM uses it - but it doesn't have to be that way), and
>> kvm_lpa2_is_enabled() is how KVM exports its policy decision, so one is an input
>> and the other is an output.
>>
>> It's a lightly held opinion though - I'll make the change if you insist? :)
> 
> <bikeshed>
> I personally don't find this dichotomy very useful. It could make
> sense if we used the page table walker for S1 outside of KVM, but
> that's not the case at the moment.
> 
> If there is no plan for such a use case, I'd rather see a single
> predicate, making the code a bit more readable.
> </bikeshed>

OK fair enough. I've made this change for the next rev.

> 
> 	M.
>