[RFC PATCH 4/4] KVM: arm64: Distinguish cases of memcache allocations completely

Thu Mar 25 17:26:25 GMT 2021

Hi Yanan,

On 2/8/21 11:22 AM, Yanan Wang wrote:
> With a guest translation fault, the memcache pages are not needed if KVM
> is only about to install a new leaf entry into the existing page table.
> And with a guest permission fault, the memcache pages are also not needed
> for a write_fault in dirty-logging time if KVM is only about to update
> the existing leaf entry instead of collapsing a block entry into a table.
>
> By comparing fault_granule and vma_pagesize, cases that require allocations
> from memcache and cases that don't can be distinguished completely.
>
> Signed-off-by: Yanan Wang <wangyanan55 at huawei.com>
> ---
>  arch/arm64/kvm/mmu.c | 25 ++++++++++++-------------
>  1 file changed, 12 insertions(+), 13 deletions(-)
>
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index d151927a7d62..550498a9104e 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -815,19 +815,6 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	gfn = fault_ipa >> PAGE_SHIFT;
>  	mmap_read_unlock(current->mm);
>  
> -	/*
> -	 * Permission faults just need to update the existing leaf entry,
> -	 * and so normally don't require allocations from the memcache. The
> -	 * only exception to this is when dirty logging is enabled at runtime
> -	 * and a write fault needs to collapse a block entry into a table.
> -	 */
> -	if (fault_status != FSC_PERM || (logging_active && write_fault)) {
> -		ret = kvm_mmu_topup_memory_cache(memcache,
> -						 kvm_mmu_cache_min_pages(kvm));
> -		if (ret)
> -			return ret;
> -	}
> -
>  	mmu_seq = vcpu->kvm->mmu_notifier_seq;
>  	/*
>  	 * Ensure the read of mmu_notifier_seq happens before we call
> @@ -887,6 +874,18 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	else if (cpus_have_const_cap(ARM64_HAS_CACHE_DIC))
>  		prot |= KVM_PGTABLE_PROT_X;
>  
> +	/*
> +	 * Allocations from the memcache are required only when granule of the
> +	 * lookup level where the guest fault happened exceeds vma_pagesize,
> +	 * which means new page tables will be created in the fault handlers.
> +	 */
> +	if (fault_granule > vma_pagesize) {
> +		ret = kvm_mmu_topup_memory_cache(memcache,
> +						 kvm_mmu_cache_min_pages(kvm));
> +		if (ret)
> +			return ret;
> +	}

I distinguish three situations:

1. fault_granule == vma_pagesize. If the stage 2 fault occurs at the leaf level,
then it means that all the tables that the translation table walker traversed
until the leaf are valid. No need to allocate a new page, as stage 2 will only
change the leaf to point to a valid PA.

2. fault_granule > vma_pagesize. This means that there's a table missing at some
point in the table walk, so we're going to need to allocate at least one table to
hold the leaf entry. We need to topup the memory cache.

3. fault_granule < vma_pagesize. From our discussion in patch #3, this can happen
only if the userspace translation tables use a block mapping, dirty page logging
is enabled, the fault_ipa is mapped as a last level entry, dirty page logging gets
disabled and then we get a fault. In this case, the PTE table will be coalesced
into a PMD block mapping, and the PMD table entry that pointed to the PTE table
will be changed to a block mapping. No table will be allocated.

Looks to me like this patch is valid, but getting it wrong can break a VM and I
would feel a lot more comfortable if someone who is more familiar with the code
would have a look.

Thanks,

Alex

> +
>  	/*
>  	 * Under the premise of getting a FSC_PERM fault, we just need to relax
>  	 * permissions only if vma_pagesize equals fault_granule. Otherwise,