[PATCH v3 06/21] KVM: arm64: Add support for stage-2 map()/unmap() in generic page-table

Will Deacon <will at kernel.org>
Thu Sep 3 12:15:40 EDT 2020


Hi Gavin,

On Thu, Sep 03, 2020 at 01:30:32PM +0100, Will Deacon wrote:
> On Thu, Sep 03, 2020 at 09:18:27PM +1000, Gavin Shan wrote:
> > On 8/25/20 7:39 PM, Will Deacon wrote:
> > > +static int stage2_map_walk_table_post(u64 addr, u64 end, u32 level,
> > > +				      kvm_pte_t *ptep,
> > > +				      struct stage2_map_data *data)
> > > +{
> > > +	int ret = 0;
> > > +
> > > +	if (!data->anchor)
> > > +		return 0;
> > > +
> > > +	free_page((unsigned long)kvm_pte_follow(*ptep));
> > > +	put_page(virt_to_page(ptep));
> > > +
> > > +	if (data->anchor == ptep) {
> > > +		data->anchor = NULL;
> > > +		ret = stage2_map_walk_leaf(addr, end, level, ptep, data);
> > > +	}
> > > +
> > > +	return ret;
> > > +}
> > > +
> > 
> > As discussed in another thread, *ptep has been invalidated in stage2_map_walk_table_pre(),
> > which means *ptep is now zero. The following call to free_page() will therefore release
> > the page frame corresponding to physical address 0x0, which isn't correct. We might cache
> > the original value of this page table entry so that it can be used here.
> 
> Ah, yes, I see what you mean. It's odd that I haven't run into this
> myself, so let me try to reproduce the issue first. Another solution is
> to invalidate the table entry by clearing only the valid bit of the pte,
> rather than zapping the entire thing to 0; the full clear can then be
> done later when we clear the anchor.

Ok! There are a couple of issues here:

  1. As you point out, the kvm_pte_follow() above ends up chasing a zeroed
     pte (see the sketch below).

  2. The reason I'm not seeing this in testing is that the dirty logging
     code isn't hitting the table -> block case as it should. This is
     because I'm not handling permission faults properly when a write
     hits a read-only block entry; in that case, we need to collapse the
     entry if logging is active.
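
To make (1) concrete, here's a rough sketch of the helpers involved
(simplified from the series; the exact pte field layout shown is only
illustrative):

	/*
	 * Sketch only: a stage-2 pte keeps the output address in its
	 * middle bits and the valid bit at bit 0.
	 */
	typedef u64 kvm_pte_t;

	#define KVM_PTE_VALID		BIT(0)
	#define KVM_PTE_ADDR_MASK	GENMASK(47, PAGE_SHIFT)

	static u64 kvm_pte_to_phys(kvm_pte_t pte)
	{
		return pte & KVM_PTE_ADDR_MASK;
	}

	static kvm_pte_t *kvm_pte_follow(kvm_pte_t pte)
	{
		/* Convert the output address back to a kernel VA. */
		return __va(kvm_pte_to_phys(pte));
	}

With the old kvm_set_invalid_pte() writing 0, *ptep is already zero by the
time stage2_map_walk_table_post() runs, so kvm_pte_follow(*ptep) resolves
to __va(0) and free_page() is handed the frame at physical address 0x0.
Clearing only KVM_PTE_VALID keeps the table address bits intact, so the
post-order callback still frees the correct child table page.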

Diff below seems to clear all of this up. I'll fold it in for v4.

Thanks for reporting the problem and helping to debug it.

Will

--->8

diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index dc76fdf31be3..9328830e9464 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -155,8 +155,8 @@ static kvm_pte_t *kvm_pte_follow(kvm_pte_t pte)
 
 static void kvm_set_invalid_pte(kvm_pte_t *ptep)
 {
-       kvm_pte_t pte = 0;
-       WRITE_ONCE(*ptep, pte);
+       kvm_pte_t pte = *ptep;
+       WRITE_ONCE(*ptep, pte & ~KVM_PTE_VALID);
 }
 
 static void kvm_set_table_pte(kvm_pte_t *ptep, kvm_pte_t *childp)
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index f28e03dcb897..10b73da6abb2 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -737,11 +737,11 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
        bool exec_fault;
        bool device = false;
        unsigned long mmu_seq;
-       gfn_t gfn = fault_ipa >> PAGE_SHIFT;
        struct kvm *kvm = vcpu->kvm;
        struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
        struct vm_area_struct *vma;
        short vma_shift;
+       gfn_t gfn;
        kvm_pfn_t pfn;
        bool logging_active = memslot_is_logging(memslot);
        unsigned long vma_pagesize;
@@ -780,10 +780,18 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
        }
 
        if (vma_pagesize == PMD_SIZE || vma_pagesize == PUD_SIZE)
-               gfn = (fault_ipa & huge_page_mask(hstate_vma(vma))) >> PAGE_SHIFT;
+               fault_ipa &= huge_page_mask(hstate_vma(vma));
+
+       gfn = fault_ipa >> PAGE_SHIFT;
        mmap_read_unlock(current->mm);
 
-       if (fault_status != FSC_PERM) {
+       /*
+        * Permission faults just need to update the existing leaf entry,
+        * and so normally don't require allocations from the memcache. The
+        * only exception to this is when dirty logging is enabled at runtime
+        * and a write fault needs to collapse a block entry into a table.
+        */
+       if (fault_status != FSC_PERM || (logging_active && write_fault)) {
                ret = kvm_mmu_topup_memory_cache(memcache,
                                                 kvm_mmu_cache_min_pages(kvm));
                if (ret)
@@ -854,7 +862,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
        else if (cpus_have_const_cap(ARM64_HAS_CACHE_DIC))
                prot |= KVM_PGTABLE_PROT_X;
 
-       if (fault_status == FSC_PERM) {
+       if (fault_status == FSC_PERM && !(logging_active && writable)) {
                ret = kvm_pgtable_stage2_relax_perms(pgt, fault_ipa, prot);
        } else {
                ret = kvm_pgtable_stage2_map(pgt, fault_ipa, vma_pagesize,
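
For completeness, the two mmu.c hunks above are meant to work as a pair.
Roughly (illustrative pseudo-code only; the helper names below are just
shorthand for the real calls in the diff):

	/*
	 * Permission faults normally just update the existing leaf entry
	 * via relax_perms() and need nothing from the memcache. The one
	 * exception is a write fault with dirty logging active, which has
	 * to go down the full map path and may need to collapse an
	 * existing entry, allocating table pages along the way.
	 */
	if (fault_status != FSC_PERM || (logging_active && write_fault))
		topup_memcache();

	if (fault_status == FSC_PERM && !(logging_active && writable))
		relax_perms();
	else
		stage2_map();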



