[PATCH] KVM: arm/arm64: Clean dcache to PoC when changing PTE due to CoW

Punit Agrawal punit.agrawal at arm.com
Tue Sep 4 04:07:37 PDT 2018


Christoffer Dall <christoffer.dall at arm.com> writes:

> On Mon, Sep 03, 2018 at 06:29:30PM +0100, Punit Agrawal wrote:
>> Christoffer Dall <christoffer.dall at arm.com> writes:
>> 
>> > [Adding Andrea and Steve in CC]
>> >
>> > On Thu, Aug 23, 2018 at 04:33:42PM +0100, Marc Zyngier wrote:
>> >> When triggering a CoW, we unmap the RO page via an MMU notifier
>> >> (invalidate_range_start), and then populate the new PTE using another
>> >> one (change_pte). In the meantime, we'll have copied the old page
>> >> into the new one.
>> >> 
>> >> The problem is that the data for the new page is sitting in the
>> >> cache, and should the guest have an uncached mapping to that page
>> >> (or its MMU off), following accesses will bypass the cache.
>> >> 
>> >> In a way, this is similar to what happens on a translation fault:
>> >> We need to clean the page to the PoC before mapping it. So let's just
>> >> do that.
>> >> 
>> >> This fixes a KVM unit test regression observed on a HiSilicon platform,
>> >> and subsequently reproduced on Seattle.
>> >> 
>> >> Fixes: a9c0e12ebee5 ("KVM: arm/arm64: Only clean the dcache on translation fault")
>> >> Reported-by: Mike Galbraith <efault at gmx.de>
>> >> Signed-off-by: Marc Zyngier <marc.zyngier at arm.com>
>> >> ---
>> >>  virt/kvm/arm/mmu.c | 9 ++++++++-
>> >>  1 file changed, 8 insertions(+), 1 deletion(-)
>> >> 
>> >> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
>> >> index 1d90d79706bd..287c8e274655 100644
>> >> --- a/virt/kvm/arm/mmu.c
>> >> +++ b/virt/kvm/arm/mmu.c
>> >> @@ -1811,13 +1811,20 @@ static int kvm_set_spte_handler(struct kvm *kvm, gpa_t gpa, u64 size, void *data
>> >>  void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
>> >>  {
>> >>  	unsigned long end = hva + PAGE_SIZE;
>> >> +	kvm_pfn_t pfn = pte_pfn(pte);
>> >>  	pte_t stage2_pte;
>> >>  
>> >>  	if (!kvm->arch.pgd)
>> >>  		return;
>> >>  
>> >>  	trace_kvm_set_spte_hva(hva);
>> >> -	stage2_pte = pfn_pte(pte_pfn(pte), PAGE_S2);
>> >> +
>> >> +	/*
>> >> +	 * We've moved a page around, probably through CoW, so let's treat
>> >> +	 * just like a translation fault and clean the cache to the PoC.
>> >> +	 */
>> >> +	clean_dcache_guest_page(pfn, PAGE_SIZE);
>> >> +	stage2_pte = pfn_pte(pfn, PAGE_S2);
>> >>  	handle_hva_to_gpa(kvm, hva, end, &kvm_set_spte_handler, &stage2_pte);
>> >>  }
>> >
>> > How does this work for pmd mappings?
>> 
>> kvm_set_spte_hva() isn't called for PMD mappings. But...
>> 
>> >
>> > Are we guaranteed that a pmd mapping (hugetlbfs or THP) is split before
>> > a CoW happens?
>> >
>> > Steve tells me that we share THP mappings on fork and that we back THPs
>> > by a zero page, so CoW with THP should be possible.
>> >
>> 
>> ...the problem seems to affect handling write permission faults for CoW
>> or zero pages.
>> 
>> The memory gets unmapped at stage 2 due to the invalidate notifier (in
>> hugetlb_cow() for hugetlbfs and do_huge_pmd_wp_page() for THP) 
>
> So just to make sure I get this right.  For a pte CoW, Linux calls the
> set_spte function to simply change the pte mapping, without doing any
> unmapping at stage 2, 

No.

I hadn't checked into the PTE CoW for zero pages when replying but was
relying on Marc's commit log. I've outlined the flow below.

> but with pmd, we unmap and wait to take another fault as an
> alternative method?

Having looked at handling of CoW handling for the different page sizes,
here's my understanding of the steps for CoW faults - note the slight
variance when dealing with PTE entries.

* Guest takes a stage 2 permission fault (user_mem_abort())

* The host mapping is updated to point to another page (either zeroed or
  contents copied). This happens via the get_user_pages_unlocked()
  invoked via gfn_to_pfn_prot().

* For all page sizes, mmu_invalidate_range_start() notifiers are called
  which will unmap the memory at stage 2.

* For PTE (wp_page_copy), set_pte_at_notify() is called which eventually
  calls kvm_set_pte_hva() modified in $SUBJECT.

  For hugepages (hugetlb_cow) and annonymous THP (do_huge_pmd_wp_page)
  there are no notifying versions of page table entry updaters so stage
  2 entries remain unmapped.

* mmu_notifier_invalidate_range_end() is called. This updates
  mmu_notifier_seq which will abort the stage 2 fault handling in
  user_mem_abort().

* As stage 2 was left unmapped for hugepages (no change_pte notifier),
  there'll be a subsequent translation fault where the mapped page will
  be cleaned/invalidated.

> Why the two different approaches depending on the mapping size?

I'm not sure why hugepage change notifier doesn't exist.

>
>> while the
>> cache maintenance for the newly allocated page will be skipped due to
>> the !FSC_PERM. 
>
> If the above holds, then the subsequent fault would be similar to any
> other fault which should work via the normal mechanism (dcache clean +
> XN on fault, icache invalidation on permission fault).
>> 
>> Hmm... smells like there might be a problem here. I'll try and put
>> together a fix.
>> 
>
> It's not clear to me if we have a problem, or just different ways of
> handling the PMD vs. PTE CoW case?

With the above context, I don't think there is a problem for hugepage
handling. The missing dcache maintenance addressed by $SUBJECT should be
sufficient for PTEs.

Thanks,
Punit



More information about the linux-arm-kernel mailing list