[PATCH] arm64/kvm: correct the error report in kvm_handle_guest_abort

Marc Zyngier maz at kernel.org
Tue Jan 26 04:18:00 EST 2021


On 2021-01-26 08:10, Jianyong Wu wrote:
> Hi Marc,
> 
>>   		/*
>> -		 * Check for a cache maintenance operation. Since we
>> -		 * ended-up here, we know it is outside of any memory
>> -		 * slot. But we can't find out if that is for a device,
>> -		 * or if the guest is just being stupid. The only thing
>> -		 * we know for sure is that this range cannot be cached.
>> +		 * Check for a cache maintenance operation. Two cases:
>>   		 *
>> -		 * So let's assume that the guest is just being
>> -		 * cautious, and skip the instruction.
>> +		 * - It is outside of any memory slot. But we can't
>> +		 *   find out if that is for a device, or if the guest
>> +		 *   is just being stupid. The only thing we know for
>> +		 *   sure is that this range cannot be cached.  So
>> +		 *   let's assume that the guest is just being
>> +		 *   cautious, and skip the instruction.
>> +		 *
>> +		 * - Otherwise, clean/invalidate the whole memslot. We
>> +		 *   should special-case DC IVAC and inject a
>> +		 *   permission fault, but we can't really identify it
>> +		 *   in this context.
>>   		 */
>> -		if (kvm_is_error_hva(hva) && kvm_vcpu_dabt_is_cm(vcpu))
>> {
>> +		if (kvm_vcpu_dabt_is_cm(vcpu)) {
>> +			if (!kvm_is_error_hva(hva)) {
>> +				spin_lock(&vcpu->kvm->mmu_lock);
>> +				stage2_flush_memslot(vcpu->kvm,
>> memslot);
> 
> Maybe we should not flush the whole memslot here as every "dc ivac"
> only work on a range of memory with cache line size. So what about
> using:
> stage2_apply_range_resched(vcpu->kvm, fault_ipa, fault_ipa +
> cache_line_size(), kvm_pgtable_stage2_flush) instead. It will a bit
> faster than flush the whole memslot.
> Also I test your idea of "unmap after flush" using:
> 	stage2_apply_range_resched(vcpu->kvm, fault_ipa, fault_ipa +
> cache_line_size(), kvm_pgtable_stage2_flush);
> 	stage2_apply_range(vcpu->kvm, fault_ipa, fault_ipa +
> cache_line_size(), kvm_pgtable_stage2_unmap, true);
> then I do "dc ivac" on the rom of 128M twice and got the double time
> of around 11s. it means that there is no optimization when do the
> second "dc ivac".
> I'm not sure if there is something wrong in my test.
> 
> So what about just using " stage2_apply_range_resched(vcpu->kvm,
> fault_ipa, fault_ipa + cache_line_size(), kvm_pgtable_stage2_flush);"
> instead of
> " stage2_flush_memslot(vcpu->kvm, memslot);" and let the guest bears
> the disadvantage of low performance.

No, both solutions are wrong. I had to write my own test case because
the idea of hacking some unknown guest wasn't very appealing.

At the end of the day, what we need to implement is as follow:

- if a CMO hits normal memory, it's all already handled
- if a CMO hits non-memory, we skip it, as we do today
- if a CMO hits R/O memory, that's where things become fun:
   - if it is a DC IVAC, the architecture says this should result
     in a permission fault
   - if it is a DC CIVAC, it works as expected

So we need to distinguish between IVAC and CIVAC. One way to do it
is to treat CMOs generating a translation fault as a *read*, even
when they are on a RO memslot.

If they come back with a permission fault:
- inside a RW memslot: no problem, treat it as a write
- inside a RO memslot: only IVAC will fault here, inject an abort
   in the guest

This translates into the following patch, which does the trick
for me.

         M.

 From 8f2a919d6f13d36445974794c76821fbb6b40f88 Mon Sep 17 00:00:00 2001
 From: Marc Zyngier <maz at kernel.org>
Date: Sat, 16 Jan 2021 10:53:21 +0000
Subject: [PATCH] CMO on RO memslot

Signed-off-by: Marc Zyngier <maz at kernel.org>
---
  arch/arm64/kvm/mmu.c | 51 +++++++++++++++++++++++++++++++++-----------
  1 file changed, 39 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 7d2257cc5438..3c176b5b0a28 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -760,7 +760,15 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
phys_addr_t fault_ipa,
  	struct kvm_pgtable *pgt;

  	fault_granule = 1UL << ARM64_HW_PGTABLE_LEVEL_SHIFT(fault_level);
-	write_fault = kvm_is_write_fault(vcpu);
+	/*
+	 * Treat translation faults on CMOs as read faults. Should
+	 * this further generate a permission fault, it will be caught
+	 * in kvm_handle_guest_abort(), with prejudice...
+	 */
+	if (fault_status == FSC_FAULT && kvm_vcpu_dabt_is_cm(vcpu))
+		write_fault = false;
+	else
+		write_fault = kvm_is_write_fault(vcpu);
  	exec_fault = kvm_vcpu_trap_is_exec_fault(vcpu);
  	VM_BUG_ON(write_fault && exec_fault);

@@ -1013,19 +1021,37 @@ int kvm_handle_guest_abort(struct kvm_vcpu 
*vcpu)
  		}

  		/*
-		 * Check for a cache maintenance operation. Since we
-		 * ended-up here, we know it is outside of any memory
-		 * slot. But we can't find out if that is for a device,
-		 * or if the guest is just being stupid. The only thing
-		 * we know for sure is that this range cannot be cached.
+		 * Check for a cache maintenance operation. Two cases:
+		 *
+		 * - It is outside of any memory slot. But we can't find out
+		 *   if that is for a device, or if the guest is just being
+		 *   stupid. The only thing we know for sure is that this
+		 *   range cannot be cached.  So let's assume that the guest
+		 *   is just being cautious, and skip the instruction.
+		 *
+		 * - Otherwise, check whether this is a permission fault.
+		 *   If so, that's a DC IVAC on a R/O memslot, which is a
+		 *   pretty bad idea, and we tell the guest so.
  		 *
-		 * So let's assume that the guest is just being
-		 * cautious, and skip the instruction.
+		 * - If this wasn't a permission fault, pass it along for
+                 *   further handling (including faulting the page in 
if it
+                 *   was a translation fault).
  		 */
-		if (kvm_is_error_hva(hva) && kvm_vcpu_dabt_is_cm(vcpu)) {
-			kvm_incr_pc(vcpu);
-			ret = 1;
-			goto out_unlock;
+		if (kvm_vcpu_dabt_is_cm(vcpu)) {
+			if (kvm_is_error_hva(hva)) {
+				kvm_incr_pc(vcpu);
+				ret = 1;
+				goto out_unlock;
+			}
+
+			if (fault_status == FSC_PERM) {
+				/* DC IVAC on a R/O memslot */
+				kvm_inject_dabt(vcpu, kvm_vcpu_get_hfar(vcpu));
+				ret = 1;
+				goto out_unlock;
+			}
+
+			goto handle_access;
  		}

  		/*
@@ -1039,6 +1065,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
  		goto out_unlock;
  	}

+handle_access:
  	/* Userspace should not be able to register out-of-bounds IPAs */
  	VM_BUG_ON(fault_ipa >= kvm_phys_size(vcpu->kvm));

-- 
2.29.2


-- 
Jazz is not dead. It just smells funny...



More information about the linux-arm-kernel mailing list