[PATCH v2] KVM: arm64: Inject exception on out-of-IPA-range translation fault

Alexandru Elisei alexandru.elisei at arm.com
Fri Apr 29 03:44:27 PDT 2022


Hi,

On Thu, Apr 28, 2022 at 06:55:56PM +0100, Marc Zyngier wrote:
> On Thu, 28 Apr 2022 17:07:21 +0100,
> Alexandru Elisei <alexandru.elisei at arm.com> wrote:
> > 
> > Hi,
> > 
> > On Thu, Apr 28, 2022 at 04:22:58PM +0100, Marc Zyngier wrote:
> > > On Thu, 28 Apr 2022 09:46:21 +0100,
> > > Alexandru Elisei <alexandru.elisei at arm.com> wrote:
> > > > 
> > > > Hi,
> > > > 
> > > > On Wed, Apr 27, 2022 at 11:04:34PM +0100, Marc Zyngier wrote:
> > > > > When taking a translation fault for an IPA that is outside of
> > > > > the range defined by the hypervisor (between the HW PARange and
> > > > > the IPA range), we stupidly treat it as an IO and forward the access
> > > > > to userspace. Of course, userspace can't do much with it, and things
> > > > > end badly.
> > > > > 
> > > > > Arguably, the guest is braindead, but we should at least catch the
> > > > > case and inject an exception.
> > > > > 
> > > > > Check the faulting IPA against:
> > > > > - the sanitised PARange: inject an address size fault
> > > > > - the IPA size: inject an abort
> > > > > 
> > > > > Reported-by: Christoffer Dall <christoffer.dall at arm.com>
> > > > > Signed-off-by: Marc Zyngier <maz at kernel.org>
> > > > > ---
> > > > >  arch/arm64/include/asm/kvm_emulate.h |  1 +
> > > > >  arch/arm64/kvm/inject_fault.c        | 28 ++++++++++++++++++++++++++++
> > > > >  arch/arm64/kvm/mmu.c                 | 19 +++++++++++++++++++
> > > > >  3 files changed, 48 insertions(+)
> > > > > 
> > > > > diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> > > > > index 7496deab025a..f71358271b71 100644
> > > > > --- a/arch/arm64/include/asm/kvm_emulate.h
> > > > > +++ b/arch/arm64/include/asm/kvm_emulate.h
> > > > > @@ -40,6 +40,7 @@ void kvm_inject_undefined(struct kvm_vcpu *vcpu);
> > > > >  void kvm_inject_vabt(struct kvm_vcpu *vcpu);
> > > > >  void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr);
> > > > >  void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr);
> > > > > +void kvm_inject_size_fault(struct kvm_vcpu *vcpu);
> > > > >  
> > > > >  void kvm_vcpu_wfi(struct kvm_vcpu *vcpu);
> > > > >  
> > > > > diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
> > > > > index b47df73e98d7..ba20405d2dc2 100644
> > > > > --- a/arch/arm64/kvm/inject_fault.c
> > > > > +++ b/arch/arm64/kvm/inject_fault.c
> > > > > @@ -145,6 +145,34 @@ void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr)
> > > > >  		inject_abt64(vcpu, true, addr);
> > > > >  }
> > > > >  
> > > > > +void kvm_inject_size_fault(struct kvm_vcpu *vcpu)
> > > > > +{
> > > > > +	unsigned long addr, esr;
> > > > > +
> > > > > +	addr  = kvm_vcpu_get_fault_ipa(vcpu);
> > > > > +	addr |= kvm_vcpu_get_hfar(vcpu) & GENMASK(11, 0);
> > > > > +
> > > > > +	if (kvm_vcpu_trap_is_iabt(vcpu))
> > > > > +		kvm_inject_pabt(vcpu, addr);
> > > > > +	else
> > > > > +		kvm_inject_dabt(vcpu, addr);
> > > > > +
> > > > > +	/*
> > > > > +	 * If AArch64 or LPAE, set FSC to 0 to indicate an Address
> > > > > +	 * Size Fault at level 0, as if exceeding PARange.
> > > > > +	 *
> > > > > +	 * Non-LPAE guests will only get the external abort, as there
> > > > > +	 * is no way to to describe the ASF.
> > > > > +	 */
> > > > > +	if (vcpu_el1_is_32bit(vcpu) &&
> > > > > +	    !(vcpu_read_sys_reg(vcpu, TCR_EL1) & TTBCR_EAE))
> > > > > +		return;
> > > > > +
> > > > > +	esr = vcpu_read_sys_reg(vcpu, ESR_EL1);
> > > > > +	esr &= ~GENMASK_ULL(5, 0);
> > > > > +	vcpu_write_sys_reg(vcpu, esr, ESR_EL1);
> > > > > +}
> > > > > +
> > > > >  /**
> > > > >   * kvm_inject_undefined - inject an undefined instruction into the guest
> > > > >   * @vcpu: The vCPU in which to inject the exception
> > > > > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > > > > index 53ae2c0640bc..5400fc020164 100644
> > > > > --- a/arch/arm64/kvm/mmu.c
> > > > > +++ b/arch/arm64/kvm/mmu.c
> > > > > @@ -1337,6 +1337,25 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
> > > > >  	fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
> > > > >  	is_iabt = kvm_vcpu_trap_is_iabt(vcpu);
> > > > >  
> > > > > +	if (fault_status == FSC_FAULT) {
> > > > > +		/* Beyond sanitised PARange (which is the IPA limit) */
> > > > > +		if (fault_ipa >= BIT_ULL(get_kvm_ipa_limit())) {
> > > > > +			kvm_inject_size_fault(vcpu);
> > > > > +			return 1;
> > > > > +		}
> > > > > +
> > > > > +		/* Falls between the IPA range and the PARange? */
> > > > > +		if (fault_ipa >= BIT_ULL(vcpu->arch.hw_mmu->pgt->ia_bits)) {
> > > > > +			fault_ipa |= kvm_vcpu_get_hfar(vcpu) & GENMASK(11, 0);
> > > > > +
> > > > > +			if (is_iabt)
> > > > > +				kvm_inject_pabt(vcpu, fault_ipa);
> > > > > +			else
> > > > > +				kvm_inject_dabt(vcpu, fault_ipa);
> > > > > +			return 1;
> > > > > +		}
> > > > 
> > > > Doesn't KVM treat faults outside a valid memslot (aka guest RAM) as MMIO
> > > > aborts? From the guest's point of view, the IPA is valid because it's
> > > > inside the HW PARange, so it's not entirely impossible that the VMM put a
> > > > device there.
> > > 
> > > Sure. But the generated IPA is outside of the range the VMM has asked
> > > to handle. The IPA space describes the whole of the guest address
> > > space, and there shouldn't be anything outside of it.
> > > 
> > > We actually state in the documentation that the IPA Size limit *is*
> > > the physical address size for the VM. If the VMM places something
> > > outside if the IPA space and still expect something to be reported to
> > > it, we have a problem (in some cases, we may want to actually put page
> > > tables in place even for MMIO that traps to userspace -- see my
> > > earlier work on MMIO guard).
> > 
> > If you mean this bit:
> > 
> > On arm64, the physical address size for a VM (IPA Size limit) is limited
> > to 40bits by default. The limit can be configured if the host supports the
> > extension KVM_CAP_ARM_VM_IPA_SIZE. When supported, use
> > KVM_VM_TYPE_ARM_IPA_SIZE(IPA_Bits) to set the size in the machine type
> > identifier, where IPA_Bits is the maximum width of any physical
> > address used by the VM.
> >
> > And then below there is this paragraph:
> > 
> > Please note that configuring the IPA size does not affect the capability
> > exposed by the guest CPUs in ID_AA64MMFR0_EL1[PARange]. It only affects
> > **size of the address translated by the stage2 level (guest physical to
> > host physical address translations)**.
> 
> I don't see that as a contradiction. It just says that we don't
> repaint PARange.
> 
> > 
> > Emphasis added by me.
> > 
> > It looks to me like the two paragraph state different things, first says
> > the IPA size caps "the physical address size for a VM", the second that it
> > caps the RAM size - "size of the address translated by the stage 2 level.
> 
> That's not the way I understand it. It just gives a textbook
> definition of the IPA space. And to be clear, this is just an
> implementation detail. We should be able to populate all full IPA
> space with faulting entries and keep the behaviour the same.
> 
> > I have no problem with either, but it looks confusing.
> > 
> > So if a VMM that wants to put devices above RAM it must make sure that the
> > IPA size is extended to match, did I get that right?
> 
> Yes. Anything that is reacheable by a memory transaction has to fit in
> the IPA space.
> 
> > I'm also a bit confused about the rationale. Why is the PARange exposed to
> > the guest in effect the wrong value, because the true PARange is defined by
> > IPA size?
> 
> PARange and IPA range don't have the same granularity. You can't
> express a PARange of 37 bits, for example, while it is perfectly
> possible for the IPA range. And they do cover two different concepts:
> the IPA space means nothing to the guest.

I see, thank you for the detailed explanation!

Thanks,
Alex



More information about the linux-arm-kernel mailing list