[PATCH v3 26/36] KVM: arm64: Return -EFAULT from VCPU_RUN on access to a poisoned pte

Will Deacon will at kernel.org
Mon Mar 23 07:58:56 PDT 2026


On Fri, Mar 20, 2026 at 04:35:44PM +0000, Marc Zyngier wrote:
> On Thu, 05 Mar 2026 14:43:39 +0000,
> Will Deacon <will at kernel.org> wrote:
> > diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > index 4ff31947579b..7f705f662c40 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > @@ -890,6 +890,49 @@ static int get_valid_guest_pte(struct pkvm_hyp_vm *vm, u64 ipa, kvm_pte_t *ptep,
> >  	return 0;
> >  }
> >  
> > +int __pkvm_vcpu_in_poison_fault(struct pkvm_hyp_vcpu *hyp_vcpu)
> > +{
> > +	struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(hyp_vcpu);
> > +	kvm_pte_t pte;
> > +	s8 level;
> > +	u64 ipa;
> > +	int ret;
> > +
> > +	switch (kvm_vcpu_trap_get_class(&hyp_vcpu->vcpu)) {
> > +	case ESR_ELx_EC_DABT_LOW:
> > +	case ESR_ELx_EC_IABT_LOW:
> > +		if (kvm_vcpu_trap_is_translation_fault(&hyp_vcpu->vcpu))
> > +			break;
> > +		fallthrough;
> > +	default:
> > +		return -EINVAL;
> > +	}
> > +
> > +	/*
> > +	 * The host has the faulting IPA when it calls us from the guest
> > +	 * fault handler but we retrieve it ourselves from the FAR so as
> > +	 * to avoid exposing an "oracle" that could reveal data access
> > +	 * patterns of the guest after initial donation of its pages.
> > +	 */
> > +	ipa = kvm_vcpu_get_fault_ipa(&hyp_vcpu->vcpu);
> > +	ipa |= kvm_vcpu_get_hfar(&hyp_vcpu->vcpu) & GENMASK(11, 0);
> 
> nit: we now have FAR_TO_FIPA_OFFSET() for this.

Neat, I'll use that. Thanks.

> > diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
> > index 32294bd21dde..da0a45dab203 100644
> > --- a/arch/arm64/kvm/pkvm.c
> > +++ b/arch/arm64/kvm/pkvm.c
> > @@ -417,10 +417,13 @@ int pkvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
> >  			return -EINVAL;
> >  
> >  		/*
> > -		 * We raced with another vCPU.
> > +		 * We either raced with another vCPU or the guest PTE
> > +		 * has been poisoned by an erroneous host access.
> >  		 */
> > -		if (mapping)
> > -			return -EAGAIN;
> > +		if (mapping) {
> > +			ret = kvm_call_hyp_nvhe(__pkvm_vcpu_in_poison_fault);
> > +			return ret ? -EFAULT : -EAGAIN;
> > +		}
> 
> I guess this considers that racing against another vcpu is an unlikely
> situation, because calling back into EL2 and walking the PTs isn't
> exactly cheap.

Yeah, I wanted to avoid walking the stage-2 page-table at EL2 on every
fault, so the walk is deferred to this path and only happens when we
find an existing mapping for the faulting IPA.

> I wonder if there is a mechanism we could use to directly return this
> information to the host at the point of the guest fault. The only
> things I can figure out would require the PTE to be valid (access or
> permission faults, for example), and that'd break the "full PTE
> dedicated to annotations"...

Oh, I see what you mean... using the fault type as a proxy feels like it
probably won't scale so well if we ever want to use those faults for
anything else.

If we want to optimise the common case, perhaps I could set a flag in
the host kvm structure (from EL2) when the page is poisoned in
__pkvm_host_force_reclaim_page_guest() and then check that here? That
way, only VMs that have had a page forcefully reclaimed will issue the
hypercall. There's a race, but I think it's ok: if we miss the flag, we
return -EAGAIN, the guest faults again and we pick up the flag on the
next attempt.

WDYT? It might be premature optimisation, but it also feels doable?

Will


