[PATCH 22/30] KVM: arm64: Return -EFAULT from VCPU_RUN on access to a poisoned pte
Will Deacon
will at kernel.org
Fri Jan 9 06:57:10 PST 2026
On Tue, Jan 06, 2026 at 03:54:06PM +0000, Quentin Perret wrote:
> On Monday 05 Jan 2026 at 15:49:30 (+0000), Will Deacon wrote:
> > +int __pkvm_vcpu_in_poison_fault(struct pkvm_hyp_vcpu *hyp_vcpu)
> > +{
> > + struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(hyp_vcpu);
> > + kvm_pte_t pte;
> > + s8 level;
> > + u64 ipa;
> > + int ret;
> > +
> > + switch (kvm_vcpu_trap_get_class(&hyp_vcpu->vcpu)) {
> > + case ESR_ELx_EC_DABT_LOW:
> > + case ESR_ELx_EC_IABT_LOW:
> > + if (kvm_vcpu_trap_is_translation_fault(&hyp_vcpu->vcpu))
> > + break;
> > + fallthrough;
> > + default:
> > + return -EINVAL;
> > + }
> > +
> > + ipa = kvm_vcpu_get_fault_ipa(&hyp_vcpu->vcpu);
> > + ipa |= kvm_vcpu_get_hfar(&hyp_vcpu->vcpu) & GENMASK(11, 0);
>
> Why is all the above needed? Could we simplify by having the host pass
> the IPA to the hcall?

I was just a little nervous about exposing an oracle here if we take the
gfn as an argument, since it would give the host a pretty easy mechanism
to monitor the page access pattern of a guest after the initial donation
had occurred.

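To illustrate, the IPA reconstruction at the top of
__pkvm_vcpu_in_poison_fault() can be modelled in user-space C. This sketch
assumes a 48-bit IPA space, where HPFAR_EL2.FIPA (bits [39:4]) holds
IPA[47:12], and takes the page offset from the low 12 bits of FAR_EL2.
fault_ipa() is a hypothetical helper standing in for
kvm_vcpu_get_fault_ipa() plus the GENMASK(11, 0) merge. In this model both
inputs come from the trapped fault itself, so there is no host-chosen
address for EL2 to answer questions about.

```c
#include <stdint.h>

/*
 * Assumption: 48-bit IPA space, so HPFAR_EL2.FIPA occupies bits [39:4]
 * and holds IPA[47:12]. Larger IPA sizes use more FIPA bits.
 */
#define HPFAR_FIPA_MASK   0x000000fffffffff0ULL /* bits [39:4] */
#define PAGE_OFFSET_MASK  0xfffULL              /* GENMASK(11, 0) */

/* Hypothetical helper: rebuild the faulting IPA from trap syndrome values. */
static uint64_t fault_ipa(uint64_t hpfar, uint64_t far)
{
	/* FIPA bit 4 corresponds to IPA bit 12, hence the shift by 8. */
	uint64_t ipa = (hpfar & HPFAR_FIPA_MASK) << 8;

	/*
	 * HPFAR only has page granularity; the offset within the 4KiB page
	 * is the same in the virtual address reported by FAR_EL2.
	 */
	ipa |= far & PAGE_OFFSET_MASK;
	return ipa;
}
```

For example, with HPFAR_EL2 = 0x1234560 and FAR_EL2 = 0xffff000012345abc,
fault_ipa() yields 0x123456abc.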
> > + guest_lock_component(vm);
> > + ret = kvm_pgtable_get_leaf(&vm->pgt, ipa, &pte, &level);
> > + if (ret)
> > + goto unlock;
> > +
> > + if (level != KVM_PGTABLE_LAST_LEVEL) {
> > + ret = -EINVAL;
> > + goto unlock;
> > + }
> > +
> > + ret = guest_pte_is_poisoned(pte);
> > +unlock:
> > + guest_unlock_component(vm);
> > + return ret;
> > +}
> > +
> > int __pkvm_host_share_hyp(u64 pfn)
> > {
> > u64 phys = hyp_pfn_to_phys(pfn);
> > diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
> > index d1926cb08c76..14865907610c 100644
> > --- a/arch/arm64/kvm/pkvm.c
> > +++ b/arch/arm64/kvm/pkvm.c
> > @@ -417,10 +417,13 @@ int pkvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
> > return -EINVAL;
> >
> > /*
> > - * We raced with another vCPU.
> > + * We either raced with another vCPU or the guest PTE
> > + * has been poisoned by an erroneous host access.
> > */
> > - if (mapping)
> > - return -EAGAIN;
> > + if (mapping) {
> > + ret = kvm_call_hyp_nvhe(__pkvm_vcpu_in_poison_fault);
>
> It's not too bad, but it's a shame we now issue that every time we have
> such a race (which is frequent-ish). Could we perhaps only issue it if
> at least one page has been forcefully reclaimed since boot?

On the plus side, it avoids an unconditional walk from the fault path
at EL2 (which is what we have in Android!).

It's a bit fiddly to implement your idea in the host, since the forceful
reclaim happens in a really terrible context, but I could track it at EL2
and make __pkvm_vcpu_in_poison_fault() return early instead? It's also
worth bearing in mind that we've already serialised the concurrent fault
and done a GUP by this point, so performance is somewhat of a lost
cause...

WDYT?
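
One way to picture the EL2-side early return is a sticky flag: the
forceful-reclaim path sets it, and the poison check bails out before any
stage-2 walk while it is still clear. This is a user-space model only.
forced_reclaim_seen, note_forced_reclaim(), walk_guest_pgtable() and
vcpu_in_poison_fault() are hypothetical names, C11 atomics stand in for
the kernel's own primitives, the walk is a stub that counts how often it
runs, and the ordering between setting the flag and poisoning the PTE is
not modelled.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sticky flag: set once any page has been forcefully reclaimed. */
static atomic_bool forced_reclaim_seen;

/* Counts slow-path walks so that the fast path can be observed. */
static unsigned int nr_walks;

/*
 * Hypothetical hook on the forceful-reclaim path. Release/acquire is used
 * for illustration; ordering against the PTE update is not modelled here.
 */
static void note_forced_reclaim(void)
{
	atomic_store_explicit(&forced_reclaim_seen, true, memory_order_release);
}

/* Stub for the guest stage-2 walk: pretend only IPA 0x1000 is poisoned. */
static int walk_guest_pgtable(uint64_t ipa)
{
	nr_walks++;
	return ipa == 0x1000;
}

/* Returns 1 if the faulting IPA maps a poisoned PTE, 0 otherwise. */
static int vcpu_in_poison_fault(uint64_t ipa)
{
	/* Fast path: nothing was ever reclaimed, so nothing can be poisoned. */
	if (!atomic_load_explicit(&forced_reclaim_seen, memory_order_acquire))
		return 0;

	return walk_guest_pgtable(ipa);
}
```

Until note_forced_reclaim() runs, vcpu_in_poison_fault() returns 0 without
walking; afterwards every call pays for the walk, which matches the
"at least one page reclaimed since boot" condition rather than a per-page
one.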
Will