[PATCH v4 02/28] KVM: arm64: Donate MMIO to the hypervisor

Mon Sep 29 03:57:36 PDT 2025

On Fri, Sep 26, 2025 at 03:33:06PM +0100, Will Deacon wrote:
> On Tue, Sep 16, 2025 at 01:27:39PM +0000, Mostafa Saleh wrote:
> > On Tue, Sep 09, 2025 at 03:12:45PM +0100, Will Deacon wrote:
> > > On Tue, Aug 19, 2025 at 09:51:30PM +0000, Mostafa Saleh wrote:
> > > > diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > > > index 861e448183fd..c9a15ef6b18d 100644
> > > > --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > > > +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > > > @@ -799,6 +799,70 @@ int ___pkvm_host_donate_hyp(u64 pfn, u64 nr_pages, enum kvm_pgtable_prot prot)
> > > >  	return ret;
> > > >  }
> > > >  
> > > > +int __pkvm_host_donate_hyp_mmio(u64 pfn)
> > > > +{
> > > > +	u64 phys = hyp_pfn_to_phys(pfn);
> > > > +	void *virt = __hyp_va(phys);
> > > > +	int ret;
> > > > +	kvm_pte_t pte;
> > > > +
> > > > +	host_lock_component();
> > > > +	hyp_lock_component();
> > > > +
> > > > +	ret = kvm_pgtable_get_leaf(&host_mmu.pgt, phys, &pte, NULL);
> > > > +	if (ret)
> > > > +		goto unlock;
> > > > +
> > > > +	if (pte && !kvm_pte_valid(pte)) {
> > > > +		ret = -EPERM;
> > > > +		goto unlock;
> > > > +	}
> > > 
> > > Shouldn't we first check that the pfn is indeed MMIO? Otherwise, testing
> > > the pte for the ownership information isn't right.
> > 
> > I will add it, although the input should be trusted as it comes from the
> > hypervisor SMMUv3 driver.
> 
> (more on this below)
> 
> > > > +int __pkvm_hyp_donate_host_mmio(u64 pfn)
> > > > +{
> > > > +	u64 phys = hyp_pfn_to_phys(pfn);
> > > > +	u64 virt = (u64)__hyp_va(phys);
> > > > +	size_t size = PAGE_SIZE;
> > > > +
> > > > +	host_lock_component();
> > > > +	hyp_lock_component();
> > > 
> > > Shouldn't we check that:
> > > 
> > >   1. pfn is mmio
> > >   2. pfn is owned by hyp
> > >   3. The host doesn't have something mapped at pfn already
> > > 
> > > ?
> > > 
> > 
> > I thought about this initially, but as
> > - This code is only called from the hypervisor with trusted
> >   inputs (only at boot)
> > - Only called on error path
> > 
> > So WARN_ON in case of failure to unmap MMIO pages seemed is good enough,
> > to avoid extra code.
> > 
> > But I can add the checks if you think they are necessary, we will need
> > to add new helpers for MMIO state though.
> 
> I'd personally prefer to put the checks here so that callers don't have
> to worry (or forget!) about them. That also means that the donation
> function can be readily reused in the same way as the existing functions
> which operate on memory pages.
> 
> How much work is it to add the MMIO helpers?

It's not much work I guess, I was just worried about adding new helpers
just to use in a rare error path.
I will add them for v5.

Thanks,
Mostafa

> 
> > > > +	WARN_ON(kvm_pgtable_hyp_unmap(&pkvm_pgtable, virt, size) != size);
> > > > +	WARN_ON(host_stage2_try(kvm_pgtable_stage2_set_owner, &host_mmu.pgt, phys,
> > > > +				PAGE_SIZE, &host_s2_pool, PKVM_ID_HOST));
> > > > +	hyp_unlock_component();
> > > > +	host_unlock_component();
> > > > +
> > > > +	return 0;
> > > > +}
> > > > +
> > > >  int __pkvm_host_donate_hyp(u64 pfn, u64 nr_pages)
> > > >  {
> > > >  	return ___pkvm_host_donate_hyp(pfn, nr_pages, PAGE_HYP);
> > > > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> > > > index c351b4abd5db..ba06b0c21d5a 100644
> > > > --- a/arch/arm64/kvm/hyp/pgtable.c
> > > > +++ b/arch/arm64/kvm/hyp/pgtable.c
> > > > @@ -1095,13 +1095,8 @@ static int stage2_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
> > > >  	kvm_pte_t *childp = NULL;
> > > >  	bool need_flush = false;
> > > >  
> > > > -	if (!kvm_pte_valid(ctx->old)) {
> > > > -		if (stage2_pte_is_counted(ctx->old)) {
> > > > -			kvm_clear_pte(ctx->ptep);
> > > > -			mm_ops->put_page(ctx->ptep);
> > > > -		}
> > > > -		return 0;
> > > > -	}
> > > > +	if (!kvm_pte_valid(ctx->old))
> > > > +		return stage2_pte_is_counted(ctx->old) ? -EPERM : 0;
> > > 
> > > Can this code be reached for the guest? For example, if
> > > pkvm_pgtable_stage2_destroy() runs into an MMIO-guarded pte on teardown?
> > 
> > AFAICT, VMs page table is destroyed from reclaim_pgtable_pages() =>
> > kvm_pgtable_stage2_destroy() => kvm_pgtable_stage2_destroy_range() ... =>
> > stage2_free_walker()
> > 
> > Which doesn't interact with “stage2_unmap_walker”, so that should be
> > fine.
> 
> Fair enough. I feel like this might bite us later on but, with what you
> have, we'll see the -EPERM and then we can figure out what to do then.
> 
> Will