[PATCH v4 02/28] KVM: arm64: Donate MMIO to the hypervisor
Will Deacon
will at kernel.org
Tue Sep 9 07:12:45 PDT 2025
On Tue, Aug 19, 2025 at 09:51:30PM +0000, Mostafa Saleh wrote:
> Add a function to donate MMIO to the hypervisor so IOMMU hypervisor
> drivers can use that to protect the MMIO of IOMMU.
> The initial attempt to implement this was to have a new flag to
> "___pkvm_host_donate_hyp" to accept MMIO. However that had many problems,
> it was quite intrusive for host/hyp to check/set page state to make it
> aware of MMIO and to encode the state in the page table in that case.
> Which is called in paths that can be sensitive to performance (FFA, VMs..)
>
> As donating MMIO is very rare, and we don’t need to encode the full state,
> it’s reasonable to have a separate function to do this.
> It will init the host s2 page table with an invalid leaf with the owner ID
> to prevent the host from mapping the page on faults.
>
> Also, prevent kvm_pgtable_stage2_unmap() from removing owner ID from
> stage-2 PTEs, as this can be triggered from recycle logic under memory
> pressure. There is no code relying on this, as all ownership changes is
> done via kvm_pgtable_stage2_set_owner()
>
> For error path in IOMMU drivers, add a function to donate MMIO back
> from hyp to host.
>
> Signed-off-by: Mostafa Saleh <smostafa at google.com>
> ---
> arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 2 +
> arch/arm64/kvm/hyp/nvhe/mem_protect.c | 64 +++++++++++++++++++
> arch/arm64/kvm/hyp/pgtable.c | 9 +--
> 3 files changed, 68 insertions(+), 7 deletions(-)
>
> diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> index 52d7ee91e18c..98e173da0f9b 100644
> --- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> +++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> @@ -37,6 +37,8 @@ int __pkvm_host_share_hyp(u64 pfn);
> int __pkvm_host_unshare_hyp(u64 pfn);
> int __pkvm_host_donate_hyp(u64 pfn, u64 nr_pages);
> int ___pkvm_host_donate_hyp(u64 pfn, u64 nr_pages, enum kvm_pgtable_prot prot);
> +int __pkvm_host_donate_hyp_mmio(u64 pfn);
> +int __pkvm_hyp_donate_host_mmio(u64 pfn);
> int __pkvm_hyp_donate_host(u64 pfn, u64 nr_pages);
> int __pkvm_host_share_ffa(u64 pfn, u64 nr_pages);
> int __pkvm_host_unshare_ffa(u64 pfn, u64 nr_pages);
> diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> index 861e448183fd..c9a15ef6b18d 100644
> --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> @@ -799,6 +799,70 @@ int ___pkvm_host_donate_hyp(u64 pfn, u64 nr_pages, enum kvm_pgtable_prot prot)
> return ret;
> }
>
> +int __pkvm_host_donate_hyp_mmio(u64 pfn)
> +{
> + u64 phys = hyp_pfn_to_phys(pfn);
> + void *virt = __hyp_va(phys);
> + int ret;
> + kvm_pte_t pte;
> +
> + host_lock_component();
> + hyp_lock_component();
> +
> + ret = kvm_pgtable_get_leaf(&host_mmu.pgt, phys, &pte, NULL);
> + if (ret)
> + goto unlock;
> +
> + if (pte && !kvm_pte_valid(pte)) {
> + ret = -EPERM;
> + goto unlock;
> + }
Shouldn't we first check that the pfn is indeed MMIO? Otherwise, testing
the pte for the ownership information isn't right.
> + ret = kvm_pgtable_get_leaf(&pkvm_pgtable, (u64)virt, &pte, NULL);
> + if (ret)
> + goto unlock;
> + if (pte) {
> + ret = -EBUSY;
> + goto unlock;
> + }
> +
> + ret = pkvm_create_mappings_locked(virt, virt + PAGE_SIZE, PAGE_HYP_DEVICE);
> + if (ret)
> + goto unlock;
> + /*
> + * We set HYP as the owner of the MMIO pages in the host stage-2, for:
> + * - host aborts: host_stage2_adjust_range() would fail for invalid non zero PTEs.
> + * - recycle under memory pressure: host_stage2_unmap_dev_all() would call
> + * kvm_pgtable_stage2_unmap() which will not clear non zero invalid ptes (counted).
> + * - other MMIO donation: Would fail as we check that the PTE is valid or empty.
> + */
> + WARN_ON(host_stage2_try(kvm_pgtable_stage2_set_owner, &host_mmu.pgt, phys,
> + PAGE_SIZE, &host_s2_pool, PKVM_ID_HYP));
> +unlock:
> + hyp_unlock_component();
> + host_unlock_component();
> +
> + return ret;
> +}
> +
> +int __pkvm_hyp_donate_host_mmio(u64 pfn)
> +{
> + u64 phys = hyp_pfn_to_phys(pfn);
> + u64 virt = (u64)__hyp_va(phys);
> + size_t size = PAGE_SIZE;
> +
> + host_lock_component();
> + hyp_lock_component();
Shouldn't we check that:
1. pfn is mmio
2. pfn is owned by hyp
3. The host doesn't have something mapped at pfn already
?
> + WARN_ON(kvm_pgtable_hyp_unmap(&pkvm_pgtable, virt, size) != size);
> + WARN_ON(host_stage2_try(kvm_pgtable_stage2_set_owner, &host_mmu.pgt, phys,
> + PAGE_SIZE, &host_s2_pool, PKVM_ID_HOST));
> + hyp_unlock_component();
> + host_unlock_component();
> +
> + return 0;
> +}
> +
> int __pkvm_host_donate_hyp(u64 pfn, u64 nr_pages)
> {
> return ___pkvm_host_donate_hyp(pfn, nr_pages, PAGE_HYP);
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index c351b4abd5db..ba06b0c21d5a 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -1095,13 +1095,8 @@ static int stage2_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
> kvm_pte_t *childp = NULL;
> bool need_flush = false;
>
> - if (!kvm_pte_valid(ctx->old)) {
> - if (stage2_pte_is_counted(ctx->old)) {
> - kvm_clear_pte(ctx->ptep);
> - mm_ops->put_page(ctx->ptep);
> - }
> - return 0;
> - }
> + if (!kvm_pte_valid(ctx->old))
> + return stage2_pte_is_counted(ctx->old) ? -EPERM : 0;
Can this code be reached for the guest? For example, if
pkvm_pgtable_stage2_destroy() runs into an MMIO-guarded pte on teardown?
Will
More information about the linux-arm-kernel
mailing list