[PATCH v4 10/28] KVM: arm64: iommu: Shadow host stage-2 page table
Will Deacon
will at kernel.org
Tue Sep 9 07:42:07 PDT 2025
On Tue, Aug 19, 2025 at 09:51:38PM +0000, Mostafa Saleh wrote:
> Create a shadow page table for the IOMMU that shadows the
> host CPU stage-2 into the IOMMUs to establish DMA isolation.
>
> An initial snapshot is created after the driver init, then
> on every permission change a callback would be called for
> the IOMMU driver to update the page table.
>
> For some cases, an SMMUv3 may be able to share the same page
> table used with the host CPU stage-2 directly.
> However, this is too strict and requires changes to the core hypervisor
> page table code, plus it would require the hypervisor to handle IOMMU
> page faults. This can be added later as an optimization for SMMUV3.
>
> Signed-off-by: Mostafa Saleh <smostafa at google.com>
> ---
> arch/arm64/kvm/hyp/include/nvhe/iommu.h | 4 ++
> arch/arm64/kvm/hyp/nvhe/iommu/iommu.c | 83 ++++++++++++++++++++++++-
> arch/arm64/kvm/hyp/nvhe/mem_protect.c | 5 ++
> 3 files changed, 90 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/kvm/hyp/include/nvhe/iommu.h b/arch/arm64/kvm/hyp/include/nvhe/iommu.h
> index 1ac70cc28a9e..219363045b1c 100644
> --- a/arch/arm64/kvm/hyp/include/nvhe/iommu.h
> +++ b/arch/arm64/kvm/hyp/include/nvhe/iommu.h
> @@ -3,11 +3,15 @@
> #define __ARM64_KVM_NVHE_IOMMU_H__
>
> #include <asm/kvm_host.h>
> +#include <asm/kvm_pgtable.h>
>
> struct kvm_iommu_ops {
> int (*init)(void);
> + void (*host_stage2_idmap)(phys_addr_t start, phys_addr_t end, int prot);
> };
>
> int kvm_iommu_init(void);
>
> +void kvm_iommu_host_stage2_idmap(phys_addr_t start, phys_addr_t end,
> + enum kvm_pgtable_prot prot);
> #endif /* __ARM64_KVM_NVHE_IOMMU_H__ */
> diff --git a/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c b/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
> index a01c036c55be..f7d1c8feb358 100644
> --- a/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
> +++ b/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
> @@ -4,15 +4,94 @@
> *
> * Copyright (C) 2022 Linaro Ltd.
> */
> +#include <linux/iommu.h>
> +
> #include <nvhe/iommu.h>
> +#include <nvhe/mem_protect.h>
> +#include <nvhe/spinlock.h>
>
> /* Only one set of ops supported */
> struct kvm_iommu_ops *kvm_iommu_ops;
>
> +/* Protected by host_mmu.lock */
> +static bool kvm_idmap_initialized;
> +
> +static inline int pkvm_to_iommu_prot(enum kvm_pgtable_prot prot)
> +{
> + int iommu_prot = 0;
> +
> + if (prot & KVM_PGTABLE_PROT_R)
> + iommu_prot |= IOMMU_READ;
> + if (prot & KVM_PGTABLE_PROT_W)
> + iommu_prot |= IOMMU_WRITE;
> + if (prot == PKVM_HOST_MMIO_PROT)
> + iommu_prot |= IOMMU_MMIO;

This looks a little odd to me.

On the CPU side, the only difference between PKVM_HOST_MEM_PROT and
PKVM_HOST_MMIO_PROT is that the former has execute permission. Both are
mapped as cacheable at stage-2 because it's the job of the host to set
the more restrictive memory type at stage-1.

Carrying that over to the SMMU would suggest that we don't care about
IOMMU_MMIO at stage-2 at all, so why do we need to set it here?

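To make the question concrete, here is a rough user-space sketch of what the
conversion might look like if IOMMU_MMIO were dropped at stage-2. The constant
values are stand-ins for illustration rather than copies of the kernel headers,
and the function name pkvm_to_iommu_prot_no_mmio is hypothetical:

```c
/*
 * Stand-in definitions so the sketch builds outside the kernel.
 * The real ones live in asm/kvm_pgtable.h, nvhe/mem_protect.h and
 * linux/iommu.h; the values here are assumptions for illustration.
 */
enum kvm_pgtable_prot {
	KVM_PGTABLE_PROT_X = 1 << 0,
	KVM_PGTABLE_PROT_W = 1 << 1,
	KVM_PGTABLE_PROT_R = 1 << 2,
};

/* MEM is RWX and MMIO is RW: they differ only in execute permission. */
#define PKVM_HOST_MEM_PROT \
	(KVM_PGTABLE_PROT_R | KVM_PGTABLE_PROT_W | KVM_PGTABLE_PROT_X)
#define PKVM_HOST_MMIO_PROT \
	(KVM_PGTABLE_PROT_R | KVM_PGTABLE_PROT_W)

#define IOMMU_READ	(1 << 0)
#define IOMMU_WRITE	(1 << 1)

/* Hypothetical variant: translate R/W only and never set IOMMU_MMIO. */
static int pkvm_to_iommu_prot_no_mmio(int prot)
{
	int iommu_prot = 0;

	if (prot & KVM_PGTABLE_PROT_R)
		iommu_prot |= IOMMU_READ;
	if (prot & KVM_PGTABLE_PROT_W)
		iommu_prot |= IOMMU_WRITE;

	return iommu_prot;
}
```

With this variant, PKVM_HOST_MEM_PROT and PKVM_HOST_MMIO_PROT both translate to
IOMMU_READ | IOMMU_WRITE, mirroring the CPU side where the memory type is left
to stage-1.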
> + /* We don't understand that, might be dangerous. */
> + WARN_ON(prot & ~PKVM_HOST_MEM_PROT);
> + return iommu_prot;
> +}
> +
> +static int __snapshot_host_stage2(const struct kvm_pgtable_visit_ctx *ctx,
> + enum kvm_pgtable_walk_flags visit)
> +{
> + u64 start = ctx->addr;
> + kvm_pte_t pte = *ctx->ptep;
> + u32 level = ctx->level;
> + u64 end = start + kvm_granule_size(level);
> + int prot = IOMMU_READ | IOMMU_WRITE;
> +
> + /* Keep unmapped. */
> + if (pte && !kvm_pte_valid(pte))
> + return 0;
> +
> + if (kvm_pte_valid(pte))
> + prot = pkvm_to_iommu_prot(kvm_pgtable_stage2_pte_prot(pte));
> + else if (!addr_is_memory(start))
> + prot |= IOMMU_MMIO;

Why do we need to map MMIO regions proactively here? I'd have thought
we could just do:

	if (!kvm_pte_valid(pte))
		return 0;

	prot = pkvm_to_iommu_prot(kvm_pgtable_stage2_pte_prot(pte));
	kvm_iommu_ops->host_stage2_idmap(start, end, prot);
	return 0;

but I think that IOMMU_MMIO is throwing me again...

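The control flow of that suggestion can be tried out in user space with a mock.
This is only a sketch: the PTE bit layout, the mock_* names and the recording
callback are all hypothetical stand-ins, and it says nothing about whether
skipping invalid entries is the right policy for the real driver:

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t kvm_pte_t;

/* Stand-in PTE encoding, for illustration only. */
#define MOCK_PTE_VALID	(1ULL << 0)
#define MOCK_PTE_R	(1ULL << 6)
#define MOCK_PTE_W	(1ULL << 7)

/* Stand-in values; the real flags are defined in linux/iommu.h. */
#define IOMMU_READ	(1 << 0)
#define IOMMU_WRITE	(1 << 1)

/* Records what would be handed to kvm_iommu_ops->host_stage2_idmap(). */
struct mock_idmap_call {
	uint64_t start;
	uint64_t end;
	int prot;
};

static struct mock_idmap_call mock_calls[8];
static size_t mock_ncalls;

static void mock_host_stage2_idmap(uint64_t start, uint64_t end, int prot)
{
	if (mock_ncalls < 8)
		mock_calls[mock_ncalls++] =
			(struct mock_idmap_call){ start, end, prot };
}

/* Hypothetical helper: derive IOMMU permissions from the mock PTE bits. */
static int mock_pte_to_iommu_prot(kvm_pte_t pte)
{
	int prot = 0;

	if (pte & MOCK_PTE_R)
		prot |= IOMMU_READ;
	if (pte & MOCK_PTE_W)
		prot |= IOMMU_WRITE;

	return prot;
}

/* Visitor following the suggested flow: skip anything that is not valid. */
static int mock_snapshot_visit(uint64_t start, uint64_t size, kvm_pte_t pte)
{
	if (!(pte & MOCK_PTE_VALID))
		return 0;

	mock_host_stage2_idmap(start, start + size, mock_pte_to_iommu_prot(pte));
	return 0;
}
```

Only valid entries reach the callback here, so nothing is mapped ahead of time
for regions the host stage-2 has not populated.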
Will