[PATCH v6 08/25] KVM: arm64: iommu: Shadow host stage-2 page table

Mostafa Saleh smostafa at google.com
Mon May 11 04:24:14 PDT 2026


On Sat, May 09, 2026 at 08:27:14PM -0300, Jason Gunthorpe wrote:
> On Mon, May 04, 2026 at 12:28:55PM +0000, Mostafa Saleh wrote:
> > So far this is the list of requirements/changes needed to share the
> > stage-2 page table (besides the obvious: same page table format,
> > granularity, endianness...):
> > 
> > 1) HW BBM is not supported in the hypervisor page table, because
> >    it can generate TLB conflict aborts, which the hypervisor cannot
> >    handle because of the limited syndrome information.
> >    We could rely on FEAT_BBML3, which was newly introduced to work
> >    around that, but it's quite niche and not supported in KVM yet, or
> >    have an allow list similar to the kernel's
> >    (as in cpu_supports_bbml2_noabort()), which also limits the number
> >    of CPUs that can run this.
> 
> Do you think pkvm will need BBM? Hitless replace of a PTE is already a
> pretty advanced feature and the SMMU has its own support matrix there
> too. Is it for shared/private conversion?

Yes, we can break block mappings on memory donation, which is a transfer
of ownership to the hypervisor or a guest.
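As a rough illustration only (a toy model, not pKVM code; every name below is hypothetical), splitting a block mapping on donation follows a break-before-make sequence. Between the "break" and the "make", the whole block range is unmapped, which a CPU can recover from by faulting and retrying, while a device that cannot stall would simply see its DMA aborted:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model, not KVM code: one 2MiB level-2 block that can
 * be split into 512 4KiB level-3 page entries. */
#define PAGES_PER_BLOCK 512

enum pte_state { PTE_INVALID, PTE_BLOCK, PTE_TABLE };

struct toy_s2 {
	enum pte_state l2;                /* the level-2 entry */
	bool page_valid[PAGES_PER_BLOCK]; /* level-3 entries, used when l2 == PTE_TABLE */
	bool tlb_has_block;               /* models a cached walk of the old block */
	unsigned int tlbi_count;          /* number of TLB invalidations issued */
};

static void toy_tlbi(struct toy_s2 *s2)
{
	s2->tlb_has_block = false;
	s2->tlbi_count++;
}

/* Donate page 'idx' out of the host's view, splitting the block if needed. */
static void toy_donate_page(struct toy_s2 *s2, size_t idx)
{
	assert(idx < PAGES_PER_BLOCK);
	if (s2->l2 == PTE_BLOCK) {
		/* Break: invalidate the block entry and flush the TLB, so no
		 * walker can hold the old block and the new table at once
		 * (which is what could raise a TLB conflict abort). */
		s2->l2 = PTE_INVALID;
		toy_tlbi(s2);
		/* Window: the whole 2MiB range is unmapped here. */

		/* Make: install a table covering the same range per page. */
		for (size_t i = 0; i < PAGES_PER_BLOCK; i++)
			s2->page_valid[i] = true;
		s2->l2 = PTE_TABLE;
	}
	/* Remove the donated page and invalidate its translation. */
	s2->page_valid[idx] = false;
	toy_tlbi(s2);
}
```

Donating a second page from the same range only needs the final unmap and invalidation, since the block has already been split.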

> 
> > 2) Handling page faults: devices must be able to stall and let the
> >    hypervisor handle the page fault (which has to be proxied through
> >    the kernel, as the hypervisor doesn't handle interrupts). This also
> >    includes IO page faults, which are hard to get right in HW and
> >    may lead to system stability issues or lockups.
> 
> No.. once you turn on IO like this you don't have page faults
> anymore. Everything must be permanently mapped into the SMMU view, it
> can never be made non-present and you must run without page
> faults. That's what you have in the io-pgtable constructed table,
> right?

Exactly, but the CPU page table doesn’t guarantee that, so we either
have to handle page faults in the IOMMU, or completely change how KVM
deals with stage-2 if we want to share the page table with the CPU.

> 
> >    Alternatively, we can pin the stage-2 pages; that would require some
> >    hypercalls, hacks to the driver/IOMMU API, and possibly new semantics
> >    in the DMA-API for IDENTITY devices, as they would still need to pin
> >    the pages since they are actually in stage-2 translation, not bypass.
> 
> ?? Then how does this series work?

This series works fine as it shadows the page table and doesn't share it
with the CPU, so it fully populates the address space.
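A minimal sketch of what "fully populates" means here, assuming a simple per-page ownership array and an identity-mapped shadow IOMMU table (a toy model with hypothetical names, not the series' actual code): every host-owned page is mapped eagerly, so DMA never relies on a fault to populate the table, and the shadow only changes when ownership changes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model: page ownership plus a shadow IOMMU table,
 * both indexed by page frame number. Not the series' actual code. */
#define NR_PAGES 16

enum owner { OWNER_HOST, OWNER_HYP, OWNER_GUEST };

struct shadow_iommu {
	bool mapped[NR_PAGES]; /* identity mapping (IOVA == PA), per page */
};

/* Eagerly map every host-owned page up front. Unlike a lazily faulted
 * CPU stage-2, nothing is left for DMA to fault in later. */
static void shadow_populate(struct shadow_iommu *t, const enum owner own[NR_PAGES])
{
	for (size_t pfn = 0; pfn < NR_PAGES; pfn++)
		t->mapped[pfn] = (own[pfn] == OWNER_HOST);
}

/* On donation the page leaves the host and is unmapped from the shadow
 * table (a real implementation would also invalidate the IOTLB). */
static void shadow_donate(struct shadow_iommu *t, enum owner own[NR_PAGES],
			  size_t pfn, enum owner to)
{
	assert(pfn < NR_PAGES && own[pfn] == OWNER_HOST && to != OWNER_HOST);
	own[pfn] = to;
	t->mapped[pfn] = false;
}

static size_t shadow_count_mapped(const struct shadow_iommu *t)
{
	size_t n = 0;

	for (size_t pfn = 0; pfn < NR_PAGES; pfn++)
		n += t->mapped[pfn];
	return n;
}
```

The design point is that the shadow is driven by ownership transitions rather than by CPU stage-2 faults, which is why it does not inherit the CPU table's lazy population.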

> 
> > 3) SMMUv3 must be coherent.
> 
> Yes for sure.
> 
> > 4) Support BTM/DVM for TLB invalidation; otherwise some hooks are
> >    still required (although not in io-pgtable-arm).
> 
> SW needs to forward invalidations, BTM is rare..
> 
> > IMO, 1 and 2 are the trickiest parts. It's more work and runs on very
> > limited systems; however, it can be implemented as an optimization,
> > which is my plan.
> 
> I think unless you can do it without these HW features (excluding 3),
> don't bother.

I am looking into this now, but as I mentioned, that will be a separate
RFC following this one, as an optimization for advanced HW.

Thanks,
Mostafa

> 
> > I am not sure how CCA deals with that; I’d expect they have a lot of
> > constraints on CPUs/SMMUs and DMA-capable devices on those systems.
> 
> 3 is not supported. The entire S2 is permanently mapped and doesn't
> really change a lot at runtime. No page faults, not sure if the RMM
> private/shared conversion would require BBM..
> 
> Jason



More information about the linux-arm-kernel mailing list