[PATCH v6 08/25] KVM: arm64: iommu: Shadow host stage-2 page table
Mostafa Saleh
smostafa at google.com
Mon May 11 04:24:14 PDT 2026
On Sat, May 09, 2026 at 08:27:14PM -0300, Jason Gunthorpe wrote:
> On Mon, May 04, 2026 at 12:28:55PM +0000, Mostafa Saleh wrote:
> > So far this is the list of requirements/changes needed to share the
> > stage-2 page table (besides the obvious: same page table format,
> > granularity, endianness...)
> >
> > 1) HW BBM is not supported in the hypervisor page table, because
> > it can generate TLB conflict aborts, which the hypervisor cannot
> > handle due to the limited syndrome information.
> > We could either rely on FEAT_BBML3, which was newly introduced to
> > work around that (it’s quite niche and not supported in KVM yet),
> > or have an allow list similar to the kernel’s
> > (as in cpu_supports_bbml2_noabort()), which also limits the number
> > of CPUs that can run this.
>
> Do you think pkvm will need BBM? Hitless replace of a PTE is already a
> pretty advanced feature and the SMMU has its own support matrix there
> too. Is it for shared/private conversion?
Yes, we can break a block mapping on memory donation, which is a transfer
of ownership to the hypervisor or a guest.
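To illustrate why a shared table makes block splitting on donation a problem, here is a toy userspace model of break-before-make (not KVM code; `struct l2_slot`, `tlbi_and_sync()` and `split_and_donate()` are hypothetical names, and the "TLBI" is only a counter). It shows that during the window between invalidating the block entry and publishing the page table, even pages unrelated to the donation do not translate, which a device that cannot stall would see as a fault.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not KVM code: one level-2 slot that maps either a whole
 * block or a next-level table of pages. All names are hypothetical. */
#define PAGES_PER_BLOCK 512u

enum slot_state { SLOT_INVALID, SLOT_BLOCK, SLOT_TABLE };

struct l2_slot {
	enum slot_state state;
	bool page_valid[PAGES_PER_BLOCK]; /* consulted only for SLOT_TABLE */
};

static unsigned int tlbi_count;

/* Stand-in for "TLBI + DSB": only counts how often it runs. */
static void tlbi_and_sync(void)
{
	tlbi_count++;
}

/* Would an access to page 'idx' of this slot translate right now? */
static bool translates(const struct l2_slot *s, unsigned int idx)
{
	assert(idx < PAGES_PER_BLOCK);
	switch (s->state) {
	case SLOT_BLOCK:
		return true;
	case SLOT_TABLE:
		return s->page_valid[idx];
	default:
		return false;
	}
}

/* Split the block and drop page 'donated' using break-before-make.
 * '*probe_ok' reports whether the unrelated page 'probe' translated
 * while the slot was invalid (i.e. inside the BBM window). */
static void split_and_donate(struct l2_slot *s, unsigned int donated,
			     unsigned int probe, bool *probe_ok)
{
	/* Prepare the new page-level entries; ignored while SLOT_BLOCK. */
	for (unsigned int i = 0; i < PAGES_PER_BLOCK; i++)
		s->page_valid[i] = (i != donated);

	s->state = SLOT_INVALID;          /* break: invalidate block entry */
	tlbi_and_sync();                  /* drop cached block translation */
	*probe_ok = translates(s, probe); /* whole range faults here */
	s->state = SLOT_TABLE;            /* make: publish the page table */
}
```

Starting from `struct l2_slot s = { .state = SLOT_BLOCK };` and calling `split_and_donate(&s, 3, 7, &ok)`, `ok` ends up false, page 7 translates again afterwards and page 3 no longer does. Hitless replacement features exist precisely to close that window.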
>
> > 2) Handling page faults: devices must be able to stall and let the
> > hypervisor handle the page fault (which has to be proxied through the
> > kernel, as the hypervisor doesn’t handle interrupts). This also
> > includes IO page faults, which are hard to get right in HW and
> > may lead to system stability issues or lockups.
>
> No.. once you turn on IO like this you don't have page faults
> anymore. Everything must be permanently mapped into the SMMU view, it
> can never be made non-present and you must run without page
> faults. That's what you have in the io-pgtable constructed table,
> right?
Exactly, but the CPU page table doesn’t guarantee that, so we either
have to handle page faults in the IOMMU, or completely change how KVM
deals with stage-2 if we want to share the page table with the CPU.
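The difference can be shown with a toy model (hypothetical names, not pKVM code), assuming as a simplification that entries in the CPU stage-2 may be absent and are only mapped once the CPU faults on them. A CPU access to a missing entry traps and is fixed up, while a DMA from a device that cannot stall simply fails; a fully populated shadow table avoids that.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not pKVM code: one 'present' bit per page of an
 * identity-mapped stage-2. All names are hypothetical. */
#define NR_PAGES 16u

struct s2_table {
	bool present[NR_PAGES];
};

/* CPU path: a missing entry traps, the hypervisor maps it on demand
 * and the access is retried, so the CPU always makes progress. */
static bool cpu_access(struct s2_table *t, unsigned int pfn)
{
	assert(pfn < NR_PAGES);
	if (!t->present[pfn])
		t->present[pfn] = true; /* fault handled: map on demand */
	return true;
}

/* DMA path on a device that cannot stall: a missing entry means the
 * transaction is terminated, with no chance to fix it up. */
static bool dma_access(const struct s2_table *t, unsigned int pfn)
{
	assert(pfn < NR_PAGES);
	return t->present[pfn];
}

/* Shadow approach: populate every entry up front, so DMA never
 * depends on the CPU having touched the page first. */
static void populate_all(struct s2_table *t)
{
	for (unsigned int i = 0; i < NR_PAGES; i++)
		t->present[i] = true;
}
```

In this model a DMA to page 6 fails on the lazily built table until the CPU happens to touch page 6, but succeeds on the table after `populate_all()`.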
>
> > Alternatively, we could pin the stage-2 pages. That would require some
> > hypercalls, hacks to the driver/IOMMU API, and possibly new semantics
> > in the DMA-API for IDENTITY devices, as they would still need to pin
> > the pages since they actually go through stage-2 translation and do
> > not bypass it.
>
> ?? Then how does this series work?
This series works fine as it shadows the page table and doesn't share it
with the CPU, so it fully populates the address space.
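A minimal sketch of keeping such a shadow in sync, assuming the hypervisor runs a hook on every page ownership change. `struct host_state`, `set_owner()` and the ownership enum are hypothetical, and the arrays stand in for the real page tables; the model only counts IOTLB invalidations on unmap.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not the series' code. All names are hypothetical. */
#define NR_PAGES 8u

enum owner { OWNER_HOST, OWNER_HYP, OWNER_GUEST };

struct host_state {
	enum owner owner[NR_PAGES];
	bool iommu_mapped[NR_PAGES]; /* shadow IOMMU table, identity mapped */
	unsigned int iotlb_invalidations;
};

/* At init everything belongs to the host, so the shadow maps it all. */
static void shadow_init(struct host_state *s)
{
	for (unsigned int i = 0; i < NR_PAGES; i++) {
		s->owner[i] = OWNER_HOST;
		s->iommu_mapped[i] = true;
	}
	s->iotlb_invalidations = 0;
}

/* Hook called on every ownership change: the shadow maps exactly the
 * pages the host owns. */
static void set_owner(struct host_state *s, unsigned int pfn,
		      enum owner new_owner)
{
	assert(pfn < NR_PAGES);
	s->owner[pfn] = new_owner;

	bool want = (new_owner == OWNER_HOST);

	if (s->iommu_mapped[pfn] && !want) {
		s->iommu_mapped[pfn] = false;
		s->iotlb_invalidations++; /* unmap, then invalidate IOTLB */
	} else if (!s->iommu_mapped[pfn] && want) {
		s->iommu_mapped[pfn] = true; /* page returned to the host */
	}
}
```

Donating page 2 to a guest unmaps it from the shadow and costs one invalidation; giving it back remaps it, while all other pages stay mapped throughout.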
>
> > 3) SMMUv3 must be coherent.
>
> Yes for sure.
>
> > 4) Support BTM/DVM for TLB invalidation, otherwise some hooks are
> > still required (although not io-pgtable-arm)
>
> SW needs to forward invalidations; BTM is rare.
>
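What "SW forwarding invalidations" could look like is sketched below as a toy command queue. The enum labels are named after the SMMUv3 CMD_TLBI_S2_IPA and CMD_SYNC commands, but the queue, `forward_s2_invalidation()` and its parameters are hypothetical, and real command encoding is ignored. The assumption is that with BTM the SMMU observes the CPU's broadcast TLBI, while without it software must queue per-granule invalidations followed by a sync.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model; labels mirror SMMUv3 command names, encoding ignored. */
enum cmd_op { CMD_TLBI_S2_IPA, CMD_SYNC };

struct cmd {
	enum cmd_op op;
	uint64_t ipa;
};

#define CMDQ_SIZE 16u

struct cmdq {
	struct cmd ent[CMDQ_SIZE];
	unsigned int n;
};

static void cmdq_push(struct cmdq *q, enum cmd_op op, uint64_t ipa)
{
	assert(q->n < CMDQ_SIZE);
	q->ent[q->n].op = op;
	q->ent[q->n].ipa = ipa;
	q->n++;
}

/* Hypothetical hook run after a stage-2 change shared with the SMMU.
 * With BTM nothing is queued; otherwise invalidate each granule of
 * [ipa, ipa + size) and then wait for completion with a sync. */
static void forward_s2_invalidation(struct cmdq *q, bool has_btm,
				    uint64_t ipa, uint64_t size,
				    uint64_t granule)
{
	assert(granule != 0);
	if (has_btm)
		return;
	for (uint64_t off = 0; off < size; off += granule)
		cmdq_push(q, CMD_TLBI_S2_IPA, ipa + off);
	cmdq_push(q, CMD_SYNC, 0);
}
```

For an 8KiB change with 4KiB granules and no BTM, this queues two invalidations and one sync; with BTM it queues nothing.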
> > IMO, 1 and 2 are the trickiest parts. It's more work and runs on very
> > limited systems; however, it can be implemented as an optimization,
> > which is my plan.
>
> I think unless you can do it without these HW features (excluding 3),
> don't bother.
I am looking into this now, but as I mentioned, that will be a separate
RFC following this one, as an optimization for advanced HW.
Thanks,
Mostafa
>
> > I am not sure how CCA deals with that, I’d expect they have a lot of
> > constraints on CPUs/SMMUs and DMA capable devices on those systems.
>
> 3 is not supported. The entire S2 is permanently mapped and doesn't
> really change a lot at runtime. No page faults; not sure if the RMM
> private/shared conversion would require BBM...
>
> Jason