[PATCH v6 08/25] KVM: arm64: iommu: Shadow host stage-2 page table

Jason Gunthorpe jgg at ziepe.ca
Mon May 11 07:22:32 PDT 2026


On Mon, May 11, 2026 at 11:24:14AM +0000, Mostafa Saleh wrote:
> On Sat, May 09, 2026 at 08:27:14PM -0300, Jason Gunthorpe wrote:
> > On Mon, May 04, 2026 at 12:28:55PM +0000, Mostafa Saleh wrote:
> > > So far this is the list of requirements/changes needed to share the
> > > stage-2 page table (besides the obvious: same page table format,
> > > granularity, endianness...)
> > > 
> > > 1) HW BBM is not supported in the hypervisor page table, that’s
> > >    because it can generate TLB conflict aborts, which the hypervisor
> > >    can not handle because of the limited syndrome information.
> > >    We can rely on FEAT_BBML3 which was newly introduced to work
> > >    around that, it’s quite niche and not supported in KVM yet or
> > >    have an allow list similar to the kernel
> > >    (as in cpu_supports_bbml2_noabort()) which also limits the number
> > >    of CPUs that can run this.
> > 
> > Do you think pkvm will need BBM? Hitless replace of a PTE is already a
> > pretty advanced feature and the SMMU has its own support matrix there
> > too. Is it for shared/private conversion?
> 
> Yes, we can break a block mapping on memory donation, which is a transfer
> of ownership to the hypervisor or a guest.

So you need BBM support on the SMMU too? That is probably a big
problem because the SMMU is often mismatched to the CPU :\

Also io-pgtable arm cannot trigger BBM behaviors, so how do you
implement it?

> > No.. once you turn on IO like this you don't have page faults
> > anymore. Everything must be permanently mapped into the SMMU view, it
> > can never be made non-present and you must run without page
> > faults. That's what you have in the io-pgtable constructed table,
> > right?
> 
> Exactly, but the CPU page table doesn’t guarantee that, so we either
> have to handle page faults in the IOMMU, or completely change how KVM
> deals with stage-2 if we want to share the page table with the CPU.

So that's the real explanation, KVM cannot manage the S2 in the right
way so you can't share it. RMM/etc are managing the S2 without
pointless page faults so they can share it.

> > >    Alternatively, we can pin the stage-2 pages, that would require some
> > >    hypercalls, hacks to the driver/IOMMU API and possibly new semantics
> > >    in the DMA-API for IDENTITY devices as they will still need to pin
> > >    the pages as they are actually in stage-2 translation and not bypass.
> > 
> > ?? Then how does this series work?
> 
> This series works fine as it shadows the page table and doesn't share it
> with the CPU, so it fully populates the address space.

Which is why it is so weird that KVM is using a partially populated S2
when there is, and must be, a fully populated one for the SMMU. But I
understand there are reasons for this.

Jason


