[PATCH v8 07/15] iommupt: Add map_pages op

Jason Gunthorpe jgg at nvidia.com
Tue Jan 27 06:25:12 PST 2026


On Tue, Jan 27, 2026 at 07:08:39PM +1100, Alexey Kardashevskiy wrote:
> > Oh so it doesn't actually check the RMP, it is just rounding down to
> > two fixed sizes?
> 
> No, it does check RMP.
> 
> If the IOMMU page walk ends at a >=2MB page - it will round down to
> 2MB (to the nearest supported RMP size) and check for 2MB RMP and if
> that check fails because of the page size - it won't try 4K (even
> though it could theoretically).
> 
> The expectation is that the host OS makes sure the IOMMU uses page
> sizes equal to or bigger than the closest smaller RMP page size, so
> there is no need for two RMP checks.

Seems dysfunctional to me.
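
To make the behavior described in the quote above concrete, here is a toy
model in C. It is only a sketch of the description in this thread, not
taken from AMD documentation or from the kernel. The names
rmp_size_to_check() and rmp_check_passes() are hypothetical, and treating a
4K IOMMU leaf against a 2M RMP entry as a failure is an assumption of the
model.

```c
#include <stdbool.h>
#include <stdint.h>

#define SZ_4K 0x1000ULL
#define SZ_2M 0x200000ULL

/*
 * Hypothetical helper: round the IOMMU leaf size down to the nearest
 * RMP size, where only 4K and 2M are modeled.
 */
static uint64_t rmp_size_to_check(uint64_t iommu_leaf_size)
{
	return iommu_leaf_size >= SZ_2M ? SZ_2M : SZ_4K;
}

/*
 * Hypothetical model of the single RMP check. It passes only when the
 * RMP entry size equals the rounded-down size. After a 2M size mismatch
 * there is no second attempt at 4K, as described in the thread.
 * Assumption: a 4K leaf against a 2M RMP entry also fails.
 */
static bool rmp_check_passes(uint64_t iommu_leaf_size, uint64_t rmp_entry_size)
{
	return rmp_size_to_check(iommu_leaf_size) == rmp_entry_size;
}
```

In this model a 1G IOMMU leaf over 4K RMP entries fails outright, which is
the case the host OS is expected to avoid.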

> > > > ARM is pushing a thing where encrypt/decrypt has to work on certain aligned
> > > > granule sizes > PAGE_SIZE, you could use that mechanism to select a 2M
> > > > size for AMD too and avoid this.
> > > 
> > > 2M minimum on every DMA map?
> > On every swiotlb allocation pool chunk, yeah.
> 
> Nah, it is quite easy to force 2MB on swiotlb (just do it once and
> forget) but currently any guest page can be converted to shared and
> DMA-mapped and this skips swiotlb.

Upstream Linux doesn't support that; only SWIOTLB or special DMA
coherent memory can be DMA mapped in CC systems. You can't take a
random page, make it shared and then DMA map it.

> > > > What happens if the guest puts 4K pages into its AMDv2 table and RMP
> > > > is 2M?
> > > 
> > > Is this AMDv2 - an NPT (then it is going to fail)? or nested IOMMU (never tried, in the works, I suspect failure)?
> > 
> > Yes, some future nested vIOMMU
> > 
> > If the guest can't have a 4K page in its vIOMMU while the host is using
> > 2M RMP then the whole architecture is broken, sorry.
> 
> I re-read what I wrote and I think I was wrong, the S2 table (guest
> physical -> host physical) has to match RMP, not the S1.

Really? So the HW can fix the 4k/2M mismatch for the S1 but doesn't
bother for the S2? Seems like a crazy design to me.

What happens if you don't have a VIOMMU, have a single translation
stage and only use the S1 (AMDv2) page table in the hypervisor? Then
does the HW fix it? Or does it only fix it with two stages enabled?

> > iommufd won't deal with memory maps for IO, the secure world will
> > handle that through KVM.
> 
> Is QEMU going to skip on IOMMU mapping entirely? So when the device
> is transitioned from untrusted (when everything mapped via VFIO or
> IOMMU) to trusted - QEMU will unmap everything and then the guest
> will map everything but this time via KVM and bypassing QEMU
> entirely? Thanks,

On ARM there are different S2s for the IOMMU, one for T=1 and one for
T=0 traffic. The T=1 one is fully controlled by the secure world and is
equal to the CPU S2. The T=0 one is fully controlled by qemu and acts
like a normal system. The T=0 one can only access guest shared memory.
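
As a rough illustration of that split, here is a minimal C sketch. The
enum and function names are hypothetical and exist only for this example;
nothing here is an actual kernel or firmware interface.

```c
#include <stdbool.h>

/* Hypothetical labels for who controls each IOMMU stage-2 table. */
enum s2_owner { S2_OWNER_SECURE_WORLD, S2_OWNER_QEMU };

/* Hypothetical classification of guest memory. */
enum guest_mem { GUEST_MEM_PRIVATE, GUEST_MEM_SHARED };

/* T=1 traffic uses the secure-world S2, T=0 traffic uses the qemu S2. */
static enum s2_owner s2_owner_for_traffic(bool t_bit)
{
	return t_bit ? S2_OWNER_SECURE_WORLD : S2_OWNER_QEMU;
}

/* T=0 traffic can only reach guest shared memory. */
static bool t0_may_access(enum guest_mem mem)
{
	return mem == GUEST_MEM_SHARED;
}
```

The sketch deliberately says nothing about what T=1 traffic may reach,
beyond its S2 being the one the secure world controls.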

Jason


