[PATCH v8 07/15] iommupt: Add map_pages op
Jason Gunthorpe
jgg at nvidia.com
Tue Jan 27 06:25:12 PST 2026
On Tue, Jan 27, 2026 at 07:08:39PM +1100, Alexey Kardashevskiy wrote:
> > Oh so it doesn't actually check the RMP, it is just rounding down to
> > two fixed sizes?
>
> No, it does check RMP.
>
> If the IOMMU page walk ends at a >=2MB page, it will round down to
> 2MB (the nearest supported RMP size) and check for a 2MB RMP entry, and if
> that check fails because of the page size, it won't try 4K (even
> though it theoretically could).
>
> The expectation is that the host OS makes sure the IOMMU uses page
> sizes equal to or bigger than the closest smaller RMP page size, so there is
> no need for two RMP checks.
Seems dysfunctional to me.
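For reference, the single-check behaviour described above boils down to
roughly the following. This is a toy C model, not AMD's hardware logic:
struct rmp_entry, rmp_round_down() and rmp_check() are hypothetical names,
and the model assumes the RMP only tracks 4K and 2M entries and treats any
size difference between the rounded IOMMU leaf and the RMP entry as a failed
check. That matches the cases discussed in this thread but is not a
statement of the exact SEV-SNP rules.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model assumption: the RMP only has 4K and 2M entry sizes. */
#define RMP_SZ_4K (1ULL << 12)
#define RMP_SZ_2M (1ULL << 21)

/* Hypothetical, simplified RMP entry: only the size it was set up with. */
struct rmp_entry {
	uint64_t page_size;
};

/* Round the IOMMU leaf size down to the nearest RMP-supported size. */
static uint64_t rmp_round_down(uint64_t iommu_leaf_size)
{
	assert(iommu_leaf_size >= RMP_SZ_4K);
	return iommu_leaf_size >= RMP_SZ_2M ? RMP_SZ_2M : RMP_SZ_4K;
}

/*
 * One RMP check per translation: a walk ending in a 1G leaf is checked
 * as 2M, and if that fails on size there is deliberately no retry at 4K.
 */
static bool rmp_check(uint64_t iommu_leaf_size, const struct rmp_entry *e)
{
	return e->page_size == rmp_round_down(iommu_leaf_size);
}
```

So a 1G or 2M IOPTE over a 2M RMP entry passes, while a 2M IOPTE over a
region the host has split into 4K RMP entries fails outright, which is
why the host has to keep the IOMMU page sizes in step with the RMP.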
> > > > ARM is pushing a thing where encrypt/decrypt has to work on certain aligned
> > > > granule sizes > PAGE_SIZE; you could use that mechanism to select a 2M
> > > > size for AMD too and avoid this.
> > >
> > > 2M minimum on every DMA map?
> > On every swiotlb allocation pool chunk, yeah.
>
> Nah, it is quite easy to force 2MB on swiotlb (just do it once and
> forget), but currently any guest page can be converted to shared and
> DMA-mapped, which bypasses swiotlb.
Upstream Linux doesn't support that, only SWIOTLB or special DMA
coherent memory can be DMA mapped in CC systems. You can't take a
random page, make it shared and then DMA map it.
> > > > What happens if the guest puts 4K pages into its AMDv2 table and RMP
> > > > is 2M?
> > >
> > > Is this AMDv2 an NPT (then it is going to fail)? Or a nested IOMMU (never tried, in the works, I suspect failure)?
> >
> > Yes, some future nested vIOMMU
> >
> > If the guest can't have a 4K page in its vIOMMU while the host is using
> > 2M RMP, then the whole architecture is broken, sorry.
>
> I re-read what I wrote and I think I was wrong, the S2 table (guest
> physical -> host physical) has to match RMP, not the S1.
Really? So the HW can fix the 4k/2M mismatch for the S1 but doesn't
bother for the S2? Seems like a crazy design to me.
What happens if you don't have a VIOMMU, have a single translation
stage and only use the S1 (AMDv2) page table in the hypervisor? Then
does the HW fix it? Or does it only fix it with two stages enabled?
> > iommufd won't deal with memory maps for IO, the secure world will
> > handle that through KVM.
>
> Is QEMU going to skip IOMMU mapping entirely? So when the device
> is transitioned from untrusted (when everything is mapped via VFIO or
> IOMMU) to trusted, QEMU will unmap everything and then the guest
> will map everything again, but this time via KVM, bypassing QEMU
> entirely? Thanks,
On ARM there are separate S2s for the IOMMU, one for T=1 and one for
T=0 traffic. The T=1 one is fully controlled by the secure world and is
equal to the CPU S2. The T=0 one is fully controlled by QEMU and acts like
a normal system. The T=0 one can only access guest shared memory.
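To make the split concrete, here is a toy C model of which S2 translates a
transaction. struct s2_table, select_s2() and dma_allowed() are hypothetical
names, and letting T=1 traffic reach both private and shared memory is an
assumption of the sketch (it just mirrors whatever the CPU S2 maps); the
only hard rule taken from above is that T=0 reaches shared memory only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of the two IOMMU stage-2 tables on ARM. */
struct s2_table {
	const char *controlled_by; /* who programs this table */
	bool maps_private;         /* can it translate to guest-private memory? */
};

/* T=1: owned by the secure world, same as the CPU S2 (assumed to map both). */
static const struct s2_table s2_t1 = { "secure world", true };
/* T=0: owned by QEMU like a normal system; guest-shared memory only. */
static const struct s2_table s2_t0 = { "qemu", false };

/* The T bit carried by the DMA transaction selects the S2 that translates it. */
static const struct s2_table *select_s2(bool t_bit)
{
	return t_bit ? &s2_t1 : &s2_t0;
}

/* May a transaction with this T bit reach a page of the given kind? */
static bool dma_allowed(bool t_bit, bool page_is_private)
{
	const struct s2_table *s2 = select_s2(t_bit);

	assert(s2 != NULL);
	return !page_is_private || s2->maps_private;
}
```

The point of the design is that iommufd/VFIO only ever programs the T=0
table, so nothing QEMU maps there can reach private memory, and there is
no unmap/remap step on the T=0 side when the device becomes trusted.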
Jason
More information about the linux-riscv mailing list