[PATCH v8 07/15] iommupt: Add map_pages op

Tue Jan 27 00:08:39 PST 2026

On 24/1/26 01:14, Jason Gunthorpe wrote:
> On Fri, Jan 23, 2026 at 12:07:26PM +1100, Alexey Kardashevskiy wrote:
>>>> Got it. Interestingly the HW actually does that, almost. Say, for
>>>> >=2MB IO pages it checks if RMP==2MB and puts a 2MB IO TLB entry if
>>>> RMP==2MB, and for 4KB..1MB IO pages - a 4K IO TLB entry and RMP==4K
>>>> check. But it does not cross the 2MB boundary in RMP. Uff :-/
>>>
>>> Not sure I understand this limitation, how does any aligned size cross
>>> a 2MB boundary?
>>
>> Sorry, probably wrong wording. SNP allows a guest page to be backed
>> by only a 4K or 2M host page, IOMMU always rounds page size down to
>> the nearest 4K or 2M boundary. 4M IO pages can work with 2M RMP but
>> not 4K RMP.
> 
> Oh so it doesn't actually check the RMP, it is just rounding down to
> two fixed sizes?

No, it does check RMP.

If the IOMMU page walk ends at a >=2MB page, it rounds the size down to 2MB (the nearest supported RMP size) and checks for a 2MB RMP entry. If that check fails because of a page size mismatch, it won't retry with 4K (even though it theoretically could).

The expectation is that the host OS makes sure the IOMMU uses page sizes equal to or bigger than the closest smaller RMP page size, so there is no need for two RMP checks.
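
To spell out the behaviour described above, here is a minimal C model of the single RMP check. It is only a sketch of what is said in this thread, not driver or spec code: the names rmp_check_size(), rmp_check_ok() and rmp_model_selftest() are made up for illustration, and only the two RMP sizes (4K and 2M) are assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SZ_4K (UINT64_C(1) << 12)
#define SZ_2M (UINT64_C(1) << 21)

/*
 * Hypothetical helper: the RMP granule the IOMMU checks against for a
 * given IO page size. Anything >= 2MB rounds down to 2MB, anything
 * smaller (4KB..1MB) rounds down to 4KB.
 */
static uint64_t rmp_check_size(uint64_t io_pgsize)
{
	return io_pgsize >= SZ_2M ? SZ_2M : SZ_4K;
}

/*
 * Single RMP check with no fallback: a size mismatch is a failure, the
 * model never retries with the other granule.
 */
static bool rmp_check_ok(uint64_t io_pgsize, uint64_t rmp_pgsize)
{
	return rmp_check_size(io_pgsize) == rmp_pgsize;
}

/* The combinations mentioned in this thread. */
static void rmp_model_selftest(void)
{
	assert(rmp_check_ok(2 * SZ_2M, SZ_2M));   /* 4M IO page, 2M RMP: works */
	assert(!rmp_check_ok(2 * SZ_2M, SZ_4K));  /* 4M IO page, 4K RMP: fails */
	assert(!rmp_check_ok(SZ_4K, SZ_2M));      /* 4K IO page, 2M RMP: fails */
	assert(rmp_check_ok(SZ_4K, SZ_4K));       /* 4K everywhere: works */
}
```

In this model a mismatch in either direction fails, which is why the host has to keep the IOMMU page size and the RMP page size in step.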

> 
>>> ARM is pushing a thing where encrypt/decrypt has to work on certain aligned
>>> granual sizes > PAGE_SIZE, you could use that mechanism to select a 2M
>>> size for AMD too and avoid this.
>>
>> 2M minimum on every DMA map?
> On every swiotlb allocation pool chunk, yeah.

Nah, it is quite easy to force 2MB on swiotlb (just do it once and forget), but currently any guest page can be converted to shared and DMA-mapped, and that path skips swiotlb.

>>> Then why was I told the 4k page size kernel parameter fixes
>>> everything?
>>
>> Because IOMMU becomes 4K only and there is no huge page support in
>> the confidential KVM yet (well, in the upstream linux) so page size
>> mismatch cannot occur.
> 
> Ok, but you say when RMP has 2M pages then this doesn't work?

IOMMU page size forced to 4K + 2M RMPs? Yup, that does not work.

>>> What happens if the guest puts 4K pages into it's AMDv2 table and RMP
>>> is 2M?
>>
>> Is this AMDv2 - an NPT (then it is going to fail)? or nested IOMMU (never tried, in the works, I suspect failure)?
> 
> Yes, some future nested vIOMMU
> 
> If guest can't have a 4K page in it's vIOMMU while the host is using
> 2M RMP then the whole architecture is broken, sorry.

I re-read what I wrote and I think I was wrong: it is the S2 table (guest physical -> host physical) that has to match the RMP, not the S1.

>>>>>> If I get it right, for other platforms, the entire IOMMU table is
>>>>>> going to live in a secure space so there will be similar FW calls so
>>>>>> it is not that different.
>>>>>
>>>>> At least ARM the iommu S2 table is in secure memory and the secure FW
>>>>> keeps it 1:1 with the KVM S2 table. So edits to the KVM automatically
>>>>> make matching edits to the IOMMU. Only one software layer is
>>>>> responsible for things.
>>> ?
>>>> Does KVM talk to the host IOMMU code for that (and then the IOMMU code calls the secure world)?
>>>> Or KVM goes straight to that secure world?
>>>
>>> Straight to the secure world, there is no host IOMMU driver for the
>>> secure IOMMU.
>>
>> QEMU will try mapping all guest memory and will call the host for
>> this, or it won't, on ARM? No IOMMUFD in this case? Always
>> guest-visible IOMMU? Thanks,
> 
> iommufd won't deal with memory maps for IO, the secure world will
> handle that through KVM.

Is QEMU going to skip IOMMU mapping entirely? So when the device is transitioned from untrusted (when everything is mapped via VFIO or IOMMU) to trusted, QEMU will unmap everything and then the guest will map everything again, but this time via KVM and bypassing QEMU entirely? Thanks,

> The viommu and stuff is still optional and
> would be controlled through iommufd.

-- 
Alexey