[PATCH v8 07/15] iommupt: Add map_pages op
Alexey Kardashevskiy
aik at amd.com
Thu Jan 22 17:07:26 PST 2026
On 23/1/26 01:12, Jason Gunthorpe wrote:
> On Thu, Jan 22, 2026 at 09:58:04PM +1100, Alexey Kardashevskiy wrote:
>>> This issue with the RMP is no different, if you get a 2M IOPTE then
>>> the HW should check the RMP and load in a 4K IOPTE to the IOTLB if
>>> that is what the RMP requires.
>>> That the HW doesn't do that means you have all these difficult
>>> problems.
>>
>> Got it. Interestingly the HW actually does that, almost. Say, for
>> >=2MB IO pages it checks if RMP==2MB and puts a 2MB IO TLB entry if
>> RMP==2MB, and for 4KB..1MB IO pages - a 4K IO TLB entry and RMP==4K
>> check. But it does not cross the 2MB boundary in RMP. Uff :-/
>
> Not sure I understand this limitation, how does any aligned size cross
> a 2MB boundary?
Sorry, probably wrong wording. SNP allows a guest page to be backed by only a 4K or a 2M host page, and the IOMMU always rounds the IO page size down to the nearest of 4K or 2M. 4M IO pages can work with a 2M RMP entry but not with a 4K one.
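The rule as worded in this thread can be sketched roughly as below. This only models the description given here (IO pages of 2MB and up get a 2MB IOTLB entry and an RMP==2MB check, smaller IO pages get a 4K entry and an RMP==4K check); it is not the actual hardware logic, and the names iotlb_entry_size() and rmp_check_passes() are hypothetical. The 4K-IO-page with 2M-RMP case is reported as untried later in the thread, so the result for it below reflects the wording only.

```c
#include <stdbool.h>
#include <stdint.h>

#define SZ_4K (4ULL << 10)
#define SZ_2M (2ULL << 20)

/*
 * Hypothetical helper: size of the IOTLB entry the IOMMU would install,
 * per the description in this thread. Anything of 2MB and up is rounded
 * down to 2MB, anything smaller is rounded down to 4K.
 */
uint64_t iotlb_entry_size(uint64_t io_page_size)
{
	return io_page_size >= SZ_2M ? SZ_2M : SZ_4K;
}

/*
 * Hypothetical helper: does the RMP page size match what the IOMMU
 * checks for? Assumption: the check is a plain equality between the
 * rounded IO page size and the RMP page size, as the wording suggests.
 */
bool rmp_check_passes(uint64_t io_page_size, uint64_t rmp_page_size)
{
	return iotlb_entry_size(io_page_size) == rmp_page_size;
}
```

Under this model a 4M or 8M IO page passes with a 2M RMP entry and fails with a 4K one, which is the mismatch being discussed.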
> Sounds like it was thought about, is it a HW bug some cases don't
> work?
Nah, this is intentional, I just do not understand all the consequences of allowing a 4K RMP to work with an 8MB IO page :)
>> on the other hand, without swiotlb, dma_map() in the guest for
>> untrusted device is likely to be lot less than 2MB and going to
>> share another handful of pages but this activity is not that rare
>> compared to my certificates example. If only there was a way to
>> somehow bundle such allocations/mappings... :-/
>
> ARM is pushing a thing where encrypt/decrypt has to work on certain aligned
> granule sizes > PAGE_SIZE, you could use that mechanism to select a 2M
> size for AMD too and avoid this.
2M minimum on every DMA map?
>>> That's a completely grotesque solution!
>>>
>>> It violates all of our software layers. The IOMMU and RMP are not
>>> controlled by the same software entity and you propose to have a FW
>>> call that edits *both* together somehow? How is that even going to
>>> work safely?
>>>
>>> Can't you do things in a sequence?
>>>
>>> Change the iommu from 2M to 4K, flush, then change the RMP from 2M to
>>> 4K?
>>
>> Sure we could unless there is ongoing DMA between "flush" and "then
>> change" and then DMA will fail because of mismatching page sizes
>> (that 2MB crossing thing above).
>
> I'm confused, if the IOMMU has 4K and the RMP has 2M it doesn't work?
I have not tried this; IOMMU pages are usually the biggest on AMD platforms, often 8MB.
> Then why was I told the 4k page size kernel parameter fixes
> everything?
Because the IOMMU becomes 4K only and there is no huge page support in confidential KVM yet (well, in upstream Linux), so a page size mismatch cannot occur.
> What happens if the guest puts 4K pages into its AMDv2 table and RMP
> is 2M?
Is this AMDv2 an NPT (then it is going to fail) or a nested IOMMU (never tried, in the works, I suspect failure)?
>>>> If I get it right, for other platforms, the entire IOMMU table is
>>>> going to live in a secure space so there will be similar FW calls so
>>>> it is not that different.
>>>
>>> At least ARM the iommu S2 table is in secure memory and the secure FW
>>> keeps it 1:1 with the KVM S2 table. So edits to the KVM automatically
>>> make matching edits to the IOMMU. Only one software layer is
>>> responsible for things.
> ?
>> Does KVM talk to the host IOMMU code for that (and then the IOMMU code calls the secure world)?
>> Or KVM goes straight to that secure world?
>
> Straight to the secure world, there is no host IOMMU driver for the
> secure IOMMU.
On ARM, will QEMU try mapping all guest memory and call the host for this, or won't it? No IOMMUFD in this case? Always a guest-visible IOMMU? Thanks,
>> Is the host IOMMU code aware of the content of the secure IOMMU table?
>
> No, it isn't even aware it exists.
>
>> Does 2MB->4K smashing exist on ARM at all?
>
> Every arch has cases where larger mappings need to be reduced to
> smaller ones, but ARM doesn't require synchronized coordination
> between multiple tables.
>
> Jason
--
Alexey