[PATCH v8 07/15] iommupt: Add map_pages op
Alexey Kardashevskiy
aik at amd.com
Wed Feb 25 15:11:56 PST 2026
On 18/1/26 02:43, Jason Gunthorpe wrote:
> On Sat, Jan 17, 2026 at 03:54:52PM +1100, Alexey Kardashevskiy wrote:
>
>> I am trying this with TEE-IO on AMD SEV and hitting problems.
>
> My understanding is that if you want to use SEV today you also have to
> use the kernel command line parameter to force 4k IOMMU pages?
>
> So, I think your questions are about trying to enhance this to get
> larger pages in the IOMMU when possible?
>
>> Now, from time to time the guest will share 4K pages which makes the
>> host OS smash NPT's 2MB PDEs to 4K PTEs, and 2M RMP entries to 4K
>> RMP entries, and since the IOMMU performs RMP checks - IOMMU PDEs
>> have to use the same granularity as NPT and RMP.
>
> IMHO this is a bad hardware choice, it is going to make some very
> troublesome software, so sigh.
>
>> So I end up in a situation where QEMU asks to map, for example, 2GB
>> of guest RAM and I want most of it to be 2MB mappings, with only a
>> handful of 2MB pages split into 4K pages. But it appears
>> that the above enforces the same page size for the entire range.
>
>> In the old IOMMU code, I handled it like this:
>>
>> https://github.com/AMDESE/linux-kvm/commit/0a40130987b7b65c367390d23821cc4ecaeb94bd#diff-f22bea128ddb136c3adc56bc09de9822a53ba1ca60c8be662a48c3143c511963L341
>>
>> tl;dr: I constantly re-calculate the page size while mapping.
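For context, the per-chunk recalculation boils down to something like the sketch below. This is a simplified illustration, not the actual driver code from the commit above; `pick_map_size` and its parameters are made up for this example, and the real code would query the RMP entry itself rather than take its size as an argument:

```c
#include <stddef.h>
#include <stdint.h>

#define SZ_4K   0x1000UL
#define SZ_2M   0x200000UL

/*
 * Pick the largest IOMMU page size usable at this point in the range:
 * both addresses must be 2MB aligned, at least 2MB must remain, and
 * the RMP entry covering the backing page must itself be 2MB.
 * Otherwise fall back to 4K. (Illustrative only; the real code walks
 * the RMP instead of taking rmp_pgsize as a parameter.)
 */
static unsigned long pick_map_size(uint64_t iova, uint64_t pa,
				   size_t remaining,
				   unsigned long rmp_pgsize)
{
	if ((iova | pa) & (SZ_2M - 1) || remaining < SZ_2M)
		return SZ_4K;
	if (rmp_pgsize < SZ_2M)
		return SZ_4K;
	return SZ_2M;
}
```

The mapping loop would call this on every iteration, advancing by whatever size it returns, which is why the page size ends up being recalculated constantly.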
>
> Doing it at mapping time doesn't seem right to me, AFAICT the RMP can
> change dynamically whenever the guest decides to change the
> private/shared status of memory?
>
> My expectation for AMD was that the VMM would be monitoring the RMP
> granularity and use cut or "increase/decrease page size" through
> iommupt to adjust the S2 mapping so it works with these RMP
> limitations.
>
> Those don't fully exist yet, but they are in the plans.
>
> It assumes that the VMM is continually aware of what all the RMP PTEs
> look like and when they are changing so it can make the required
> adjustments.
>
> The flow would be something like:
> 1) Create an IOAS
> 2) Create a HWPT. If there is some known upper bound on RMP/etc page
> size then limit the HWPT page size to the upper bound
> 3) Map stuff into the ioas
> 4) Build the RMP/etc and map ranges of page granularity
> 5) Call iommufd to adjust the page size within ranges
I am about to try this approach now. Step 5) means splitting bigger pages into smaller ones, and I remember you were working on that hitless smashing of IO PDEs; do you have something to play with? I could not spot anything on github, but I do not want to reinvent it. Thanks,
> 6) Guest changes encrypted state so RMP changes
> 7) VMM adjusts the ranges of page granularity and calls iommufd with
> the updates
> 8) iommput code increases/decreases page size as required.
>
> Does this seem reasonable?
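As a thought experiment on steps 6)-8), the VMM-side bookkeeping could look roughly like this. Everything here is hypothetical: the iommufd "adjust page size" op does not exist yet (as noted above), so only the granularity tracking is concrete; `block_state` and `track_conversion` are placeholder names invented for this sketch:

```c
#include <stdbool.h>
#include <stdint.h>

#define SZ_4K		0x1000UL
#define SZ_2M		0x200000UL
#define PAGES_PER_2M	(SZ_2M / SZ_4K)		/* 512 */

/* Per-2MB-block tracking of guest-shared 4K pages (hypothetical). */
struct block_state {
	uint16_t shared_4k;	/* how many 4K pages are guest-shared */
};

/*
 * Returns true when the 2MB block flips between "uniform" (all pages
 * private or all shared, so a 2MB mapping is fine) and "mixed" (must
 * be split to 4K) -- i.e. the point where the VMM would call the
 * future iommufd op to decrease or increase the page size.
 */
static bool track_conversion(struct block_state *blk, bool to_shared)
{
	uint16_t old = blk->shared_4k;
	bool was_uniform, is_uniform;

	blk->shared_4k += to_shared ? 1 : -1;

	was_uniform = (old == 0 || old == PAGES_PER_2M);
	is_uniform = (blk->shared_4k == 0 ||
		      blk->shared_4k == PAGES_PER_2M);
	return was_uniform != is_uniform;
}
```

The point being that the VMM only needs to call into iommufd on the uniform/mixed transitions, not on every individual 4K conversion.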
>
>> I know, ideally we would only share memory in 2MB chunks but we are
>> not there yet as I do not know the early boot stage on x86 enough to
>
> Even 2M is too small, I'd expect real scenarios to want to get up to
> 1GB ??
>
> Jason
--
Alexey
More information about the linux-riscv
mailing list