[PATCH v14 10/44] arm64: RMI: Add support for SRO

Mon Jun 15 04:45:27 PDT 2026

Hi Dan,

On 13/06/2026 00:07, Dan Williams (nvidia) wrote:
> Steven Price wrote:
> [..]
>>> alloc_pages_exact() will fail if the requested size exceeds the maximal
>>> allowed
>>> size (1 << MAX_PAGE_ORDER). The maximal size is usually smaller than
>>> PUD_SIZE
>>> but PUD_SIZE is allowed by the RMM.
>>
>> This is an area where to be honest I'm really not sure what to do.
>> Technically the RMM is allowed to ask for a contiguous range of 512GB
>> pages (on a 4K system - larger with larger page sizes) - but clearly no
>> real OS is going to be able to provide anything like that.
>>
>> In practise we don't expect the RMM to do anything so crazy. It's not
>> really clear to be whether even 2MB (PMD_SIZE) is needed. But the spec
>> is written to be generic.
>>
>> So my current approach is to calculate the required size and pass it
>> into alloc_pages_exact(). For "stupidly large" values this will fail and
>> Linux just doesn't support an RMM which attempts this. If there is ever
>> a usecase which needs this then we'd need to find a different method of
>> providing the memory (most likely some form of carveout to avoid
>> fragmentation). But my view is we should wait for that usecase to be
>> identified first.
> 
> Just some comparison comments as I am also going through the TDX patches
> which enable "Extension SEAMCALLs". These new SEAMCALLs are similar to
> the SRO mechanism [1].

Looks like at least at the moment it's much more one-way than the SRO
mechanism - there's no reclaim mechanism (yet).

> TDX asks for an upfront delegation of memory at init time using
> alloc_contig_pages() that is never returned until entire module is
> shutdown. alloc_contig_pages() is not subject to the MAX_ORDER limit,
> but not sure that alloc_contig_pages() is suitable for small+dynamic
> runtime memory add / release that SRO potentially wants to do?

Yeah I'm not sure quite what is best. I expect the RMM to only request
contiguous memory for very small allocations to use as hardware page
tables. It's an issue I'm trying to work through that the specification
doesn't provide any guidance for what sort of allocations the host
should expect to provide.

> Does SRO always balance the size of RMI_OP_MEM_REQ_DONATE with
> RMI_OP_MEM_REQ_RECLAIM, or might some donate requests be a one way
> donation like TDX? Just poking to see if there is a path to preallocate
> a pool vs the fine grained per-operation alloc/free.

The spec is unfortunately not prescriptive on this point. For an
operation which eventually fails, the expectation is that the RMM will
return all the memory that was provided (and exactly that memory). But
the specification doesn't actually require that.

The problem is that there are situations where a racing operation on
another CPU could trigger this to not happen. For example, a new page
table needs to be allocated to complete a map operation, but then a
racing operation on another CPU makes use of this page table (e.g due to
a map at a different address), the memory for the page table cannot be
returned even if the operation doesn't complete because it's in use from
the racing operation.

I don't believe the current RMM design will actually do this - but it's
not something we actually want to prevent in the spec.

Equally the expectation is that all the donated memory for a guest will
be returned when the guest is destroyed. But we don't have anything in
the spec to enforce this.

I don't particularly expect a pool to be that useful for the expected
memory allocation patterns as I expect SRO donations to be long lived.
We don't (yet at least) have a concept of donating memory just for
"scratch" memory during an operation. Although the SRO mechanism doesn't
rule that out.

Thanks,
Steve