[PATCH v14 29/44] arm64: RMI: Runtime faulting of memory

Steven Price steven.price at arm.com
Mon Jun 8 03:56:18 PDT 2026


On 08/06/2026 10:30, Suzuki K Poulose wrote:
> On 05/06/2026 07:23, Gavin Shan wrote:
>> Hi Steve,
>>
>> On 5/13/26 11:17 PM, Steven Price wrote:
>>> At runtime if the realm guest accesses memory which hasn't yet been
>>> mapped then KVM needs to either populate the region or fault the guest.
>>>
>>> For memory in the lower (protected) region of IPA a fresh page is
>>> provided to the RMM which will zero the contents. For memory in the
>>> upper (shared) region of IPA, the memory from the memslot is mapped
>>> into the realm VM as non-secure.
>>>
>>> Signed-off-by: Steven Price <steven.price at arm.com>
>>> ---
>>> Changes since v13:
>>>   * Numerous changes due to rebasing.
>>>   * Fix addr_range_desc() to encode the correct block size.
>>> Changes since v12:
>>>   * Switch to RMM v2.0 range based APIs.
>>> Changes since v11:
>>>   * Adapt to upstream changes.
>>> Changes since v10:
>>>   * RME->RMI renaming.
>>>   * Adapt to upstream gmem changes.
>>> Changes since v9:
>>>   * Fix call to kvm_stage2_unmap_range() in kvm_free_stage2_pgd() to set
>>>     may_block to avoid stall warnings.
>>>   * Minor coding style fixes.
>>> Changes since v8:
>>>   * Propagate the may_block flag.
>>>   * Minor comments and coding style changes.
>>> Changes since v7:
>>>   * Remove redundant WARN_ONs for realm_create_rtt_levels() - it will
>>>     internally WARN when necessary.
>>> Changes since v6:
>>>   * Handle PAGE_SIZE being larger than RMM granule size.
>>>   * Some minor renaming following review comments.
>>> Changes since v5:
>>>   * Reduce use of struct page in preparation for supporting the RMM
>>>     having a different page size to the host.
>>>   * Handle a race when delegating a page where another CPU has faulted on
>>>     the same page (and already delegated the physical page) but not yet
>>>     mapped it. In this case simply return to the guest to either use the
>>>     mapping from the other CPU (or refault if the race is lost).
>>>   * The changes to populate_par_region() are moved into the previous
>>>     patch where they belong.
>>> Changes since v4:
>>>   * Code cleanup following review feedback.
>>>   * Drop the PTE_SHARED bit when creating unprotected page table entries.
>>>     This is now set by the RMM and the host has no control of it and the
>>>     spec requires the bit to be set to zero.
>>> Changes since v2:
>>>   * Avoid leaking memory if failing to map it in the realm.
>>>   * Correctly mask RTT based on LPA2 flag (see rtt_get_phys()).
>>>   * Adapt to changes in previous patches.
>>> ---
>>>   arch/arm64/include/asm/kvm_emulate.h |   8 ++
>>>   arch/arm64/include/asm/kvm_rmi.h     |  12 ++
>>>   arch/arm64/kvm/mmu.c                 | 128 ++++++++++++++++----
>>>   arch/arm64/kvm/rmi.c                 | 173 +++++++++++++++++++++++++++
>>>   4 files changed, 301 insertions(+), 20 deletions(-)
>>>

[...]

>>> diff --git a/arch/arm64/kvm/rmi.c b/arch/arm64/kvm/rmi.c
>>> index cae29fd3353c..761b38a4071c 100644
>>> --- a/arch/arm64/kvm/rmi.c
>>> +++ b/arch/arm64/kvm/rmi.c
>>> @@ -597,6 +597,179 @@ static int realm_data_map_init(struct kvm *kvm, unsigned long ipa,
>>>       return ret;
>>>   }
>>> +static unsigned long addr_range_desc(unsigned long phys, unsigned long size)
>>> +{
>>> +    unsigned long out = 0;
>>> +
>>> +    switch (size) {
>>> +    case P4D_SIZE:
>>> +        out = 3 | (1 << 2);
>>> +        break;
>>> +    case PUD_SIZE:
>>> +        out = 2 | (1 << 2);
>>> +        break;
>>> +    case PMD_SIZE:
>>> +        out = 1 | (1 << 2);
>>> +        break;
>>> +    case PAGE_SIZE:
>>> +        out = 0 | (1 << 2);
>>> +        break;
>>> +    default:
>>> +        /*
>>> +         * Only support mapping at the page level granularity when
>>> +         * it's an unusual length. This should get us back onto a larger
>>> +         * block size for the subsequent mappings.
>>> +         */
>>> +        out = 0 | ((MIN(size >> PAGE_SHIFT, PTRS_PER_PTE - 1)) << 2);
>>> +        break;
>>> +    }
>>> +
>>> +    WARN_ON(phys & ~PAGE_MASK);
>>> +
>>> +    out |= phys & PAGE_MASK;
>>> +
>>> +    return out;
>>> +}
>>> +
>>> +int realm_map_protected(struct kvm *kvm,
>>> +            unsigned long ipa,
>>> +            kvm_pfn_t pfn,
>>> +            unsigned long map_size,
>>> +            struct kvm_mmu_memory_cache *memcache)
>>> +{
>>> +    struct realm *realm = &kvm->arch.realm;
>>> +    phys_addr_t phys = __pfn_to_phys(pfn);
>>> +    phys_addr_t base_phys = phys;
>>> +    phys_addr_t rd = virt_to_phys(realm->rd);
>>> +    unsigned long base_ipa = ipa;
>>> +    unsigned long ipa_top = ipa + map_size;
>>> +    int ret = 0;
>>> +
>>> +    if (WARN_ON(!IS_ALIGNED(map_size, PAGE_SIZE) ||
>>> +            !IS_ALIGNED(ipa, map_size)))
>>> +        return -EINVAL;
>>> +
>>> +    if (rmi_delegate_range(phys, map_size)) {
>>> +        /*
>>> +         * It's likely we raced with another VCPU on the same
>>> +         * fault. Assume the other VCPU has handled the fault
>>> +         * and return to the guest.
>>> +         */
>>> +        return 0;
>>> +    }
>>> +
>>> +    while (ipa < ipa_top) {
>>> +        unsigned long flags = RMI_ADDR_TYPE_SINGLE;
>>> +        unsigned long range_desc = addr_range_desc(phys, ipa_top - ipa);
>>> +        unsigned long out_top;
>>> +
>>> +        ret = rmi_rtt_data_map(rd, ipa, ipa_top, flags, range_desc,
>>> +                       &out_top);
>>> +
>>> +        if (RMI_RETURN_STATUS(ret) == RMI_ERROR_RTT) {
>>> +            /* Create missing RTTs and retry */
>>> +            int level = RMI_RETURN_INDEX(ret);
>>> +
>>> +            WARN_ON(level == KVM_PGTABLE_LAST_LEVEL);
>>> +            ret = realm_create_rtt_levels(realm, ipa, level,
>>> +                              KVM_PGTABLE_LAST_LEVEL,
>>> +                              memcache);
> 
> Could we give the RMM a chance to make use of block mappings by
> creating the missing RTTs only down to the level that may work for the
> current range_desc? I.e., if the range_desc is a 2M block size, we could
> create tables up to L2 on the first go and, if the RMM still needs an RTT,
> go further down to KVM_PGTABLE_LAST_LEVEL. I understand this is
> kind of an optimisation, so maybe we could defer it. (The same applies to
> the non_secure map below.)

A simple change would be to create just one level at a time and retry the
mapping, like this:

diff --git a/arch/arm64/kvm/rmi.c b/arch/arm64/kvm/rmi.c
index b79b96f7dffb..3f3ade1d3895 100644
--- a/arch/arm64/kvm/rmi.c
+++ b/arch/arm64/kvm/rmi.c
@@ -767,15 +767,15 @@ static int realm_map_protected(struct kvm *kvm,
 			/* Create missing RTTs and retry */
 			int level = RMI_RETURN_INDEX(ret);
 
-			WARN_ON(level == KVM_PGTABLE_LAST_LEVEL);
+			if (WARN_ON(level >= KVM_PGTABLE_LAST_LEVEL))
+				goto err_undelegate;
 			ret = realm_create_rtt_levels(realm, ipa, level,
-						      KVM_PGTABLE_LAST_LEVEL,
+						      level + 1,
 						      memcache);
 			if (ret)
 				goto err_undelegate;
 
-			ret = rmi_rtt_data_map(rd, ipa, ipa_top, flags,
-					       range_desc, &out_top);
+			continue;
 		}
 
 		if (WARN_ON(ret))
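
To illustrate why creating one level per retry lets a block mapping survive,
here is a minimal user-space sketch of the loop. It is not the kernel code:
struct mock_rmm, mock_data_map(), mock_create_rtt_levels() and
map_one_level_at_a_time() are hypothetical stand-ins for the RMM state,
rmi_rtt_data_map(), realm_create_rtt_levels() and the loop in
realm_map_protected(), and LAST_LEVEL stands in for KVM_PGTABLE_LAST_LEVEL
(assumed to be 3 here).

```c
#include <assert.h>

#define LAST_LEVEL 3	/* assumed stand-in for KVM_PGTABLE_LAST_LEVEL */

/* Hypothetical mock of the RMM's table state for one IPA. */
struct mock_rmm {
	int deepest_rtt;	/* RTTs exist down to this level */
	int map_calls;		/* number of map attempts made */
	int rtt_creates;	/* number of RTT levels created */
};

/*
 * Mock of the data map call: a mapping at want_level needs an RTT at
 * that level. On failure, report the level where the walk stopped
 * (the role RMI_RETURN_INDEX() plays in the real code).
 */
static int mock_data_map(struct mock_rmm *rmm, int want_level, int *walk_level)
{
	rmm->map_calls++;
	if (rmm->deepest_rtt >= want_level)
		return 0;
	*walk_level = rmm->deepest_rtt;
	return -1;
}

/* Mock of RTT creation: create tables below 'level' down to 'max_level'. */
static int mock_create_rtt_levels(struct mock_rmm *rmm, int level, int max_level)
{
	assert(level == rmm->deepest_rtt && max_level <= LAST_LEVEL);
	while (rmm->deepest_rtt < max_level) {
		rmm->deepest_rtt++;
		rmm->rtt_creates++;
	}
	return 0;
}

/* Retry loop: add a single RTT level per failed attempt, then retry. */
static int map_one_level_at_a_time(struct mock_rmm *rmm, int want_level)
{
	for (;;) {
		int level;

		if (mock_data_map(rmm, want_level, &level) == 0)
			return 0;
		if (level >= LAST_LEVEL)
			return -1;	/* nothing deeper can be created */
		if (mock_create_rtt_levels(rmm, level, level + 1))
			return -1;
	}
}
```

In this model, a request that can be satisfied at level 2 (a 2M block with
4K granules) with tables present down to level 1 creates exactly one table
and succeeds on the second attempt, so the tables are never pushed down to
the last level unnecessarily. The cost is up to one extra map call per
missing level.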

Thanks,
Steve



