[PATCH v14 29/44] arm64: RMI: Runtime faulting of memory

Suzuki K Poulose suzuki.poulose at arm.com
Mon Jun 8 05:58:23 PDT 2026


On 08/06/2026 11:56, Steven Price wrote:
> On 08/06/2026 10:30, Suzuki K Poulose wrote:
>> On 05/06/2026 07:23, Gavin Shan wrote:
>>> Hi Steve,
>>>
>>> On 5/13/26 11:17 PM, Steven Price wrote:
>>>> At runtime if the realm guest accesses memory which hasn't yet been
>>>> mapped then KVM needs to either populate the region or fault the guest.
>>>>
>>>> For memory in the lower (protected) region of IPA a fresh page is
>>>> provided to the RMM which will zero the contents. For memory in the
>>>> upper (shared) region of IPA, the memory from the memslot is mapped
>>>> into the realm VM non secure.
>>>>
>>>> Signed-off-by: Steven Price <steven.price at arm.com>
>>>> ---
>>>> Changes since v13:
>>>>    * Numerous changes due to rebasing.
>>>>    * Fix addr_range_desc() to encode the correct block size.
>>>> Changes since v12:
>>>>    * Switch to RMM v2.0 range based APIs.
>>>> Changes since v11:
>>>>    * Adapt to upstream changes.
>>>> Changes since v10:
>>>>    * RME->RMI renaming.
>>>>    * Adapt to upstream gmem changes.
>>>> Changes since v9:
>>>>    * Fix call to kvm_stage2_unmap_range() in kvm_free_stage2_pgd() to set
>>>>      may_block to avoid stall warnings.
>>>>    * Minor coding style fixes.
>>>> Changes since v8:
>>>>    * Propagate the may_block flag.
>>>>    * Minor comments and coding style changes.
>>>> Changes since v7:
>>>>    * Remove redundant WARN_ONs for realm_create_rtt_levels() - it will
>>>>      internally WARN when necessary.
>>>> Changes since v6:
>>>>    * Handle PAGE_SIZE being larger than RMM granule size.
>>>>    * Some minor renaming following review comments.
>>>> Changes since v5:
>>>>    * Reduce use of struct page in preparation for supporting the RMM
>>>>      having a different page size to the host.
>>>>    * Handle a race when delegating a page where another CPU has
>>>>      faulted on the same page (and already delegated the physical
>>>>      page) but not yet mapped it. In this case simply return to the
>>>>      guest to either use the mapping from the other CPU (or refault if
>>>>      the race is lost).
>>>>    * The changes to populate_par_region() are moved into the previous
>>>>      patch where they belong.
>>>> Changes since v4:
>>>>    * Code cleanup following review feedback.
>>>>    * Drop the PTE_SHARED bit when creating unprotected page table
>>>>      entries. This is now set by the RMM and the host has no control
>>>>      of it and the spec requires the bit to be set to zero.
>>>> Changes since v2:
>>>>    * Avoid leaking memory if failing to map it in the realm.
>>>>    * Correctly mask RTT based on LPA2 flag (see rtt_get_phys()).
>>>>    * Adapt to changes in previous patches.
>>>> ---
>>>>    arch/arm64/include/asm/kvm_emulate.h |   8 ++
>>>>    arch/arm64/include/asm/kvm_rmi.h     |  12 ++
>>>>    arch/arm64/kvm/mmu.c                 | 128 ++++++++++++++++----
>>>>    arch/arm64/kvm/rmi.c                 | 173 +++++++++++++++++++++++++++
>>>>    4 files changed, 301 insertions(+), 20 deletions(-)
>>>>
> 
> [...]
> 
>>>> diff --git a/arch/arm64/kvm/rmi.c b/arch/arm64/kvm/rmi.c
>>>> index cae29fd3353c..761b38a4071c 100644
>>>> --- a/arch/arm64/kvm/rmi.c
>>>> +++ b/arch/arm64/kvm/rmi.c
>>>> @@ -597,6 +597,179 @@ static int realm_data_map_init(struct kvm *kvm, unsigned long ipa,
>>>>        return ret;
>>>>    }
>>>> +static unsigned long addr_range_desc(unsigned long phys, unsigned long size)
>>>> +{
>>>> +    unsigned long out = 0;
>>>> +
>>>> +    switch (size) {
>>>> +    case P4D_SIZE:
>>>> +        out = 3 | (1 << 2);
>>>> +        break;
>>>> +    case PUD_SIZE:
>>>> +        out = 2 | (1 << 2);
>>>> +        break;
>>>> +    case PMD_SIZE:
>>>> +        out = 1 | (1 << 2);
>>>> +        break;
>>>> +    case PAGE_SIZE:
>>>> +        out = 0 | (1 << 2);
>>>> +        break;
>>>> +    default:
>>>> +        /*
>>>> +         * Only support mapping at the page level granularity when
>>>> +         * it's an unusual length. This should get us back onto a larger
>>>> +         * block size for the subsequent mappings.
>>>> +         */
>>>> +        out = 0 | ((MIN(size >> PAGE_SHIFT, PTRS_PER_PTE - 1)) << 2);
>>>> +        break;
>>>> +    }
>>>> +
>>>> +    WARN_ON(phys & ~PAGE_MASK);
>>>> +
>>>> +    out |= phys & PAGE_MASK;
>>>> +
>>>> +    return out;
>>>> +}
>>>> +
>>>> +int realm_map_protected(struct kvm *kvm,
>>>> +            unsigned long ipa,
>>>> +            kvm_pfn_t pfn,
>>>> +            unsigned long map_size,
>>>> +            struct kvm_mmu_memory_cache *memcache)
>>>> +{
>>>> +    struct realm *realm = &kvm->arch.realm;
>>>> +    phys_addr_t phys = __pfn_to_phys(pfn);
>>>> +    phys_addr_t base_phys = phys;
>>>> +    phys_addr_t rd = virt_to_phys(realm->rd);
>>>> +    unsigned long base_ipa = ipa;
>>>> +    unsigned long ipa_top = ipa + map_size;
>>>> +    int ret = 0;
>>>> +
>>>> +    if (WARN_ON(!IS_ALIGNED(map_size, PAGE_SIZE) ||
>>>> +            !IS_ALIGNED(ipa, map_size)))
>>>> +        return -EINVAL;
>>>> +
>>>> +    if (rmi_delegate_range(phys, map_size)) {
>>>> +        /*
>>>> +         * It's likely we raced with another VCPU on the same
>>>> +         * fault. Assume the other VCPU has handled the fault
>>>> +         * and return to the guest.
>>>> +         */
>>>> +        return 0;
>>>> +    }
>>>> +
>>>> +    while (ipa < ipa_top) {
>>>> +        unsigned long flags = RMI_ADDR_TYPE_SINGLE;
>>>> +        unsigned long range_desc = addr_range_desc(phys, ipa_top - ipa);
>>>> +        unsigned long out_top;
>>>> +
>>>> +        ret = rmi_rtt_data_map(rd, ipa, ipa_top, flags, range_desc,
>>>> +                       &out_top);
>>>> +
>>>> +        if (RMI_RETURN_STATUS(ret) == RMI_ERROR_RTT) {
>>>> +            /* Create missing RTTs and retry */
>>>> +            int level = RMI_RETURN_INDEX(ret);
>>>> +
>>>> +            WARN_ON(level == KVM_PGTABLE_LAST_LEVEL);
>>>> +            ret = realm_create_rtt_levels(realm, ipa, level,
>>>> +                              KVM_PGTABLE_LAST_LEVEL,
>>>> +                              memcache);
>>
>> Could we give the RMM a chance to make use of the block mappings by
>> creating the missing RTTs to the level that may work for the current
>> range_desc? i.e., if the range_desc is a 2M block size, we could create
>> tables up to L2 in the first go and if the RMM still needs RTTs, we could
>> go further down to the KVM_PGTABLE_LAST_LEVEL. I understand this is
>> kind of an optimisation, so maybe we could defer it. (Same applies for
>> the non_secure map below).
> 
> A simple change would be just to create one level at a time like this:
> 
> diff --git a/arch/arm64/kvm/rmi.c b/arch/arm64/kvm/rmi.c
> index b79b96f7dffb..3f3ade1d3895 100644
> --- a/arch/arm64/kvm/rmi.c
> +++ b/arch/arm64/kvm/rmi.c
> @@ -767,15 +767,15 @@ static int realm_map_protected(struct kvm *kvm,
>   			/* Create missing RTTs and retry */
>   			int level = RMI_RETURN_INDEX(ret);
>   
> -			WARN_ON(level == KVM_PGTABLE_LAST_LEVEL);
> +			if (WARN_ON(level >= KVM_PGTABLE_LAST_LEVEL))
> +				goto err_undelegate;
>   			ret = realm_create_rtt_levels(realm, ipa, level,
> -						      KVM_PGTABLE_LAST_LEVEL,
> +						      level + 1,
>   						      memcache);
>   			if (ret)
>   				goto err_undelegate;
>   
> -			ret = rmi_rtt_data_map(rd, ipa, ipa_top, flags,
> -					       range_desc, &out_top);
> +			continue;
>   		}

That looks good to me.
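
To illustrate why one level at a time helps, here is a rough userspace
model of the retry loop (a sketch only, not kernel code). Everything
prefixed model_ and the LAST_LEVEL constant are hypothetical stand-ins
for rmi_rtt_data_map(), realm_create_rtt_levels() and
KVM_PGTABLE_LAST_LEVEL; the real RMM behaviour is assumed to be "fail
with the level where the walk stopped until a table exists at the block's
level".

```c
#include <assert.h>

#define LAST_LEVEL 3	/* assumption: stands in for KVM_PGTABLE_LAST_LEVEL */

/* Hypothetical model state: how deep the RTT tree currently goes. */
struct model_rtt {
	int deepest_level;	/* tables exist down to this level */
	int tables_created;	/* number of tables added by the host */
};

/*
 * Model of the map call: succeeds once tables exist down to the level
 * of the requested block, otherwise reports where the walk stopped.
 */
static int model_data_map(const struct model_rtt *rtt, int block_level,
			  int *fault_level)
{
	if (rtt->deepest_level >= block_level)
		return 0;
	*fault_level = rtt->deepest_level;
	return -1;
}

/* Model of table creation: adds tables for levels level+1..max_level. */
static int model_create_rtt_levels(struct model_rtt *rtt, int level,
				   int max_level)
{
	assert(level < max_level);
	rtt->tables_created += max_level - level;
	rtt->deepest_level = max_level;
	return 0;
}

/* Retry loop: create a single missing level, then try the map again. */
static int model_map(struct model_rtt *rtt, int block_level)
{
	for (;;) {
		int level;

		if (model_data_map(rtt, block_level, &level) == 0)
			return 0;
		if (level >= LAST_LEVEL)
			return -1;	/* nothing deeper to create */
		if (model_create_rtt_levels(rtt, level, level + 1))
			return -1;
	}
}
```

In this model, starting with tables down to level 1 and asking for a
level 2 block, model_map() creates one table and succeeds, whereas
creating straight down to LAST_LEVEL would have created two and ruled
out the block mapping.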

Cheers
Suzuki


>   
>   		if (WARN_ON(ret))
> 
> Thanks,
> Steve
> 



