[PATCH v7 18/45] arm64: RME: Handle RMI_EXIT_RIPAS_CHANGE
Gavin Shan
gshan at redhat.com
Tue Apr 8 17:13:40 PDT 2025
Hi Steve,
On 4/8/25 2:34 AM, Steven Price wrote:
> On 04/03/2025 04:35, Gavin Shan wrote:
>> On 2/14/25 2:13 AM, Steven Price wrote:
>>> The guest can request that a region of its protected address space is
>>> switched between RIPAS_RAM and RIPAS_EMPTY (and back) using
>>> RSI_IPA_STATE_SET. This causes a guest exit with the
>>> RMI_EXIT_RIPAS_CHANGE code. We treat this as a request to convert a
>>> protected region to unprotected (or back), exiting to the VMM to make
>>> the necessary changes to the guest_memfd and memslot mappings. On the
>>> next entry the RIPAS changes are committed by making RMI_RTT_SET_RIPAS
>>> calls.
>>>
>>> The VMM may wish to reject the RIPAS change requested by the guest. For
>>> now it can only do so by no longer scheduling the VCPU as we don't
>>> currently have a use case for returning that rejection to the guest, but
>>> by postponing the RMI_RTT_SET_RIPAS changes to entry we leave the door
>>> open for adding a new ioctl in the future for this purpose.
>>>
>>> Signed-off-by: Steven Price <steven.price at arm.com>
>>> ---
>>> New patch for v7: The code was previously split awkwardly between two
>>> other patches.
>>> ---
>>> arch/arm64/kvm/rme.c | 87 ++++++++++++++++++++++++++++++++++++++++++++
>>> 1 file changed, 87 insertions(+)
>>>
>>
>> With the following comments addressed:
>>
>> Reviewed-by: Gavin Shan <gshan at redhat.com>
>>
>>> diff --git a/arch/arm64/kvm/rme.c b/arch/arm64/kvm/rme.c
>>> index 507eb4b71bb7..f965869e9ef7 100644
>>> --- a/arch/arm64/kvm/rme.c
>>> +++ b/arch/arm64/kvm/rme.c
>>> @@ -624,6 +624,64 @@ void kvm_realm_unmap_range(struct kvm *kvm,
>>> unsigned long start, u64 size,
>>> realm_unmap_private_range(kvm, start, end);
>>> }
>>> +static int realm_set_ipa_state(struct kvm_vcpu *vcpu,
>>> + unsigned long start,
>>> + unsigned long end,
>>> + unsigned long ripas,
>>> + unsigned long *top_ipa)
>>> +{
>>> + struct kvm *kvm = vcpu->kvm;
>>> + struct realm *realm = &kvm->arch.realm;
>>> + struct realm_rec *rec = &vcpu->arch.rec;
>>> + phys_addr_t rd_phys = virt_to_phys(realm->rd);
>>> + phys_addr_t rec_phys = virt_to_phys(rec->rec_page);
>>> + struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
>>> + unsigned long ipa = start;
>>> + int ret = 0;
>>> +
>>> + while (ipa < end) {
>>> + unsigned long next;
>>> +
>>> + ret = rmi_rtt_set_ripas(rd_phys, rec_phys, ipa, end, &next);
>>> +
>>
>> This doesn't look correct to me. Looking at RMM::smc_rtt_set_ripas(),
>> it's possible the SMC call returns without updating 'next' to a valid
>> address. In that case, the garbage content residing in 'next' would be
>> assigned to 'ipa' in the next iteration. So we need to initialize it in
>> advance, like below.
>>
>> unsigned long ipa = start;
>> unsigned long next = start;
>>
>> while (ipa < end) {
>> ret = rmi_rtt_set_ripas(rd_phys, rec_phys, ipa, end, &next);
>
> I agree this might not be the clearest code, but 'next' should be set if
> the return state is RMI_SUCCESS, and we don't actually get to the "ipa =
> next" line unless that is the case. But I'll rejig things because it's
> not clear.
>
Yes, 'next' is always updated when RMI_SUCCESS is returned. However, 'next'
won't be updated when RMI_ERROR_RTT is returned. I had overlooked the code:
when RMI_ERROR_RTT is returned on the first iteration, 'ipa' is left
intact and is retried after the stage-2 page table is populated. So
everything should be fine.
>>> + if (RMI_RETURN_STATUS(ret) == RMI_ERROR_RTT) {
>>> + int walk_level = RMI_RETURN_INDEX(ret);
>>> + int level = find_map_level(realm, ipa, end);
>>> +
>>> + /*
>>> + * If the RMM walk ended early then more tables are
>>> + * needed to reach the required depth to set the RIPAS.
>>> + */
>>> + if (walk_level < level) {
>>> + ret = realm_create_rtt_levels(realm, ipa,
>>> + walk_level,
>>> + level,
>>> + memcache);
>>> + /* Retry with RTTs created */
>>> + if (!ret)
>>> + continue;
>>> + } else {
>>> + ret = -EINVAL;
>>> + }
>>> +
>>> + break;
>>> + } else if (RMI_RETURN_STATUS(ret) != RMI_SUCCESS) {
>>> + WARN(1, "Unexpected error in %s: %#x\n", __func__,
>>> + ret);
>>> + ret = -EINVAL;
>>
>> ret = -ENXIO;
>
> Ack
>
>>> + break;
>>> + }
>>> + ipa = next;
>>> + }
>>> +
>>> + *top_ipa = ipa;
>>> +
>>> + if (ripas == RMI_EMPTY && ipa != start)
>>> + realm_unmap_private_range(kvm, start, ipa);
>>> +
>>> + return ret;
>>> +}
>>> +
>>> static int realm_init_ipa_state(struct realm *realm,
>>> unsigned long ipa,
>>> unsigned long end)
>>> @@ -863,6 +921,32 @@ void kvm_destroy_realm(struct kvm *kvm)
>>> kvm_free_stage2_pgd(&kvm->arch.mmu);
>>> }
>>> +static void kvm_complete_ripas_change(struct kvm_vcpu *vcpu)
>>> +{
>>> + struct kvm *kvm = vcpu->kvm;
>>> + struct realm_rec *rec = &vcpu->arch.rec;
>>> + unsigned long base = rec->run->exit.ripas_base;
>>> + unsigned long top = rec->run->exit.ripas_top;
>>> + unsigned long ripas = rec->run->exit.ripas_value;
>>> + unsigned long top_ipa;
>>> + int ret;
>>> +
>>
>> Some checks are needed here to ensure the addresses (@base and @top)
>> falls inside
>> the protected (private) space for two facts: (1) Those parameters
>> originates from
>> the guest, which can be misbehaving. (2) RMM::smc_rtt_set_ripas() isn't
>> limited to
>> the private space, meaning it also can change RIPAS for the ranges in
>> the shared
>> space.
>
> I might be missing something, but AFAICT this is safe:
>
> 1. The RMM doesn't permit RIPAS changes within the shared space:
> RSI_IPA_STATE_SET has a precondition [rgn_bound]:
> AddrRangeIsProtected(base, top, realm)
> So a malicious guest shouldn't get past the RMM.
>
> 2. The RMM validates that the range passed here is (a subset of) the
> one provided to the NS-world [base_bound / top_bound].
>
> And even if somehow a malicious guest managed to bypass these checks I
> don't see how it would cause harm to the host operating on the wrong region.
>
> I'm happy to be corrected though! What am I missing?
>
No, you're not missing anything; I was. I missed that the requested address range
is ensured to be part of the private space by RMM::handle_rsi_ipa_state_set().
So everything should be fine.
void handle_rsi_ipa_state_set(struct rec *rec,
                              struct rmi_rec_exit *rec_exit,
                              struct rsi_result *res)
{
        :
        if ((ripas_val > RIPAS_RAM) ||
            !GRANULE_ALIGNED(base) || !GRANULE_ALIGNED(top) ||
            (top <= base) || /* Size is zero, or range overflows */
            !region_in_rec_par(rec, base, top)) {
                res->action = UPDATE_REC_RETURN_TO_REALM;
                res->smc_res.x[0] = RSI_ERROR_INPUT;
                return;
        }
        :
}
> Thanks,
> Steve
>
>>> + do {
>>> + kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_page_cache,
>>> + kvm_mmu_cache_min_pages(vcpu->arch.hw_mmu));
>>> + write_lock(&kvm->mmu_lock);
>>> + ret = realm_set_ipa_state(vcpu, base, top, ripas, &top_ipa);
>>> + write_unlock(&kvm->mmu_lock);
>>> +
>>> + if (WARN_RATELIMIT(ret && ret != -ENOMEM,
>>> + "Unable to satisfy RIPAS_CHANGE for %#lx - %#lx, ripas: %#lx\n",
>>> + base, top, ripas))
>>> + break;
>>> +
>>> + base = top_ipa;
>>> + } while (top_ipa < top);
>>> +}
>>> +
>>> int kvm_rec_enter(struct kvm_vcpu *vcpu)
>>> {
>>> struct realm_rec *rec = &vcpu->arch.rec;
>>> @@ -873,6 +957,9 @@ int kvm_rec_enter(struct kvm_vcpu *vcpu)
>>> for (int i = 0; i < REC_RUN_GPRS; i++)
>>> rec->run->enter.gprs[i] = vcpu_get_reg(vcpu, i);
>>> break;
>>> + case RMI_EXIT_RIPAS_CHANGE:
>>> + kvm_complete_ripas_change(vcpu);
>>> + break;
>>> }
>>> if (kvm_realm_state(vcpu->kvm) != REALM_STATE_ACTIVE)
>>
Thanks,
Gavin