[RFC PATCH 1/3] KVM: arm64: Fix possible memory leak in kvm stage2
wangyanan (Y)
wangyanan55 at huawei.com
Tue Dec 1 15:08:42 EST 2020
On 2020/12/2 2:15, Will Deacon wrote:
> On Wed, Dec 02, 2020 at 01:19:35AM +0800, wangyanan (Y) wrote:
>> On 2020/12/1 22:16, Will Deacon wrote:
>>> On Tue, Dec 01, 2020 at 03:21:23PM +0800, wangyanan (Y) wrote:
>>>> On 2020/11/30 21:21, Will Deacon wrote:
>>>>> On Mon, Nov 30, 2020 at 08:18:45PM +0800, Yanan Wang wrote:
>>>>>> @@ -476,6 +477,7 @@ static bool stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
>>>>>> /* There's an existing valid leaf entry, so perform break-before-make */
>>>>>> kvm_set_invalid_pte(ptep);
>>>>>> kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, level);
>>>>>> + put_page(virt_to_page(ptep));
>>>>>> kvm_set_valid_leaf_pte(ptep, phys, data->attr, level);
>>>>>> out:
>>>>>> data->phys += granule;
>>>>> Isn't this hunk alone sufficient to solve the problem?
>>>>>
>>>> Not quite sufficient. When the old ptep is valid and the old pte
>>>> equals the new pte, kvm_set_valid_leaf_pte() still returns "true",
>>>> and get_page() will still be called.
>>> I had a go at fixing this without introducing refcounting to the hyp stage-1
>>> case, and ended up with the diff below. What do you think?
>> Functionally this diff looks fine to me. A small comment inline, please see
>> below.
>>
>> I had made an alternative fix (after sending v1) and it looks much more
>> concise.
>>
>> If you're ok with it, I can send it as v2 (together with patch#2 and #3)
>> after some tests.
> Thanks.
>
>> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
>> index 0271b4a3b9fe..b232bdd142a6 100644
>> --- a/arch/arm64/kvm/hyp/pgtable.c
>> +++ b/arch/arm64/kvm/hyp/pgtable.c
>> @@ -470,6 +470,9 @@ static bool stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
>> if (!kvm_block_mapping_supported(addr, end, phys, level))
>> return false;
>>
>> + if (kvm_pte_valid(*ptep))
>> + put_page(virt_to_page(ptep));
>> +
>> if (kvm_set_valid_leaf_pte(ptep, phys, data->attr, level))
>> goto out;
> This is certainly smaller, but see below.
>
>>> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
>>> index 0271b4a3b9fe..78e2c0dc47ae 100644
>>> --- a/arch/arm64/kvm/hyp/pgtable.c
>>> +++ b/arch/arm64/kvm/hyp/pgtable.c
>>> @@ -170,23 +170,16 @@ static void kvm_set_table_pte(kvm_pte_t *ptep, kvm_pte_t *childp)
>>> smp_store_release(ptep, pte);
>>> }
>>> -static bool kvm_set_valid_leaf_pte(kvm_pte_t *ptep, u64 pa, kvm_pte_t attr,
>>> - u32 level)
>>> +static kvm_pte_t kvm_init_valid_leaf_pte(u64 pa, kvm_pte_t attr, u32 level)
>>> {
>>> - kvm_pte_t old = *ptep, pte = kvm_phys_to_pte(pa);
>>> + kvm_pte_t pte = kvm_phys_to_pte(pa);
>>> u64 type = (level == KVM_PGTABLE_MAX_LEVELS - 1) ? KVM_PTE_TYPE_PAGE :
>>> KVM_PTE_TYPE_BLOCK;
>>> pte |= attr & (KVM_PTE_LEAF_ATTR_LO | KVM_PTE_LEAF_ATTR_HI);
>>> pte |= FIELD_PREP(KVM_PTE_TYPE, type);
>>> pte |= KVM_PTE_VALID;
>>> -
>>> - /* Tolerate KVM recreating the exact same mapping. */
>>> - if (kvm_pte_valid(old))
>>> - return old == pte;
>>> -
>>> - smp_store_release(ptep, pte);
>>> - return true;
>>> + return pte;
>>> }
>>> static int kvm_pgtable_visitor_cb(struct kvm_pgtable_walk_data *data, u64 addr,
>>> @@ -341,12 +334,17 @@ static int hyp_map_set_prot_attr(enum kvm_pgtable_prot prot,
>>> static bool hyp_map_walker_try_leaf(u64 addr, u64 end, u32 level,
>>> kvm_pte_t *ptep, struct hyp_map_data *data)
>>> {
>>> + kvm_pte_t new, old = *ptep;
>>> u64 granule = kvm_granule_size(level), phys = data->phys;
>>> if (!kvm_block_mapping_supported(addr, end, phys, level))
>>> return false;
>>> - WARN_ON(!kvm_set_valid_leaf_pte(ptep, phys, data->attr, level));
>>> + /* Tolerate KVM recreating the exact same mapping. */
>>> + new = kvm_init_valid_leaf_pte(phys, data->attr, level);
>>> + if (old != new && !WARN_ON(kvm_pte_valid(old)))
>>> + smp_store_release(ptep, new);
>>> +
>>> data->phys += granule;
>>> return true;
>>> }
>>> @@ -465,19 +463,24 @@ static bool stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
>>> kvm_pte_t *ptep,
>>> struct stage2_map_data *data)
>>> {
>>> + kvm_pte_t new, old = *ptep;
>>> u64 granule = kvm_granule_size(level), phys = data->phys;
>>> if (!kvm_block_mapping_supported(addr, end, phys, level))
>>> return false;
>>> - if (kvm_set_valid_leaf_pte(ptep, phys, data->attr, level))
>>> - goto out;
>>> + new = kvm_init_valid_leaf_pte(phys, data->attr, level);
>>> + if (kvm_pte_valid(old)) {
>>> + /*
>>> + * There's an existing valid leaf entry, so perform
>>> + * break-before-make.
>>> + */
>>> + kvm_set_invalid_pte(ptep);
>>> + kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, level);
>>> + put_page(virt_to_page(ptep));
>> When the old PTE is valid and equals the new one, we still perform
>> break-before-make and install the new PTE, but neither is actually
>> necessary in this case.
> Agreed, but I don't think that case should happen for stage-2 mappings.
> That's why I prefer my diff here, as it makes that 'old == new' logic
> specific to the hyp mappings.
>
> But I'll leave it all up to you (these diffs are only intended to be
> helpful).
>
> Will
Hi Will,
In my opinion, the 'old == new' case can happen in stage-2 translation
too, especially under concurrent access by multiple vCPUs.
For example, when merging tables into a block, we perform
break-before-make on a valid old PTE in function stage2_map_walk_pre().
If other vCPUs are accessing the same GPA range at that point, their
MMUs will walk to the same PTE as above, and they may catch the window
in which the target PTE has been invalidated for BBM by the vCPU
holding the mmu_lock, before that PTE is made valid again.
As a result, a translation fault is triggered on those vCPUs, and they
will then wait for the mmu_lock to install exactly the *same* mapping
(old == new).
Thanks,
Yanan