[PATCH] KVM: arm/arm64: fix unaligned hva start and end in handle_hva_to_gpa

Jia He hejianet at gmail.com
Tue May 15 06:15:29 PDT 2018



On 5/15/2018 8:38 PM, Jia He Wrote:
> Hi Suzuki
> 
> On 5/15/2018 4:36 PM, Suzuki K Poulose Wrote:
>>
>> Hi Jia
>>
>> On 05/15/2018 03:03 AM, Jia He wrote:
>>> Hi Suzuki
>>>
>>> I will merge the other thread into this, and add the necessary CC list
>>>
>>> That WARN_ON call trace is very easy to reproduce in my armv8a server after I
>>> start 20 guests
>>>
>>> and run memhog in the host. Of course, ksm should be enabled
>>>
>>> For you question about my inject fault debug patch:
>>
>>
>> Thanks for the patch, comments below.
>>
>>>
>>
>> ...
>>
>>> index 7f6a944..ab8545e 100644
>>> --- a/virt/kvm/arm/mmu.c
>>> +++ b/virt/kvm/arm/mmu.c
>>> @@ -290,12 +290,17 @@ static void unmap_stage2_puds(struct kvm *kvm, pgd_t *pgd,
>>>    * destroying the VM), otherwise another faulting VCPU may come in and mess
>>>    * with things behind our backs.
>>>    */
>>> +extern int trigger_by_ksm;
>>>   static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size)
>>>   {
>>>          pgd_t *pgd;
>>>          phys_addr_t addr = start, end = start + size;
>>>          phys_addr_t next;
>>>
>>> +       if(trigger_by_ksm) {
>>> +               end -= 0x200;
>>> +       }
>>> +
>>>          assert_spin_locked(&kvm->mmu_lock);
>>>          pgd = kvm->arch.pgd + stage2_pgd_index(addr);
>>>          do {
>>>
>>> I need to point out that I never reproduced it without this debugging patch.
>>
>> That could trigger the panic iff your "size" <= 0x200, leading to the
>> condition (end < start), which can make the loop go forever, as we
>> do while(addr < end) and end up accessing something which may not be PGD entry
>> and thus get a bad page with bad numbers all around. This case could be hit only
>> with your change and the bug in the KSM which gives us an address near the page
>> boundary.
> No, I injected the fault on purpose to simulate the case when size is less than
> PAGE_SIZE(eg. PAGE_SIZE-0x200=65024)
> I ever got the panic info [1] *without* the debugging patch only once
> 
> [1] https://lkml.org/lkml/2018/5/9/992
>>
>> So, I think we can safely ignore the PANIC().
>> More below.
>>
>>
>>>>> Suzuki, thanks for the comments.
>>>>>
>>>>> I proposed another ksm patch https://lkml.org/lkml/2018/5/3/1042
>>>>> The root cause is ksm will add some extra flags to indicate that the page
>>>>> is in/not_in the stable tree. This makes address not be aligned with PAGE_SIZE.
>>>> Thanks for the pointer. In the future, please Cc the people relevant to the
>>>> discussion in the patches.
>>>>
>>>>>   From arm kvm mmu point of view, do you think handle_hva_to_gpa still need
>>>>> to handle
>>>>> the unalignment case?
>>>> I don't think we should do that. Had we done this, we would never have caught
>>>> this bug
>>>> in KSM. Eventually if some other new implementation comes up with the a new
>>>> notifier
>>>> consumer which doesn't check alignment and doesn't WARN, it could simply do
>>>> the wrong
>>>> thing. So I believe what we have is a good measure to make sure that things are
>>>> in the right order.
>>>>
>>>>> IMO, the PAGE_SIZE alignment is still needed because we should not let the
>>>>> bottom function
>>>>> kvm_age_hva_handler to handle the exception. Please refer to the
>>>>> implementation in X86 and
>>>>> powerpc kvm_handle_hva_range(). They both aligned the hva with
>>>>> hva_to_gfn_memslot.
>>>>>
>>>>   From an API perspective, you are passed on a "start" and "end" address. So,
>>>> you could potentially
>>>> do the wrong thing if you align the "start" and "end". May be those handlers
>>>> should also do the
>>>> same thing as we do.
>>
>>> But handle_hva_to_gpa has partially adjusted the alignment possibly:
>>>     1750         kvm_for_each_memslot(memslot, slots) {
>>>     1751                 unsigned long hva_start, hva_end;
>>>     1752                 gfn_t gpa;
>>>     1753
>>>     1754                 hva_start = max(start, memslot->userspace_addr);
>>>     1755                 hva_end = min(end, memslot->userspace_addr +
>>>     1756                             (memslot->npages << PAGE_SHIFT));
>>>
>>> at line 1755, let us assume that end=0x12340200 and
>>> memslot->userspace_addr + (memslot->npages << PAGE_SHIFT)=0x12340000
>>> Then, hva_start is not page_size aligned and hva_end is aligned, and the size
>>> will be PAGE_SIZE-0x200,
>>> just as what I had done in the inject fault debugging patch.
>>
>> Thats because we want to limit the handling of the hva/gpa range by memslot. So,
>> we make sure we pass on the range within the given memslot
>> to hva_to_gfn_memslot(). But we do iterate over the next memslot if the
>> original range falls in to the next slot. So, in practice, there is no
>> alignment/trimming of the range. Its just that we pass on the appropriate range
>> for each slot.
>>
> Yes, I understand what the codes did in hva_to_gfn_memslot(). What I mean is
> hva_end may be changed and (hva_end - hva_start) will not be same as the
> parameter _size_ ?
> 
>> ret |= handler(kvm, gpa, (u64)(hva_end - hva_start), data);
> 
> Anyway, I have to admit that all the exceptions are originally caused by the
> STABLE_FLAG in ksm code. What I want to discuss here is how to make arm kvm
> handle the exception more gracefully.
> 
Hi Suzuki
How about this patch (balance of avoiding the WARN_ON storm and debugging
convenience):
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 7f6a944..4033946 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -297,6 +297,8 @@ static void unmap_stage2_range(struct kvm *kvm, phys_addr_t
start, u64 size)
 	phys_addr_t next;

 	assert_spin_locked(&kvm->mmu_lock);
+
+	WARN_ON_ONCE(size & ~PAGE_MASK);
 	pgd = kvm->arch.pgd + stage2_pgd_index(addr);
 	do {
 		/*
@@ -1792,7 +1794,7 @@ static int kvm_set_spte_handler(struct kvm *kvm, gpa_t
gpa, u64 size, void *data
 {
 	pte_t *pte = (pte_t *)data;

-	WARN_ON(size != PAGE_SIZE);
+	WARN_ON_ONCE(size != PAGE_SIZE);
 	/*
 	 * We can always call stage2_set_pte with KVM_S2PTE_FLAG_LOGGING_ACTIVE
 	 * flag clear because MMU notifiers will have unmapped a huge PMD before
@@ -1823,7 +1825,7 @@ static int kvm_age_hva_handler(struct kvm *kvm, gpa_t gpa,
u64 size, void *data)
 	pmd_t *pmd;
 	pte_t *pte;

-	WARN_ON(size != PAGE_SIZE && size != PMD_SIZE);
+	WARN_ON_ONCE(size != PAGE_SIZE && size != PMD_SIZE);
 	pmd = stage2_get_pmd(kvm, NULL, gpa);
 	if (!pmd || pmd_none(*pmd))	/* Nothing there */
 		return 0;
@@ -1843,7 +1845,7 @@ static int kvm_test_age_hva_handler(struct kvm *kvm, gpa_t
gpa, u64 size, void *
 	pmd_t *pmd;
 	pte_t *pte;

-	WARN_ON(size != PAGE_SIZE && size != PMD_SIZE);
+	WARN_ON_ONCE(size != PAGE_SIZE && size != PMD_SIZE);
 	pmd = stage2_get_pmd(kvm, NULL, gpa);
 	if (!pmd || pmd_none(*pmd))	/* Nothing there */
 		return 0;

-- 
Cheers,
Jia



More information about the linux-arm-kernel mailing list