[PATCH v3 09/21] KVM: arm64: Convert unmap_stage2_range() to generic page-table API

Alexandru Elisei alexandru.elisei at arm.com
Wed Sep 9 06:57:26 EDT 2020


Hi Will,

I'm answering my own question, again. See below.

On 9/8/20 2:07 PM, Alexandru Elisei wrote:
> Hi Will,
>
> On 9/3/20 6:57 PM, Will Deacon wrote:
>> On Wed, Sep 02, 2020 at 05:23:08PM +0100, Alexandru Elisei wrote:
>>> On 8/25/20 10:39 AM, Will Deacon wrote:
>>>> Convert unmap_stage2_range() to use kvm_pgtable_stage2_unmap() instead
>>>> of walking the page-table directly.
>>>>
>>>> Cc: Marc Zyngier <maz at kernel.org>
>>>> Cc: Quentin Perret <qperret at google.com>
>>>> Signed-off-by: Will Deacon <will at kernel.org>
>>>> ---
>>>>  arch/arm64/kvm/mmu.c | 57 +++++++++++++++++++++++++-------------------
>>>>  1 file changed, 32 insertions(+), 25 deletions(-)
>>>>
>>>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>>>> index 704b471a48ce..751ce2462765 100644
>>>> --- a/arch/arm64/kvm/mmu.c
>>>> +++ b/arch/arm64/kvm/mmu.c
>>>> @@ -39,6 +39,33 @@ static bool is_iomap(unsigned long flags)
>>>>  	return flags & KVM_S2PTE_FLAG_IS_IOMAP;
>>>>  }
>>>>  
>>>> +/*
>>>> + * Release kvm_mmu_lock periodically if the memory region is large. Otherwise,
>>>> + * we may see kernel panics with CONFIG_DETECT_HUNG_TASK,
>>>> + * CONFIG_LOCKUP_DETECTOR, CONFIG_LOCKDEP. Additionally, holding the lock too
>>>> + * long will also starve other vCPUs. We have to also make sure that the page
>>>> + * tables are not freed while we released the lock.
>>>> + */
>>>> +#define stage2_apply_range(kvm, addr, end, fn, resched)			\
>>>> +({									\
>>>> +	int ret;							\
>>>> +	struct kvm *__kvm = (kvm);					\
>>>> +	bool __resched = (resched);					\
>>>> +	u64 next, __addr = (addr), __end = (end);			\
>>>> +	do {								\
>>>> +		struct kvm_pgtable *pgt = __kvm->arch.mmu.pgt;		\
>>>> +		if (!pgt)						\
>>>> +			break;						\
>>> I'm 100% sure there's a reason why we've dropped the READ_ONCE, but it still looks
>>> to me like the compiler might decide to optimize by reading pgt once at the start
>>> of the loop and stashing it in a register. Would you mind explaining what I am
>>> missing?
>> The load always happens with the mmu_lock held, so I think it's not a
>> problem because it means that the pointer is stable.
>> spin_lock()/spin_unlock() imply compiler barriers.
> I think you are correct: if this is supposed to always execute with kvm->mmu_lock
> held, then pgt should not change between iterations. It didn't immediately occur
> to me that that is the case because we check if pgt is NULL every iteration. If we
> are relying on the lock being held, maybe we should move the pgt load + comparison
> against NULL out of the loop? That should avoid any confusion and make the code
> ever so slightly faster.
>
> Also, I see that in __unmap_stage2_range() we check that the mmu_lock is held, but
> we don't check that at all the call sites (for example, in stage2_wp_range()). I
> realize this is me bikeshedding, but that looks a bit asymmetrical. Should we move
> the assert_spin_locked(&kvm->mmu_lock) statement into stage2_apply_range(), since
> the function assumes the pgt will remain unchanged? What do you think?

What I wrote is wrong, because stage2_apply_range() can drop the lock in
cond_resched_lock(), so pgt can change (or go NULL) between iterations. I don't
see the need for any changes.
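
To spell out why (mostly for my own future reference), here is a simplified
sketch of the pattern, *not* the actual patch text -- the function name
walk_range and the step parameter are made up, the rest follows the thread
above and the usual kernel locking rules:

/* Context: arch/arm64/kvm/mmu.c with this series applied; illustrative only. */
static void walk_range(struct kvm *kvm, u64 addr, u64 end, u64 step)
{
	u64 next;

	assert_spin_locked(&kvm->mmu_lock);

	do {
		/*
		 * Must be re-read on every pass: cond_resched_lock() below can
		 * drop mmu_lock, and the stage 2 page tables can be freed (and
		 * ->pgt go NULL) while the lock is dropped. The unlock/lock
		 * inside it are also compiler barriers, so the compiler cannot
		 * keep pgt cached in a register across iterations either.
		 */
		struct kvm_pgtable *pgt = kvm->arch.mmu.pgt;

		if (!pgt)
			break;

		next = min(addr + step, end);
		/* ... unmap [addr, next) via pgt here ... */

		if (next != end)
			cond_resched_lock(&kvm->mmu_lock);
	} while (addr = next, addr != end);
}

cond_resched_lock() is effectively spin_unlock() + schedule() + spin_lock()
when a reschedule is due, which is both why the load and NULL check have to
stay inside the loop and why a plain load (no READ_ONCE) is still enough.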

Thanks,
Alex


