[PATCH v3 09/21] KVM: arm64: Convert unmap_stage2_range() to generic page-table API

Will Deacon will at kernel.org
Thu Sep 3 13:57:03 EDT 2020


On Wed, Sep 02, 2020 at 05:23:08PM +0100, Alexandru Elisei wrote:
> On 8/25/20 10:39 AM, Will Deacon wrote:
> > Convert unmap_stage2_range() to use kvm_pgtable_stage2_unmap() instead
> > of walking the page-table directly.
> >
> > Cc: Marc Zyngier <maz at kernel.org>
> > Cc: Quentin Perret <qperret at google.com>
> > Signed-off-by: Will Deacon <will at kernel.org>
> > ---
> >  arch/arm64/kvm/mmu.c | 57 +++++++++++++++++++++++++-------------------
> >  1 file changed, 32 insertions(+), 25 deletions(-)
> >
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index 704b471a48ce..751ce2462765 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -39,6 +39,33 @@ static bool is_iomap(unsigned long flags)
> >  	return flags & KVM_S2PTE_FLAG_IS_IOMAP;
> >  }
> >  
> > +/*
> > + * Release kvm_mmu_lock periodically if the memory region is large. Otherwise,
> > + * we may see kernel panics with CONFIG_DETECT_HUNG_TASK,
> > + * CONFIG_LOCKUP_DETECTOR, CONFIG_LOCKDEP. Additionally, holding the lock too
> > + * long will also starve other vCPUs. We have to also make sure that the page
> > + * tables are not freed while we released the lock.
> > + */
> > +#define stage2_apply_range(kvm, addr, end, fn, resched)			\
> > +({									\
> > +	int ret;							\
> > +	struct kvm *__kvm = (kvm);					\
> > +	bool __resched = (resched);					\
> > +	u64 next, __addr = (addr), __end = (end);			\
> > +	do {								\
> > +		struct kvm_pgtable *pgt = __kvm->arch.mmu.pgt;		\
> > +		if (!pgt)						\
> > +			break;						\
> 
> I'm 100% sure there's a reason why we've dropped the READ_ONCE, but it still looks
> to me like the compiler might decide to optimize by reading pgt once at the start
> of the loop and stashing it in a register. Would you mind explaining what I am
> missing?

The load always happens with the mmu_lock held, so I don't think it's a
problem: the pointer is stable for as long as we hold the lock, and
spin_lock()/spin_unlock() imply compiler barriers, so the compiler can't
hoist the load out of the loop across cond_resched_lock().
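
To spell it out with a simplified sketch of the expanded loop (the
comments are mine, not part of the patch):

	do {
		/* Re-loaded every iteration, with mmu_lock held. */
		struct kvm_pgtable *pgt = __kvm->arch.mmu.pgt;
		if (!pgt)
			break;

		next = stage2_pgd_addr_end(__kvm, __addr, __end);
		ret = fn(pgt, __addr, next - __addr);
		if (ret)
			break;

		if (__resched && next != __end)
			/*
			 * May spin_unlock()/spin_lock() mmu_lock. Both act
			 * as compiler barriers, so a cached copy of
			 * __kvm->arch.mmu.pgt can't live in a register across
			 * this call; the load at the top of the loop is
			 * redone on the next iteration.
			 */
			cond_resched_lock(&__kvm->mmu_lock);
	} while (__addr = next, __addr != __end);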

> > +		next = stage2_pgd_addr_end(__kvm, __addr, __end);	\
> > +		ret = fn(pgt, __addr, next - __addr);			\
> > +		if (ret)						\
> > +			break;						\
> > +		if (__resched && next != __end)				\
> > +			cond_resched_lock(&__kvm->mmu_lock);		\
> > +	} while (__addr = next, __addr != __end);			\
> > +	ret;								\
> > +})
> 
> This seems unusual to me. We have a non-trivial, multi-line macro which calls
> cond_resched_lock(), has 6 local variables, and is called from exactly one place.
> I am curious why we are not open coding the loop in __unmap_stage2_range() or
> using a function.

It's called from three places. That said, I think it's like this because in
an earlier life it was used as an iterator and therefore had to be a macro.
I can try moving it into a function instead.
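
Something like the below, perhaps -- just a sketch of the shape I have in
mind, so the argument types and the early-out behaviour are a guess rather
than the final patch:

	static int stage2_apply_range(struct kvm *kvm, phys_addr_t addr,
				      phys_addr_t end,
				      int (*fn)(struct kvm_pgtable *, u64, u64),
				      bool resched)
	{
		int ret = 0;
		u64 next;

		do {
			/* Loaded under mmu_lock; see the discussion above. */
			struct kvm_pgtable *pgt = kvm->arch.mmu.pgt;
			if (!pgt)
				break;

			next = stage2_pgd_addr_end(kvm, addr, end);
			ret = fn(pgt, addr, next - addr);
			if (ret)
				break;

			if (resched && next != end)
				cond_resched_lock(&kvm->mmu_lock);
		} while (addr = next, addr != end);

		return ret;
	}

That keeps the walker callback as a function pointer instead of a macro
argument, and the compiler-barrier reasoning above applies unchanged.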

Will


