[PATCH 1/5] KVM: arm64: Walk userspace page tables to compute the THP mapping size
Sean Christopherson
seanjc at google.com
Wed Jul 21 08:56:09 PDT 2021
On Wed, Jul 21, 2021, Will Deacon wrote:
> > For page table liveness, KVM implements mmu_notifier_ops.release, which is
> > invoked at the beginning of exit_mmap(), before the page tables are freed. In
> > its implementation, KVM takes mmu_lock and zaps all its shadow page tables, a.k.a.
> > the stage2 tables in KVM arm64. The flow in question, get_user_mapping_size(),
> > also runs under mmu_lock, and so effectively blocks exit_mmap() and thus is
> > guaranteed to run with live userspace tables.
>
> Unless I missed a case, exit_mmap() only runs when mm_struct::mm_users drops
> to zero, right?
Yep.
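
For reference, the teardown path in question, heavily trimmed (a sketch of
kernel/fork.c, not the verbatim code):

	/* Sketch: exit_mmap() is only reachable once the final mm_users
	 * reference goes away. */
	static void __mmput(struct mm_struct *mm)
	{
		/* ... */
		exit_mmap(mm);	/* frees the VMAs and userspace page tables */
		/* ... */
		mmdrop(mm);
	}

	void mmput(struct mm_struct *mm)
	{
		might_sleep();

		/* Only the *last* mm_users reference reaches __mmput(). */
		if (atomic_dec_and_test(&mm->mm_users))
			__mmput(mm);
	}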
> The vCPU tasks should hold references to that afaict, so I don't think it
> should be possible for exit_mmap() to run while there are vCPUs running with
> the corresponding page-table.
Ah, right, I was thinking of non-KVM code that operates on the page tables
without holding a reference to mm_users.
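
To tie that back to the patch: get_user_mapping_size() runs with mmu_lock held,
and conceptually it's just a software walk of the userspace tables to find the
leaf level backing the faulting address, something along these lines
(illustrative sketch using the generic accessors; not necessarily how the patch
actually implements it):

	#include <linux/mm.h>
	#include <linux/pgtable.h>

	/*
	 * Sketch only: return the size of the userspace mapping covering
	 * @addr.  Caller must guarantee the tables are live, e.g. by holding
	 * mmu_lock so that mmu_notifier_ops.release (and thus exit_mmap())
	 * can't complete underneath us.
	 */
	static unsigned long sketch_user_mapping_size(struct mm_struct *mm,
						      unsigned long addr)
	{
		pgd_t *pgd = pgd_offset(mm, addr);
		p4d_t *p4d;
		pud_t *pud;
		pmd_t *pmd;

		if (pgd_none(*pgd))
			return PAGE_SIZE;

		p4d = p4d_offset(pgd, addr);
		if (p4d_none(*p4d))
			return PAGE_SIZE;

		pud = pud_offset(p4d, addr);
		if (pud_none(*pud))
			return PAGE_SIZE;
		if (pud_leaf(*pud))
			return PUD_SIZE;

		pmd = pmd_offset(pud, addr);
		if (pmd_none(*pmd))
			return PAGE_SIZE;
		if (pmd_leaf(*pmd))
			return PMD_SIZE;

		/* No huge leaf found; fall back to the base page size. */
		return PAGE_SIZE;
	}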
> > Looking at the arm64 code, one thing I'm not clear on is whether arm64 correctly
> > handles the case where exit_mmap() wins the race. The invalidate_range hooks will
> > still be called, so userspace page tables aren't a problem, but
> > kvm_arch_flush_shadow_all() -> kvm_free_stage2_pgd() nullifies mmu->pgt without
> > any additional notifications that I see. x86 deals with this by ensuring its
> > top-level TDP entry (stage2 equivalent) is valid while the page fault handler is
> > running.
>
> But the fact that x86 handles this race has me worried. What am I missing?
I don't think you're missing anything. I forgot that KVM_RUN would require an
elevated mm_users. x86 does handle the impossible race, but that's coincidental.
The extra protections in x86 are to deal with other cases where a vCPU's top-level
SPTE can be invalidated while the vCPU is running.
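
If arm64 ever did need to close that race, the shape of the fix would be
something like the below (sketch only, not existing code): re-check mmu->pgt
under mmu_lock before walking or installing anything, since
kvm_free_stage2_pgd() nullifies it.

	#include <linux/kvm_host.h>	/* struct kvm and, on arm64, struct kvm_s2_mmu */

	/*
	 * Hypothetical guard, for illustration only: bail out of the stage-2
	 * fault path if the page tables were torn down in the meantime.
	 */
	static int sketch_stage2_fault(struct kvm *kvm, struct kvm_s2_mmu *mmu)
	{
		int ret = 0;

		spin_lock(&kvm->mmu_lock);
		if (!mmu->pgt) {
			/* kvm_free_stage2_pgd() won the race; have the vCPU retry/exit. */
			ret = -EAGAIN;
			goto out_unlock;
		}

		/* ... safe to walk and map through mmu->pgt here ... */

	out_unlock:
		spin_unlock(&kvm->mmu_lock);
		return ret;
	}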