[PATCH 1/1] KVM: arm64: nv: Avoid full shadow s2 unmap
Wei-Lin Chang
weilin.chang at arm.com
Wed Apr 29 09:31:57 PDT 2026
On Fri, Apr 24, 2026 at 08:45:24PM +0100, Wei-Lin Chang wrote:
> On Thu, Apr 16, 2026 at 11:50:38AM +0100, Marc Zyngier wrote:
> > On Thu, 16 Apr 2026 00:05:40 +0100,
> > Wei-Lin Chang <weilin.chang at arm.com> wrote:
> > >
> > > On Wed, Apr 15, 2026 at 09:38:55AM +0100, Marc Zyngier wrote:
> >
> > [...]
> >
> > > > > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > > > > index 851f6171751c..a97bd461c1e1 100644
> > > > > --- a/arch/arm64/include/asm/kvm_host.h
> > > > > +++ b/arch/arm64/include/asm/kvm_host.h
> > > > > @@ -217,6 +217,10 @@ struct kvm_s2_mmu {
> > > > > */
> > > > > bool nested_stage2_enabled;
> > > > >
> > > > > + /* canonical IPA to nested IPA range lookup */
> > > > > + struct maple_tree nested_revmap_mt;
> > > > > + bool nested_revmap_broken;
> > > > > +
> > > >
> > > > Consider moving this boolean next to the other ones so that you don't
> > > > create too many holes in the kvm_s2_mmu structure (use pahole to find out).
> > > >
> > > > But I have some misgivings about the way things are structured
> > > > here. Only NV needs a revmap, yet this is present irrelevant of the
> > > > nature of the VM and bloats the data structure a bit.
> > > >
> > > > My naive approach would have been to only keep a pointer to the
> > > > revmap, and make that pointer NULL when the tree is "broken", and
> > > > freed under RCU if the context isn't the correct one.
> > >
> > > Can you explain what you mean by "if the context isn't the correct one"?
> > > If this refers to when selecting a specific kvm_s2_mmu instance for
> > > another context, then IIUC refcnt would already be 0 and there would be
> > > no other user of the tree.
> >
> > Sorry, "context" is an overloaded word. I meant a situation in which
> > you couldn't immediately free the maple-tree because you're holding
> > locks and freeing (hypothetically) requires a sleeping "context". in
> > this case, freeing under RCU, purely as a deferring mechanism, might
> > be useful.
>
> I experimented using RCU to free the tree as a deferring mechanism.
> Here are a few observations:
>
> - At reverse map record time, if maple tree store fails, we have to
> change the maple tree pointer to a NULL, which is an RCU write
> operation. Therefore we need to either take another lock, or use a
> xchg(ptr, NULL) to avoid the lock.
>
> - Because we're holding the read-side mmu_lock, we shouldn't block
> during reverse map record. Therefore we should use call_rcu()
> instead of synchronize_rcu() to free the "broken" tree. This implies
> a pointer to a maple tree in kvm_s2_mmu will not suffice, an
> additional structure with both the maple tree and an rcu_head have
> to be created.
>
> IMO looking at RCU calls mixed with mtree_{, un}lock(), and having a new
> wrapper struct to make this dynamic allocation scheme work is not very
> attractive to me.
>
> Instead, what do you think if I aggregate all strictly NV-related
> fields in kvm_s2_mmu i.e. tlb_vttbr, tlb_vtcr, nested_stage2_enabled,
> shadow_pt_debugfs_dentry, pending_unmap, into a struct maybe called
> kvm_s2_mmu_nested, add a maple tree in it, and have a pointer to this
> struct in kvm_s2_mmu? kvm_s2_mmu_nested can then be allocated only if we
> init a nested s2 mmu.
>
> Do you think this can work and is better than the current approaches?
After a discussion with Marc, we think making the maple tree a pointer
adds too much complexity, for the reasons given above. The
kvm_s2_mmu_nested idea creates churn, and the same RCU problem persists
if we want to dynamically allocate and free the structure. On the other
hand, if we allocate kvm_s2_mmu_nested at init time and reuse the
struct, then it's not much different from just placing the maple tree in
kvm_s2_mmu; the only benefit would be keeping kvm_s2_mmu smaller for
non-nested MMUs.

I'll stick with just adding the maple tree instance in kvm_s2_mmu for
the next version.
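
To make the trade-off concrete, here is a rough userspace sketch of the
two layouts, not kernel code. Everything in it is a stand-in invented
for illustration: struct revmap replaces struct maple_tree, struct
deferred and mock_call_rcu() mimic rcu_head and call_rcu(), and the
mmu_ptr/mmu_embedded types and helpers are hypothetical names. It also
assumes that a "broken" revmap simply means callers fall back to the
full unmap.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for struct maple_tree; its contents do not matter here. */
struct revmap { int nr_entries; };

/* Stand-in for struct rcu_head: a node on a deferred-callback list. */
struct deferred {
	struct deferred *next;
	void (*fn)(struct deferred *);
};

static struct deferred *deferred_list;
static int nr_freed;

/* Mock of call_rcu(): queue the callback, never block the caller. */
static void mock_call_rcu(struct deferred *d, void (*fn)(struct deferred *))
{
	assert(fn != NULL);
	d->fn = fn;
	d->next = deferred_list;
	deferred_list = d;
}

/* Mock of "the grace period has elapsed": run every queued callback. */
static void mock_run_deferred(void)
{
	while (deferred_list) {
		struct deferred *d = deferred_list;

		deferred_list = d->next;
		d->fn(d);
	}
}

/* Pointer scheme: the tree needs a wrapper that carries the deferral head. */
struct revmap_wrapper {
	struct revmap tree;
	struct deferred rcu;
};

struct mmu_ptr {
	_Atomic(struct revmap_wrapper *) revmap;
};

static void wrapper_free(struct deferred *d)
{
	/* Hand-rolled container_of(): recover the wrapper from its member. */
	struct revmap_wrapper *w = (struct revmap_wrapper *)
		((char *)d - offsetof(struct revmap_wrapper, rcu));

	free(w);
	nr_freed++;
}

/*
 * On a failed store: detach the tree with an atomic exchange so that
 * exactly one caller queues the free. Returns true if this caller won.
 */
static bool mmu_ptr_break(struct mmu_ptr *mmu)
{
	struct revmap_wrapper *w =
		atomic_exchange(&mmu->revmap, (struct revmap_wrapper *)NULL);

	if (!w)
		return false;
	mock_call_rcu(&w->rcu, wrapper_free);
	return true;
}

/* Embedded scheme: the tree lives in the MMU, a flag marks it unusable. */
struct mmu_embedded {
	struct revmap revmap;
	bool revmap_broken;
};

static void mmu_embedded_break(struct mmu_embedded *mmu)
{
	mmu->revmap_broken = true;
}

/* Assumption: a broken revmap means "fall back to the full unmap". */
static bool mmu_embedded_needs_full_unmap(const struct mmu_embedded *mmu)
{
	return mmu->revmap_broken;
}
```

The pointer scheme needs the wrapper, the exchange and the deferred
callback just to get rid of one tree, while the embedded scheme is a
single flag write, at the cost of a larger struct for non-nested MMUs.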
Thanks,
Wei-Lin Chang
[...]