[RFC PATCH 09/17] KVM: arm64: Tear down unlinked page tables in parallel walk

Thu Apr 21 09:40:56 PDT 2022

On Thu, Apr 21, 2022 at 01:21:54PM +0000, Quentin Perret wrote:
> Hi Oliver,
> 
> On Friday 15 Apr 2022 at 21:58:53 (+0000), Oliver Upton wrote:
> > Breaking a table pte is insufficient to guarantee ownership of an
> > unlinked subtree. Parallel software walkers could be traversing
> > substructures and changing their mappings.
> > 
> > Recurse through the unlinked subtree and lock all descendent ptes
> > to take ownership of the subtree. Since the ptes are actually being
> > evicted, return table ptes back to the table walker to ensure child
> > tables are also traversed. Note that this is done both in both the
> > pre-order and leaf visitors as the underlying pte remains volatile until
> > it is unlinked.
> 
> Still trying to get the full picture of the series so bear with me. IIUC
> the case you're dealing with here is when we're coallescing a table into
> a block with concurrent walkers making changes in the sub-tree. I
> believe this should happen when turning dirty logging off?

Yup, I think that's the only time we wind up collapsing tables.

> Why do we need to recursively lock the entire sub-tree at all in this
> case? As long as the table is turned into a locked invalid PTE, what
> concurrent walkers are doing in the sub-tree should be irrelevant no?
> None of the changes they do will be made visible to the hardware anyway.
> So as long as the sub-tree isn't freed under their feet (which should be
> the point of the RCU protection) this should be all fine? Is there a
> case where this is not actually true?

The problem arises when you're trying to actually free an unlinked
subtree. All bets are off until the next RCU grace period. What would
stop another software walker from installing a table to a PTE that I've
already visited? I think we'd wind up leaking a table page in this case
as the walker doing the table collapse assumes it has successfully freed
everything underneath.

The other option would be to not touch the subtree at all until the rcu
callback, as at that point software will not tweak the tables any more.
No need for atomics/spinning and can just do a boring traversal. Of
course, I lazily avoided this option because it would be a bit more code
but isn't too awfully complicated.

Does this paint a better picture, or have I only managed to confuse even
more? :)

--
Thanks,
Oliver