[RFC PATCH 1/2] KVM: arm64: Introduce S2 walker SKIP return options

Oliver Upton oupton at kernel.org
Tue May 19 14:21:41 PDT 2026


On Tue, May 19, 2026 at 03:35:19PM +0100, Leonardo Bras wrote:
> On Tue, May 19, 2026 at 02:15:41PM +0100, Will Deacon wrote:
> > On Tue, May 19, 2026 at 01:56:48PM +0100, Leonardo Bras wrote:
> > > On Tue, May 19, 2026 at 01:43:37PM +0100, Will Deacon wrote:
> > > > > > I was wondering along similar lines, but maybe it would be useful just
> > > > > > to pass a maximum level to the walker logic? That feels like the most
> > > > > > general case without complicating the existing logic.

FWIW, I had considered this too but decided that it requires a bit more
churn since we cannot rely on zero initialization in the existing
callsites (level 0 is a valid level).

But that's extremely minor.

> > > > > This proposal seems simpler for me to understand, and indeed looks like a 
> > > > > better solution than what I have proposed, taking care of  the 
> > > > > 'already split' case with better performance, as it don't even walk a 
> > > > > single level-3 entry. 
> > > > > 
> > > > > On the 'splitting' case, it also works flawlessly if the memory is given in 
> > > > > level-2 blocks. There is only one case that I would like to address here:
> > > > > 
> > > > > - Memory given in level-1 blocks (say 1GB)
> > > > > - Walker flag says 'walk down to level-2 only'
> > > > > - Split Walker on level-1 will break page down to (up to) level-3 entries.
> > > > > - Walker will continue to be called on level-2 entries, even though it's 
> > > > >   not necessary.
> > > > 
> > > > If you're only visiting leaves, why would it be called on the level-2
> > > > table entries?
> > > > 
> > > 
> > > Because once the leaf is turned into a table by the splitting walker, it 
> > > gets reloaded and walked. This is an excerpt of __kvm_pgtable_visit():
> > 
> > Sorry, I was musing about the semantics after adding something to limit
> > the maximum level. I don't dispute what the current code would do.
> > 
> > > Example:
> > > - Split this level-1 leave:
> > >   - Walker creates the whole structure up to given level (currently 3)
> > >   - Walker returns, gets reloaded, table detected, go down on that one
> > >   - Level 2 entries walked (which is unnecessary)
> > > 
> > > Please let me know if I am misunderstanding something.
> > 
> > I just don't grok why this would happen if we limited the maximum level
> > to '2' _and_ said we only wanted to visit the leaf entries. In that
> > case, I wouldn't expect to descend into any of the L2 table entries
> > (because that would imply going beyond level 2) and I wouldn't expect to
> > be called for the table entries either (because we're only interested in
> > leaves).
> 
> Agree, if we specify to skip level-3 entries, it would only walk up to 
> level-2 entries, but take above example in detail:
> - Split these level-1 leaves, up to level-3 leaves (regular)
>   - INFO: kvm_pgtable_walk will call walker:
>     - only up to level-2 entries (skip level-3)
>     - only on leaf entries
>   - Walk first level-1 leaf, calls walker
>     - walker will split the level-1 leaf in level-3 leaves
>     - walker return from that first level-1 leaf
>   - level-1 leaf is reloaded as a table
>     - level-2 entries of that table are also walked (unnecessary)
>     - on each of the level-2 table entries, level-3 entries are skipped
> 
> To avoid the unecessary walk of the level-2 entries above, we would need to 
> specify 'skip level-2' that could be an issue if we have a mix of level-1 
> and level-2 leaves, as the level-2 leaves in that case would not be split.
> 
> That's why I suggest something like "skip recently created table" as a flag 
> as well, so we can guarantee no newly created table gets walked 
> unecessarily.
> 
> Please help me if I am missing something important.

I'm not sure the added complexity of handling this case perfectly results
in a measurable performance improvement. Just avoiding the level 3
tables would be an exponential reduction (~ 512-8192x) in the number of
walk steps.

Thanks,
Oliver



More information about the linux-arm-kernel mailing list