[RFC PATCH 00/17] KVM: arm64: Parallelize stage 2 fault handling

Thu Apr 21 09:43:12 PDT 2022

On Fri, Apr 15, 2022 at 5:04 PM Oliver Upton <oupton at google.com> wrote:
>
> On Fri, Apr 15, 2022 at 04:35:24PM -0700, David Matlack wrote:
> > On Fri, Apr 15, 2022 at 2:59 PM Oliver Upton <oupton at google.com> wrote:
> > >
> > > Presently KVM only takes a read lock for stage 2 faults if it believes
> > > the fault can be fixed by relaxing permissions on a PTE (write unprotect
> > > for dirty logging). Otherwise, stage 2 faults grab the write lock, which
> > > predictably can pile up all the vCPUs in a sufficiently large VM.
> > >
> > > The x86 port of KVM has what it calls the TDP MMU. Basically, it is an
> > > MMU protected by the combination of a read-write lock and RCU, allowing
> > > page walkers to traverse in parallel.
> > >
> > > This series is strongly inspired by the mechanics of the TDP MMU,
> > > making use of RCU to protect parallel walks. Note that the TLB
> > > invalidation mechanics are a bit different between x86 and ARM, so we
> > > need to use the 'break-before-make' sequence to split/collapse a
> > > block/table mapping, respectively.
> >
> > An alternative (or perhaps "v2" [1]) is to make x86's TDP MMU
> > arch-neutral and port it to support ARM's stage-2 MMU. This is based
> > on a few observations:
> >
> > - The problems that motivated the development of the TDP MMU are not
> > x86-specific (e.g. parallelizing faults during the post-copy phase of
> > Live Migration).
> > - The synchronization in the TDP MMU (read/write lock, RCU for PT
> > freeing, atomic compare-exchanges for modifying PTEs) is complex, but
> > would be equivalent across architectures.
> > - Eventually RISC-V is going to want similar performance (my
> > understanding is RISC-V MMU is already a copy-paste of the ARM MMU),
> > and it'd be a shame to re-implement TDP MMU synchronization a third
> > time.
> > - The TDP MMU includes support for various performance features that
> > would benefit other architectures, such as eager page splitting,
> > deferred zapping, lockless write-protection resolution, and (coming
> > soon) in-place huge page promotion.
> > - And then there's the obvious wins from less code duplication in KVM
> > (e.g. get rid of the RISC-V MMU copy, increased code test coverage,
> > ...).
>
> I definitely agree with the observation -- we're all trying to solve the
> same set of issues. And I completely agree that a good long term goal
> would be to create some common parts for all architectures. Less work
> for us ARM folks it would seem ;-)
>
> What's top of mind is how we paper over the architectural differences
> between all of the architectures, especially when we need to do entirely
> different things because of the arch.
>
> For example, I whine about break-before-make a lot throughout this
> series which is somewhat unique to ARM. I don't think we can do eager
> page splitting on the base architecture w/o doing the TLBI for every
> block. Not only that, we can't do a direct valid->valid change without
> first making an invalid PTE visible to hardware. Things get even more
> exciting when hardware revisions relax break-before-make requirements.

Gotcha, so porting the TDP MMU to ARM would require adding
break-before-make support. That seems feasible and we could guard it
behind e.g. a static_key so there is no runtime overhead for
architectures (or ARM hardware revisions) that do not require it.
Anything else come to mind as major architectural differences?
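
To make the static_key idea a bit more concrete, here is a rough
user-space sketch of what I have in mind. It is not taken from the TDP MMU
or from this series: the PTE encoding, PTE_LOCKED, needs_bbm,
tlb_invalidate() and pte_update() are all hypothetical names, a plain bool
stands in for the static key, and a counter stands in for the TLBI.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical PTE encoding, for illustration only. */
#define PTE_VALID  UINT64_C(0x1)  /* bit 0: hardware walks this entry */
#define PTE_LOCKED UINT64_C(0x2)  /* invalid to hardware, owned by one walker */

/* Stand-in for a static key: decided once at init, read on the hot path. */
static bool needs_bbm = true;

/* Stand-in for a real TLB invalidation; only counts the calls. */
static unsigned int tlbi_count;

static void tlb_invalidate(void)
{
    tlbi_count++;
}

/*
 * Try to change *ptep from old_pte to new_pte.
 * Returns false if another walker modified the PTE first.
 */
static bool pte_update(_Atomic uint64_t *ptep, uint64_t old_pte,
                       uint64_t new_pte)
{
    uint64_t expected = old_pte;

    if (needs_bbm && (old_pte & PTE_VALID) && (new_pte & PTE_VALID)) {
        /* Break: publish an invalid (locked) entry before the new one. */
        if (!atomic_compare_exchange_strong(ptep, &expected, PTE_LOCKED))
            return false;

        tlb_invalidate();

        /* Make: we own the locked entry, so a plain store is enough. */
        atomic_store(ptep, new_pte);
        return true;
    }

    /* No break-before-make required: one compare-exchange suffices. */
    return atomic_compare_exchange_strong(ptep, &expected, new_pte);
}
```

The point is that common code only ever calls something like
pte_update(), and the valid->valid path pays for the extra invalid step
and TLBI only when the architecture (or hardware revision) needs it.
Concurrent walkers that race with the update see either the locked entry
or a changed value, fail their own compare-exchange, and retry.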

>
> There are also significant architectural differences between KVM on x86
> and KVM for ARM. Our paging code runs both in the host kernel and the
> hyp/lowvisor, and does:
>
>  - VM two dimensional paging (stage 2 MMU)
>  - Hyp's own MMU (stage 1 MMU)
>  - Host kernel isolation (stage 2 MMU)
>
> each with its own quirks. The 'not exactly in the kernel' part will make
> instrumentation a bit of a hassle too.

Ah, interesting. It'd probably make sense to start with the VM
2-dimensional paging use-case and leave the other use-cases using the
existing MMU, and then investigate transitioning the other use-cases.
Similarly in x86 we still have the legacy MMU for shadow paging (e.g.
hosts with no stage-2 hardware, and nested virtualization).

>
> None of this is meant to disagree with you in the slightest. I firmly
> agree we need to share as many parts between the architectures as
> possible. I'm just trying to call out a few of the things relating to
> ARM that will make this annoying so that way whoever embarks on the
> adventure will see it.
>
> > The side of this I haven't really looked into yet is ARM's stage-2
> > MMU, and how amenable it would be to being managed by the TDP MMU. But
> > I assume it's a conventional page table structure mapping GPAs to
> > HPAs, which is the most important overlap.
> >
> > That all being said, an arch-neutral TDP MMU would be a larger, more
> > complex code change than something like this series (hence my "v2"
> > caveat above). But I wanted to get this idea out there since the
> > rubber is starting to hit the road on improving ARM MMU scalability.
>
> All for it. I cc'ed you on the series for this exact reason, I wanted to
> grab your attention to spark the conversation :)
>
> --
> Thanks,
> Oliver