[PATCH] KVM: arm64: Don't eagerly teardown the vgic on init error

Wed Oct 9 16:27:46 PDT 2024

On Wed, Oct 09, 2024 at 12:36:32PM -0700, Sean Christopherson wrote:
> On Wed, Oct 09, 2024, Oliver Upton wrote:
> > On Wed, Oct 09, 2024 at 07:36:03PM +0100, Marc Zyngier wrote:
> > > As there is very little ordering in the KVM API, userspace can
> > > instanciate a half-baked GIC (missing its memory map, for example)
> > > at almost any time.
> > > 
> > > This means that, with the right timing, a thread running vcpu-0
> > > can enter the kernel without a GIC configured and get a GIC created
> > > behind its back by another thread. Amusingly, it will pick up
> > > that GIC and start messing with the data structures without the
> > > GIC having been fully initialised.
> > 
> > Huh, I'm definitely missing something. Could you remind me where we open
> > up this race between KVM_RUN && kvm_vgic_create()?

Ah, duh, I see it now. kvm_arch_vcpu_run_pid_change() doesn't serialize
on a VM lock, and kvm_vgic_map_resources() has an early return for
vgic_ready() letting it blow straight past the config_lock.

Then if we can't register the MMIO region for the distributor
everything comes crashing down and a vCPU has made it into the KVM_RUN
loop w/ the VGIC-shaped rug pulled out from under it. There's definitely
another functional bug here where a vCPU's attempts to poke the
distributor wind up reaching userspace as MMIO exits. But we can worry
about that another day.

If memory serves, kvm_vgic_map_resources() used to do all of this behind
the config_lock to cure the race, but that wound up inverting lock
ordering on srcu.

Note to self: Impose strict ordering on GIC initialization v. vCPU
creation if/when we get a new flavor of irqchip.

> > I'd thought the fact that the latter takes all the vCPU mutexes and
> > checks if any vCPU in the VM has run would be enough to guard against
> > such a race, but clearly not...
> 
> Any chance that fixing bugs where vCPU0 can be accessed (and run!) before its
> fully online help?

That's an equally gross bug, but kvm_vgic_create() should still be safe
w.r.t. vCPU creation since both hold the kvm->lock in the right spot.
That is, since kvm_vgic_create() is called under the lock any vCPUs
visible to userspace should exist in the vCPU xarray.

The crappy assumption here is kvm_arch_vcpu_run_pid_change() and its
callees are allowed to destroy VM-scoped structures in error handling.

> E.g. if that closes the vCPU0 hole, maybe the vCPU1 case can
> be handled a bit more gracefully?

I think this is about as graceful as we can be. The sorts of screw-ups
that precipitate this error handling may involve stupidity across
several KVM ioctls, meaning it is highly unlikely to be attributable /
recoverable.

-- 
Thanks,
Oliver