[PATCH] KVM: arm64: Don't eagerly teardown the vgic on init error

Wed Oct 9 16:30:45 PDT 2024

On Wed, Oct 09, 2024 at 11:27:52PM +0000, Oliver Upton wrote:
> On Wed, Oct 09, 2024 at 12:36:32PM -0700, Sean Christopherson wrote:
> > On Wed, Oct 09, 2024, Oliver Upton wrote:
> > > On Wed, Oct 09, 2024 at 07:36:03PM +0100, Marc Zyngier wrote:
> > > > As there is very little ordering in the KVM API, userspace can
> > > > instanciate a half-baked GIC (missing its memory map, for example)
> > > > at almost any time.
> > > > 
> > > > This means that, with the right timing, a thread running vcpu-0
> > > > can enter the kernel without a GIC configured and get a GIC created
> > > > behind its back by another thread. Amusingly, it will pick up
> > > > that GIC and start messing with the data structures without the
> > > > GIC having been fully initialised.
> > > 
> > > Huh, I'm definitely missing something. Could you remind me where we open
> > > up this race between KVM_RUN && kvm_vgic_create()?
> 
> Ah, duh, I see it now. kvm_arch_vcpu_run_pid_change() doesn't serialize
> on a VM lock, and kvm_vgic_map_resources() has an early return for
> vgic_ready() letting it blow straight past the config_lock.
> 
> Then if we can't register the MMIO region for the distributor
> everything comes crashing down and a vCPU has made it into the KVM_RUN
> loop w/ the VGIC-shaped rug pulled out from under it. There's definitely
> another functional bug here where a vCPU's attempts to poke the

a theoretical bug, that is. In practice the window to race against
likely isn't big enough to get the in-guest vCPU to the point of poking
the halfway-initialized distributor.

> distributor wind up reaching userspace as MMIO exits. But we can worry
> about that another day.
> 
> If memory serves, kvm_vgic_map_resources() used to do all of this behind
> the config_lock to cure the race, but that wound up inverting lock
> ordering on srcu.
> 
> Note to self: Impose strict ordering on GIC initialization v. vCPU
> creation if/when we get a new flavor of irqchip.
> 
> > > I'd thought the fact that the latter takes all the vCPU mutexes and
> > > checks if any vCPU in the VM has run would be enough to guard against
> > > such a race, but clearly not...
> > 
> > Any chance that fixing bugs where vCPU0 can be accessed (and run!) before its
> > fully online help?
> 
> That's an equally gross bug, but kvm_vgic_create() should still be safe
> w.r.t. vCPU creation since both hold the kvm->lock in the right spot.
> That is, since kvm_vgic_create() is called under the lock any vCPUs
> visible to userspace should exist in the vCPU xarray.
> 
> The crappy assumption here is kvm_arch_vcpu_run_pid_change() and its
> callees are allowed to destroy VM-scoped structures in error handling.
> 
> > E.g. if that closes the vCPU0 hole, maybe the vCPU1 case can
> > be handled a bit more gracefully?
> 
> I think this is about as graceful as we can be. The sorts of screw-ups
> that precipitate this error handling may involve stupidity across
> several KVM ioctls, meaning it is highly unlikely to be attributable /
> recoverable.
> 
> -- 
> Thanks,
> Oliver

-- 
Thanks,
Oliver