[PATCH] KVM: arm64: Don't eagerly teardown the vgic on init error
Marc Zyngier
maz at kernel.org
Thu Oct 10 05:47:05 PDT 2024
On Thu, 10 Oct 2024 09:47:04 +0100,
Oliver Upton <oliver.upton at linux.dev> wrote:
>
> On Thu, Oct 10, 2024 at 08:54:43AM +0100, Marc Zyngier wrote:
> > On Thu, 10 Oct 2024 00:27:46 +0100, Oliver Upton <oliver.upton at linux.dev> wrote:
> > > Then if we can't register the MMIO region for the distributor
> > > everything comes crashing down and a vCPU has made it into the KVM_RUN
> > > loop w/ the VGIC-shaped rug pulled out from under it. There's definitely
> > > another functional bug here where a vCPU's attempts to poke the
> > > distributor wind up reaching userspace as MMIO exits. But we can worry
> > > about that another day.
> >
> > I don't think that one is that bad. Userspace got us here, and it
> > now sees an MMIO exit for something that it is not prepared to handle.
> > Suck it up and die (on a black size M t-shirt, please).
>
> LOL, I'll remember that.
>
> The situation I have in mind is a bit harder to blame on userspace,
> though. Supposing that the whole VM was set up correctly, multiple vCPUs
> entering KVM_RUN concurrently could cause this race and have 'unexpected'
> MMIO exits go out to userspace.
>
>       vcpu-0                          vcpu-1
>       ======                          ======
>   kvm_vgic_map_resources()
>     dist->ready = true
>     mutex_unlock(config_lock)
>                                       kvm_vgic_map_resources()
>                                         if (vgic_ready())
>                                           return 0
>
>                                       < enter guest >
>                                       typer = writel(0, GICD_CTLR)
>
>                                       < data abort >
>                                       kvm_io_bus_write(...)   <= No GICD, out to userspace
>
>     vgic_register_dist_iodev()
>
> A small but stupid window to race with.
Ah, gotcha. I guess getting rid of the early-out in
kvm_vgic_map_resources() would plug that one. Want to post a fix for
that?
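
For illustration only, the window above can be replayed with a toy
userspace model. This is a sketch under stated assumptions, not the kernel
code: every toy_* name is hypothetical, and the "outer lock" held across
both steps is an assumption about what a second caller would block on once
the lockless early-out is gone.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these toy_* names exist in the kernel. */
struct toy_vgic {
	bool ready;		/* stands in for dist->ready */
	bool dist_registered;	/* distributor MMIO device is on the bus */
	bool outer_locked;	/* ASSUMPTION: a lock held across both steps */
};

enum toy_result { TOY_ENTER_GUEST, TOY_WOULD_BLOCK };

/* vcpu-0, first half: the flag becomes visible before the device exists. */
static void toy_map_begin(struct toy_vgic *v)
{
	v->outer_locked = true;
	v->ready = true;
}

/* vcpu-0, second half: register the distributor, then drop the outer lock. */
static void toy_map_finish(struct toy_vgic *v)
{
	assert(v->ready && v->outer_locked);
	v->dist_registered = true;
	v->outer_locked = false;
}

/* vcpu-1 arriving while vcpu-0 sits between the two halves. */
static enum toy_result toy_map_second(const struct toy_vgic *v, bool early_out)
{
	if (early_out && v->ready)
		return TOY_ENTER_GUEST;	/* lockless fast path */
	if (v->outer_locked)
		return TOY_WOULD_BLOCK;	/* would sleep until vcpu-0 is done */
	return TOY_ENTER_GUEST;
}

/* Replay the interleaving; true if vcpu-1 can run with no GICD on the bus. */
static bool toy_race_leaks_mmio(bool early_out)
{
	struct toy_vgic v = { 0 };
	bool leaked;

	toy_map_begin(&v);
	leaked = toy_map_second(&v, early_out) == TOY_ENTER_GUEST &&
		 !v.dist_registered;
	toy_map_finish(&v);
	return leaked;
}
```

In this model, toy_race_leaks_mmio(true) reports the leak and
toy_race_leaks_mmio(false) does not, because the second caller waits for
the first one to finish instead of trusting the flag.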
>
> > > If memory serves, kvm_vgic_map_resources() used to do all of this behind
> > > the config_lock to cure the race, but that wound up inverting lock
> > > ordering on srcu.
> >
> > Probably something like that. We also used to hold the kvm lock, which
> > made everything much simpler, but awfully wrong.
> >
> > > Note to self: Impose strict ordering on GIC initialization v. vCPU
> > > creation if/when we get a new flavor of irqchip.
> >
> > One of the things we should have done when introducing GICv3 is to
> > impose that at KVM_DEV_ARM_VGIC_CTRL_INIT, the GIC memory map is
> > final. I remember some push-back on the QEMU side of things, as they
> > like to decouple things, but this has proved to be a nightmare.
>
> Pushing more of the initialization complexity into userspace feels like
> the right thing. Since we clearly have no idea what we're doing :)
KVM APIv2?
>
> > > The crappy assumption here is kvm_arch_vcpu_run_pid_change() and its
> > > callees are allowed to destroy VM-scoped structures in error handling.
> >
> > I think this is symptomatic of a more general issue: we perform VM-wide
> > configuration in the context of a vcpu. We have tons of this stuff to
> > paper over the lack of a "this VM is fully configured" barrier.
> >
> > I wonder whether we could sidestep things by punting the finalisation
> > of the VM to a different context (workqueue?) and simply return
> > -EAGAIN or -EINTR to userspace while we're processing it. That doesn't
> > solve the "I'm missing parts of the address map and I'm going to die"
> > part though.
>
> Throwing it back at userspace would be nice, but unfortunately for ABI I
> think we need to block/spin vCPUs in the kernel til the VM is in fully
> working condition. A fragile userspace could explode for a 'spurious'
> EAGAIN/EINTR where there wasn't one before.
EINTR needs to be handled already, as this is how you report
preemption by a signal. But yeah, overall, I'm not enthralled with
much so far...
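
To make the EINTR point concrete, here is a minimal sketch of the kind of
run loop a VMM might have. It is not taken from any real VMM: toy_run_fn is
a hypothetical stand-in for ioctl(vcpu_fd, KVM_RUN, 0), and the scripted
helper exists only so the sketch can be exercised without /dev/kvm. The
loop retries on EINTR and treats everything else as fatal, which is exactly
the sort of userspace that would fall over on a new EAGAIN.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for ioctl(vcpu_fd, KVM_RUN, 0): 0 or -1 + errno. */
typedef int (*toy_run_fn)(void *ctx);

/* Re-enter on EINTR (signal); report any other error to the caller. */
static int toy_run_vcpu(toy_run_fn run, void *ctx)
{
	assert(run != NULL);

	for (;;) {
		if (run(ctx) == 0)
			return 0;
		if (errno == EINTR)
			continue;	/* already part of the contract */
		return -errno;		/* EAGAIN lands here: "fragile" */
	}
}

/* Test helper: replays a list of errno values, 0 meaning success. */
struct toy_script {
	const int *errs;
	size_t len;
	size_t pos;
};

static int toy_scripted_run(void *ctx)
{
	struct toy_script *s = ctx;
	int err = (s->pos < s->len) ? s->errs[s->pos++] : 0;

	if (err) {
		errno = err;
		return -1;
	}
	return 0;
}
```

A real loop would also look at whatever the signal asked for before going
back in; that part is left out here.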
M.
--
Without deviation from the norm, progress is not possible.