[PATCH v2 2/4] KVM: arm64: vgic: Explicitly implement vgic_dist::ready ordering

Yao Yuan yaoyuan at linux.alibaba.com
Mon Jul 21 19:01:27 PDT 2025


On Mon, Jul 21, 2025 at 07:49:10AM +0800, Keir Fraser wrote:
> On Sun, Jul 20, 2025 at 08:08:30AM +0800, Yao Yuan wrote:
> > On Sat, Jul 19, 2025 at 07:58:19AM +0000, Keir Fraser wrote:
> > > On Sat, Jul 19, 2025 at 10:15:56AM +0800, Yao Yuan wrote:
> > > > On Fri, Jul 18, 2025 at 08:00:17AM -0700, Sean Christopherson wrote:
> > > > > On Thu, Jul 17, 2025, Yao Yuan wrote:
> > > > > > On Wed, Jul 16, 2025 at 11:07:35AM +0800, Keir Fraser wrote:
> > > > > > > In preparation to remove synchronize_srcu() from MMIO registration,
> > > > > > > remove the distributor's dependency on this implicit barrier by
> > > > > > > direct acquire-release synchronization on the flag write and its
> > > > > > > lock-free check.
> > > > > > >
> > > > > > > Signed-off-by: Keir Fraser <keirf at google.com>
> > > > > > > ---
> > > > > > >  arch/arm64/kvm/vgic/vgic-init.c | 11 ++---------
> > > > > > >  1 file changed, 2 insertions(+), 9 deletions(-)
> > > > > > >
> > > > > > > diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c
> > > > > > > index 502b65049703..bc83672e461b 100644
> > > > > > > --- a/arch/arm64/kvm/vgic/vgic-init.c
> > > > > > > +++ b/arch/arm64/kvm/vgic/vgic-init.c
> > > > > > > @@ -567,7 +567,7 @@ int kvm_vgic_map_resources(struct kvm *kvm)
> > > > > > >  	gpa_t dist_base;
> > > > > > >  	int ret = 0;
> > > > > > >
> > > > > > > -	if (likely(dist->ready))
> > > > > > > +	if (likely(smp_load_acquire(&dist->ready)))
> > > > > > >  		return 0;
> > > > > > >
> > > > > > >  	mutex_lock(&kvm->slots_lock);
> > > > > > > @@ -598,14 +598,7 @@ int kvm_vgic_map_resources(struct kvm *kvm)
> > > > > > >  		goto out_slots;
> > > > > > >  	}
> > > > > > >
> > > > > > > -	/*
> > > > > > > -	 * kvm_io_bus_register_dev() guarantees all readers see the new MMIO
> > > > > > > -	 * registration before returning through synchronize_srcu(), which also
> > > > > > > -	 * implies a full memory barrier. As such, marking the distributor as
> > > > > > > -	 * 'ready' here is guaranteed to be ordered after all vCPUs having seen
> > > > > > > -	 * a completely configured distributor.
> > > > > > > -	 */
> > > > > > > -	dist->ready = true;
> > > > > > > +	smp_store_release(&dist->ready, true);
> > > > > >
> > > > > > No need the store-release and load-acquire for replacing
> > > > > > synchronize_srcu_expedited() w/ call_srcu() IIUC:
> > > > >
> > > > > This isn't about using call_srcu(), because it's not actually about kvm->buses.
> > > > > This code is concerned with ensuring that all stores to kvm->arch.vgic are ordered
> > > > > before the store to set kvm->arch.vgic.ready, so that vCPUs never see "ready==true"
> > > > > with a half-baked distributor.
> > > > >
> > > > > In the current code, kvm_vgic_map_resources() relies on the synchronize_srcu() in
> > > > > kvm_io_bus_register_dev() to provide the ordering guarantees.  Switching to
> > > > > smp_store_release() + smp_load_acquire() removes the dependency on the
> > > > > synchronize_srcu() so that the synchronize_srcu() call can be safely removed.
> > > >
> > > > Yes, I understand this and agree with your point.
> > > >
> > > > Just for discusstion: I thought it should also work even w/o
> > > > introduce the load acqure + store release after switch to
> > > > call_srcu(): The smp_mb() in call_srcu() order the all store
> > > > to kvm->arch.vgic before store kvm->arch.vgic.ready in
> > > > current implementation.
> > >
> > > The load-acquire would still be required, to ensure that accesses to
> > > kvm->arch.vgic do not get reordered earlier than the lock-free check
> > > of kvm->arch.vgic.ready. Otherwise that CPU could see that the vgic is
> > > initialised, but then use speculated reads of uninitialised vgic state.
> > >
> >
> > Thanks for your explanation.
> >
> > I see. But there's "mutex_lock(&kvm->slot_lock);" before later
> > acccessing to the kvm->arch.vgic, so I think the order can be
> > guaranteed. Of cause as you said a explicitly acquire-load +
> > store-release is better than before implicitly implementation.
>
> If vgic_dist::ready is observed true by the lock-free read (the one
> which is turned into load-acquire by this patch) then the function
> immediately returns with no mutex_lock() executed. It is reads of
> vgic_dist *after* return from kvm_vgic_map_resources() that you have
> to worry about, and which require load-acquire semantics.

I think this is the main purpose of such lock-free reading
here, to avoid lock contention on VM w/ large vCPUs for
vcpus' first time run together.

store-release makes sure the changes to vgic_dist::ready
become visible after the changes to vgic_dist become
visible, but it doesn't guarantee the vgic_dist::ready
becomes visible to reader on aother CPU **IMMEDIATELY**,
thus load-acquire in reader side is request for this.
Is above understanding correct ?

>
> >
> > > > >



More information about the linux-arm-kernel mailing list