[PATCH v4 4/4] KVM: Avoid synchronize_srcu() in kvm_io_bus_register_dev()

Sean Christopherson seanjc at google.com
Fri Feb 13 15:20:48 PST 2026


On Fri, Feb 13, 2026, Nikita Kalyazin wrote:
> 
> 
> On 09/09/2025 11:00, Keir Fraser wrote:
> > Device MMIO registration may happen quite frequently during VM boot,
> > and the SRCU synchronization each time has a measurable effect
> > on VM startup time. In our experiments it can account for around 25%
> > of a VM's startup time.
> > 
> > Replace the synchronization with a deferred free of the old kvm_io_bus
> > structure.
> 
> 
> Hi,
> 
> We noticed that this change introduced a regression of ~20 ms to the first
> KVM_CREATE_VCPU call of a VM, which is significant for our use case.
> 
> Before the patch:
> 45726 14:45:32.914330 ioctl(25, KVM_CREATE_VCPU, 0) = 28 <0.000137>
> 45726 14:45:32.914533 ioctl(25, KVM_CREATE_VCPU, 1) = 30 <0.000046>
> 
> After the patch:
> 30295 14:47:08.057412 ioctl(25, KVM_CREATE_VCPU, 0) = 28 <0.025182>
> 30295 14:47:08.082663 ioctl(25, KVM_CREATE_VCPU, 1) = 30 <0.000031>
> 
> The reason, as I understand it, is that the call_srcu() calls made from
> kvm_io_bus_register_dev() queue callbacks to be invoked after a normal
> GP, which is 10 ms with HZ=100.  The subsequent synchronize_srcu_expedited()
> called from kvm_swap_active_memslots() (from KVM_CREATE_VCPU) has to wait
> for the normal GP to complete before making progress.  I don't fully
> understand why the delay is consistently greater than 1 GP, but that's what
> we see across our testing scenarios.
> 
> I verified that the problem is mitigated if the GP is shortened by configuring
> HZ=1000.  In that case, the regression is on the order of 1 ms.
> 
> It looks like in our case we don't benefit much from the intended
> optimisation, as the number of device MMIO registrations is limited and
> they don't cost us much (each takes at most 16 us, but most commonly ~6 us):

Maybe differences in platforms for arm64 vs x86?

> I am not aware of a way to make it fast for both use cases and would be more
> than happy to hear about possible solutions.

What if we key off of vCPUs being created?  The motivation for Keir's change was
to avoid stalling during VM boot, i.e. *after* initial VM creation.

--
From: Sean Christopherson <seanjc at google.com>
Date: Fri, 13 Feb 2026 15:15:01 -0800
Subject: [PATCH] KVM: Synchronize SRCU on I/O device registration if vCPUs
 haven't been created

TODO: Write a changelog if this works.

Fixes: 7d9a0273c459 ("KVM: Avoid synchronize_srcu() in kvm_io_bus_register_dev()")
Reported-by: Nikita Kalyazin <kalyazin at amazon.com>
Closes: https://lkml.kernel.org/r/a84ddba8-12da-489a-9dd1-ccdf7451a1ba%40amazon.com
Cc: stable at vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc at google.com>
---
 virt/kvm/kvm_main.c | 25 ++++++++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 571cf0d6ec01..043b1c3574ab 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -6027,7 +6027,30 @@ int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr,
 	memcpy(new_bus->range + i + 1, bus->range + i,
 		(bus->dev_count - i) * sizeof(struct kvm_io_range));
 	rcu_assign_pointer(kvm->buses[bus_idx], new_bus);
-	call_srcu(&kvm->srcu, &bus->rcu, __free_bus);
+
+	/*
+	 * To optimize VM creation *and* boot time, use different tactics for
+	 * safely freeing the old bus based on where the VM is at in its
+	 * lifecycle.  If vCPUs haven't yet been created, simply synchronize
+	 * and free, as there are unlikely to be active SRCU readers; if not,
+	 * defer freeing the bus via SRCU callback.
+	 *
+	 * If there are active SRCU readers, synchronizing will stall until the
+	 * current grace period completes, which can meaningfully impact boot
+	 * time for VMs that trigger a large number of registrations.
+	 *
+	 * If there aren't SRCU readers, using an SRCU callback can be a net
+	 * negative due to starting a grace period of its own, which in turn
+	 * can unnecessarily cause a future synchronization to stall.  E.g. if
+	 * devices are registered before memslots are created, then creating
+	 * the first memslot will have to wait for a superfluous grace period.
+	 */
+	if (!READ_ONCE(kvm->created_vcpus)) {
+		synchronize_srcu_expedited(&kvm->srcu);
+		kfree(bus);
+	} else {
+		call_srcu(&kvm->srcu, &bus->rcu, __free_bus);
+	}
 
 	return 0;
 }

base-commit: 183bb0ce8c77b0fd1fb25874112bc8751a461e49
--


