[PATCH v4 4/4] KVM: Avoid synchronize_srcu() in kvm_io_bus_register_dev()
Nikita Kalyazin
kalyazin at amazon.com
Fri Feb 13 07:42:16 PST 2026
On 09/09/2025 11:00, Keir Fraser wrote:
> Device MMIO registration may happen quite frequently during VM boot,
> and the SRCU synchronization each time has a measurable effect
> on VM startup time. In our experiments it can account for around 25%
> of a VM's startup time.
>
> Replace the synchronization with a deferred free of the old kvm_io_bus
> structure.
Hi,
We noticed that this change introduced a regression of ~20 ms to the
first KVM_CREATE_VCPU call of a VM, which is significant for our use case.
Before the patch:
45726 14:45:32.914330 ioctl(25, KVM_CREATE_VCPU, 0) = 28 <0.000137>
45726 14:45:32.914533 ioctl(25, KVM_CREATE_VCPU, 1) = 30 <0.000046>
After the patch:
30295 14:47:08.057412 ioctl(25, KVM_CREATE_VCPU, 0) = 28 <0.025182>
30295 14:47:08.082663 ioctl(25, KVM_CREATE_VCPU, 1) = 30 <0.000031>
The reason this happens, as I understand it, is that the call_srcu()
calls from kvm_io_bus_register_dev() add callbacks to be invoked after a
normal grace period (GP), which is 10 ms with HZ=100. The subsequent
synchronize_srcu_expedited() called from kvm_swap_active_memslots()
(on the KVM_CREATE_VCPU path) has to wait for the normal GP to complete
before making progress. I don't fully understand why the delay is
consistently greater than 1 GP, but that's what we see across our
testing scenarios.
I verified that the problem is mitigated if the GP is shortened by
configuring HZ=1000; in that case the regression is on the order of 1 ms.
It looks like in our case we don't benefit much from the intended
optimisation, as the number of device MMIO registrations is limited and
they don't cost us much (each takes at most 16 us, and most commonly
~6 us):
firecracker 68452 [054] 3053.183991: kprobes:kvm_io_bus_register_dev: (ffffffffc0348390)
firecracker 68452 [054] 3053.184007: kprobes:kvm_io_bus_register_dev__return: (ffffffffc0348390 <- ffffffffc03aa190)
firecracker 68452 [054] 3053.184007: kprobes:kvm_io_bus_register_dev: (ffffffffc0348390)
firecracker 68452 [054] 3053.184014: kprobes:kvm_io_bus_register_dev__return: (ffffffffc0348390 <- ffffffffc03aa1b9)
firecracker 68452 [054] 3053.184015: kprobes:kvm_io_bus_register_dev: (ffffffffc0348390)
firecracker 68452 [054] 3053.184021: kprobes:kvm_io_bus_register_dev__return: (ffffffffc0348390 <- ffffffffc03aa1db)
firecracker 68452 [054] 3053.184028: kprobes:kvm_io_bus_register_dev: (ffffffffc0348390)
firecracker 68452 [054] 3053.184034: kprobes:kvm_io_bus_register_dev__return: (ffffffffc0348390 <- ffffffffc03ac957)
firecracker 68452 [054] 3053.184093: kprobes:kvm_io_bus_register_dev: (ffffffffc0348390)
firecracker 68452 [054] 3053.184099: kprobes:kvm_io_bus_register_dev__return: (ffffffffc0348390 <- ffffffffc03ab51a)
firecracker 68452 [054] 3053.184100: kprobes:kvm_io_bus_register_dev: (ffffffffc0348390)
firecracker 68452 [054] 3053.184106: kprobes:kvm_io_bus_register_dev__return: (ffffffffc0348390 <- ffffffffc03ab549)
firecracker 68452 [054] 3053.193145: kprobes:kvm_io_bus_register_dev: (ffffffffc0348390)
firecracker 68452 [054] 3053.193164: kprobes:kvm_io_bus_register_dev__return: (ffffffffc0348390 <- ffffffffc0348c9f)
firecracker 68452 [054] 3053.193165: kprobes:kvm_io_bus_register_dev: (ffffffffc0348390)
firecracker 68452 [054] 3053.193171: kprobes:kvm_io_bus_register_dev__return: (ffffffffc0348390 <- ffffffffc0348c9f)
Our env:
- 6.18
- Arch: the analysis above is from x86, but ARM regressed very similarly
- CONFIG_HZ=100
- VMM: Firecracker (https://github.com/firecracker-microvm/firecracker)
I am not aware of a way to make it fast for both use cases and would be
more than happy to hear about possible solutions.
Thanks,
Nikita
>
> Tested-by: Li RongQing <lirongqing at baidu.com>
> Signed-off-by: Keir Fraser <keirf at google.com>
> ---
> include/linux/kvm_host.h | 1 +
> virt/kvm/kvm_main.c | 11 +++++++++--
> 2 files changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index e7d6111cf254..103be35caf0d 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -206,6 +206,7 @@ struct kvm_io_range {
> struct kvm_io_bus {
> int dev_count;
> int ioeventfd_count;
> + struct rcu_head rcu;
> struct kvm_io_range range[];
> };
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 870ad8ea93a7..bcef324ccbf2 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1320,6 +1320,7 @@ static void kvm_destroy_vm(struct kvm *kvm)
> kvm_free_memslots(kvm, &kvm->__memslots[i][1]);
> }
> cleanup_srcu_struct(&kvm->irq_srcu);
> + srcu_barrier(&kvm->srcu);
> cleanup_srcu_struct(&kvm->srcu);
> #ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
> xa_destroy(&kvm->mem_attr_array);
> @@ -5952,6 +5953,13 @@ int kvm_io_bus_read(struct kvm_vcpu *vcpu, enum kvm_bus bus_idx, gpa_t addr,
> }
> EXPORT_SYMBOL_GPL(kvm_io_bus_read);
>
> +static void __free_bus(struct rcu_head *rcu)
> +{
> + struct kvm_io_bus *bus = container_of(rcu, struct kvm_io_bus, rcu);
> +
> + kfree(bus);
> +}
> +
> int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr,
> int len, struct kvm_io_device *dev)
> {
> @@ -5990,8 +5998,7 @@ int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr,
> memcpy(new_bus->range + i + 1, bus->range + i,
> (bus->dev_count - i) * sizeof(struct kvm_io_range));
> rcu_assign_pointer(kvm->buses[bus_idx], new_bus);
> - synchronize_srcu_expedited(&kvm->srcu);
> - kfree(bus);
> + call_srcu(&kvm->srcu, &bus->rcu, __free_bus);
>
> return 0;
> }