[PATCH 2/5] KVM: arm64: Build MPIDR to vcpu index cache at runtime
Marc Zyngier
maz at kernel.org
Thu Sep 7 11:15:38 PDT 2023
On Thu, 07 Sep 2023 16:29:18 +0100,
Joey Gouly <joey.gouly at arm.com> wrote:
>
> On Thu, Sep 07, 2023 at 11:09:28AM +0100, Marc Zyngier wrote:
[...]
> > @@ -578,6 +579,57 @@ static int kvm_vcpu_initialized(struct kvm_vcpu *vcpu)
> > return vcpu_get_flag(vcpu, VCPU_INITIALIZED);
> > }
> >
> > +static void kvm_init_mpidr_data(struct kvm *kvm)
> > +{
> > + struct kvm_mpidr_data *data = NULL;
> > + unsigned long c, mask, nr_entries;
> > + u64 aff_set = 0, aff_clr = ~0UL;
> > + struct kvm_vcpu *vcpu;
> > +
> > + mutex_lock(&kvm->arch.config_lock);
> > +
> > + if (kvm->arch.mpidr_data || atomic_read(&kvm->online_vcpus) == 1)
> > + goto out;
> > +
> > + kvm_for_each_vcpu(c, vcpu, kvm) {
> > + u64 aff = kvm_vcpu_get_mpidr_aff(vcpu);
> > + aff_set |= aff;
> > + aff_clr &= aff;
> > + }
> > +
> > + /*
> > + * A significant bit can be either 0 or 1, and will only appear in
> > + * aff_set. Use aff_clr to weed out the useless stuff.
> > + */
> > + mask = aff_set ^ aff_clr;
> > + nr_entries = BIT_ULL(hweight_long(mask));
> > +
> > + /*
> > + * Don't let userspace fool us. If we need more than a single page
> > + * to describe the compressed MPIDR array, just fall back to the
> > + * iterative method. Single vcpu VMs do not need this either.
> > + */
> > + if (struct_size(data, cmpidr_to_idx, nr_entries) <= PAGE_SIZE)
> > + data = kzalloc(struct_size(data, cmpidr_to_idx, nr_entries),
> > + GFP_KERNEL_ACCOUNT);
> > +
> > + if (!data)
> > + goto out;
>
> Probably not a big deal, but if the data doesn't fit, every vCPU will run this
> function up until this point (if the data fits or there's only 1 vCPU we bail
> out earlier).
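Here is how the bit tricks in the quoted hunk work. aff_set collects every affinity bit that is 1 in at least one vcpu. aff_clr keeps only the bits that are 1 in all of them. Their XOR is therefore the set of bits that actually differ between vcpus. Bits outside that mask are identical for every vcpu, so distinct MPIDRs still map to distinct compressed values, and the table needs 2^popcount(mask) entries.

Below is a minimal userspace sketch of that arithmetic, assuming 4K pages. Several names in it are hypothetical stand-ins, not the kernel's implementation: the struct, the mpidr_mask field, SKETCH_PAGE_SIZE and the PEXT-style compress_aff() helper. The indexing code is not part of the quoted hunk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4K pages */

/* Hypothetical stand-in for struct kvm_mpidr_data. */
struct mpidr_data_sketch {
	uint64_t mpidr_mask;		/* hypothetical field name */
	uint16_t cmpidr_to_idx[];	/* compressed MPIDR -> vcpu index */
};

static unsigned int popcount64(uint64_t x)
{
	unsigned int n = 0;

	while (x) {
		x &= x - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/* Bits that are 1 in some vcpu's affinity but 0 in another's. */
static uint64_t significant_mask(const uint64_t *aff, size_t nr_vcpus)
{
	uint64_t aff_set = 0, aff_clr = ~0ULL;

	for (size_t i = 0; i < nr_vcpus; i++) {
		aff_set |= aff[i];	/* 1 in at least one vcpu */
		aff_clr &= aff[i];	/* 1 in every vcpu */
	}
	return aff_set ^ aff_clr;
}

/* One table entry per combination of the significant bits. */
static uint64_t table_entries(uint64_t mask)
{
	unsigned int nbits = popcount64(mask);

	assert(nbits < 64);	/* avoid an undefined 64-bit shift */
	return 1ULL << nbits;
}

/*
 * Mirrors the "fits in a single page" check. Entry counts are compared
 * instead of byte sizes so a huge nr_entries cannot overflow the
 * multiplication (the kernel's struct_size() saturates instead).
 */
static int table_fits_in_page(uint64_t nr_entries)
{
	return nr_entries <= (SKETCH_PAGE_SIZE -
			      sizeof(struct mpidr_data_sketch)) /
			     sizeof(uint16_t);
}

/* Hypothetical PEXT-style helper: pack the masked bits of aff into the low bits. */
static uint64_t compress_aff(uint64_t aff, uint64_t mask)
{
	uint64_t idx = 0;
	unsigned int out = 0;

	for (unsigned int bit = 0; bit < 64; bit++) {
		if (!(mask & (1ULL << bit)))
			continue;	/* constant bit, carries no information */
		if (aff & (1ULL << bit))
			idx |= 1ULL << out;
		out++;
	}
	return idx;
}
```

Take four vcpus at affinities 0x100-0x103. Only the two low bits vary, so the table has four entries and affinity 0x102 compresses to index 2. A mask with 12 significant bits would need 4096 entries. Under these assumptions that fails the single-page check, which is the fallback case Joey describes.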
Yeah, I thought about that when writing this code, and applied the
following reasoning:
- this code is only run once per vcpu
- being able to remember that we cannot allocate the hash table
requires at least an extra flag or a special value for the pointer
- this sequence is pretty quick (one read/OR/AND per vcpu, repeated by
each vcpu, so nr_vcpu^2 in total), and even if you have 512 vcpus, it
isn't *that* much stuff given that it is spread across vcpus
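The following is a rough cost model for the last point. It is a hypothetical sketch, not kernel code. It counts affinity reads under two assumptions: every vcpu calls the init function once, and the table never fits. The remember_failure switch models the extra flag discussed above. The function name and the flag are illustrative assumptions.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical cost model: each vcpu calls the init function once, the
 * table never fits (worst case), and we count the affinity values read
 * (one read/OR/AND each) across all calls.
 */
static unsigned long total_affinity_reads(unsigned long nr_vcpus,
					  bool remember_failure)
{
	unsigned long reads = 0;
	bool failed = false;	/* hypothetical "cannot allocate" flag */

	assert(nr_vcpus > 0);
	if (nr_vcpus == 1)
		return 0;	/* single-vcpu VMs bail out before the loop */

	for (unsigned long caller = 0; caller < nr_vcpus; caller++) {
		if (remember_failure && failed)
			continue;	/* the flag short-circuits later calls */
		reads += nr_vcpus;	/* one walk over all vcpus */
		failed = true;		/* the table did not fit */
	}
	return reads;
}
```

With 512 vcpus this gives 512 * 512 = 262144 reads without the flag and 512 with it. Without the flag, each vcpu still performs only 512 of those reads, which is the "spread across vcpus" argument.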
Now, if someone can actually measure a significant boot-time speed-up,
I'll happily add that flag.
[...]
> Reviewed-by: Joey Gouly <joey.gouly at arm.com>
Thanks!
M.
--
Without deviation from the norm, progress is not possible.
More information about the linux-arm-kernel mailing list