[PATCH v2 3/4] KVM: arm64: Factor out pKVM hyp vcpu creation to separate function

Will Deacon will at kernel.org
Wed Mar 12 08:29:04 PDT 2025


On Tue, Mar 04, 2025 at 12:33:27PM +0000, Fuad Tabba wrote:
> On Mon, 3 Mar 2025 at 21:49, Will Deacon <will at kernel.org> wrote:
> >
> > On Mon, Mar 03, 2025 at 07:21:33PM +0000, Fuad Tabba wrote:
> > > On Mon, 3 Mar 2025 at 19:18, Will Deacon <will at kernel.org> wrote:
> > > > On Mon, Mar 03, 2025 at 07:57:00AM +0000, Fuad Tabba wrote:
> > > > > On Fri, 28 Feb 2025 at 19:44, Quentin Perret <qperret at google.com> wrote:
> > > > > > On Wednesday 26 Feb 2025 at 21:55:19 (+0000), Fuad Tabba wrote:
> > > > > > >  static int __pkvm_create_hyp_vm(struct kvm *host_kvm)
> > > > > > >  {
> > > > > > > -     size_t pgd_sz, hyp_vm_sz, hyp_vcpu_sz;
> > > > > > > +     size_t pgd_sz, hyp_vm_sz;
> > > > > > >       struct kvm_vcpu *host_vcpu;
> > > > > > > -     pkvm_handle_t handle;
> > > > > > >       void *pgd, *hyp_vm;
> > > > > > >       unsigned long idx;
> > > > > > >       int ret;
> > > > > > > @@ -161,33 +178,12 @@ static int __pkvm_create_hyp_vm(struct kvm *host_kvm)
> > > > > > >       if (ret < 0)
> > > > > > >               goto free_vm;
> > > > > > >
> > > > > > > -     handle = ret;
> > > > > > > +     WRITE_ONCE(host_kvm->arch.pkvm.handle, ret);
> > > > > >
> > > > > > What's the reason to make this a WRITE_ONCE? Does it mean we should
> > > > > > update the readers to be READ_ONCE()?
> > > > >
> > > > > I don't remember the original reason, to be honest. In this case, it
> > > > > was to make it consistent with downstream code in Android. That said,
> > > > > I plan on revising all of these soon and fixing this (and related
> > > > > code) in light of Will's comment regarding potential specter gadgets:
> > > > >
> > > > > https://lore.kernel.org/all/20250218092705.GA17030@willie-the-truck/
> > > >
> > > > I'm not sure the spectre stuff changes the concurrency aspects here, so
> > > > Quentin's question presumably still stands even after that.
> > > >
> > > > Looking at the Android code, this WRITE_ONCE() pairs with a READ_ONCE()
> > > > in pkvm_is_hyp_created() which is called without the config_lock held
> > > > by kvm_arch_prepare_memory_region(). However, given that
> > > > pkvm_is_hyp_created() is only testing against 0, I don't think the
> > > > _ONCE() accessors are doing anything useful in that case.
> > > >
> > > > The more confusing stuff is where 'kvm->arch.pkvm.handle' is read
> > > > directly without the 'config_lock' held. The MMU notifiers look like
> > > > they can do that, so I wonder if there's a theoretical race where they
> > > > can race with first run and issue TLB invalidation with the wrong handle?
> > > > That would apply equally to the upstream code, I think.
> > >
> > > Would using _ONCE accessors with the MMU notifiers be enough to avoid
> > > the race, or do we need to reconsider the lock protecting the handle
> > > and apply it to the notifiers?
> >
> > I'm not entirely sure... if we used _ONCE() then we'd get either the
> > correct handle or zero, but we presumably need to order that against the
> > page-table somehow. The 'mmu_lock' looks like it gives us that, but I
> > don't think the notifiers are expecting an uninitialised handle in the
> > case where the page-table is empty (i.e. if they fire before first run).
> >
> > Given that the handle is necessary for TLB invalidation, I'd be inclined
> > to make sure that the handle is allocated and published _before_ the
> > kvm_pgtable pointer checked by the notifiers is set, but that means
> > moving the handle allocation into kvm_arch_init_vm(). Is that do-able?
> 
> It is doable, but it won't be a simple patch. Up until now, the
> creation of the hyp view of a vm is associated with the first run of
> the vm, rather than its allocation at the host. Part of the reason for
> that is that much of the vm state (mainly the vcpu state) isn't
> finalized until then.
> 
> This patch series fixes this in part, by decoupling the initialization
> of the individual vcpus from the vm. Because of that, it would be
> doable, but it might be better as another patch series that we could
> test and analyze separately.
> 
> If you agree, I could whip up something and send it soon.

Yeah, I think we should fix this as a separate series and drop the
WRITE_ONCE() from this patch.

Will



More information about the linux-arm-kernel mailing list