[PATCH v2 3/4] KVM: arm64: Factor out pKVM hyp vcpu creation to separate function
Will Deacon
will at kernel.org
Mon Mar 3 13:49:50 PST 2025
On Mon, Mar 03, 2025 at 07:21:33PM +0000, Fuad Tabba wrote:
> On Mon, 3 Mar 2025 at 19:18, Will Deacon <will at kernel.org> wrote:
> > On Mon, Mar 03, 2025 at 07:57:00AM +0000, Fuad Tabba wrote:
> > > On Fri, 28 Feb 2025 at 19:44, Quentin Perret <qperret at google.com> wrote:
> > > > On Wednesday 26 Feb 2025 at 21:55:19 (+0000), Fuad Tabba wrote:
> > > > > static int __pkvm_create_hyp_vm(struct kvm *host_kvm)
> > > > > {
> > > > > - size_t pgd_sz, hyp_vm_sz, hyp_vcpu_sz;
> > > > > + size_t pgd_sz, hyp_vm_sz;
> > > > > struct kvm_vcpu *host_vcpu;
> > > > > - pkvm_handle_t handle;
> > > > > void *pgd, *hyp_vm;
> > > > > unsigned long idx;
> > > > > int ret;
> > > > > @@ -161,33 +178,12 @@ static int __pkvm_create_hyp_vm(struct kvm *host_kvm)
> > > > > if (ret < 0)
> > > > > goto free_vm;
> > > > >
> > > > > - handle = ret;
> > > > > + WRITE_ONCE(host_kvm->arch.pkvm.handle, ret);
> > > >
> > > > What's the reason to make this a WRITE_ONCE? Does it mean we should
> > > > update the readers to be READ_ONCE()?
> > >
> > > I don't remember the original reason, to be honest. In this case, it
> > > was to make it consistent with downstream code in Android. That said,
> > > I plan on revising all of these soon and fixing this (and related
> > > code) in light of Will's comment regarding potential specter gadgets:
> > >
> > > https://lore.kernel.org/all/20250218092705.GA17030@willie-the-truck/
> >
> > I'm not sure the spectre stuff changes the concurrency aspects here, so
> > Quentin's question presumably still stands even after that.
> >
> > Looking at the Android code, this WRITE_ONCE() pairs with a READ_ONCE()
> > in pkvm_is_hyp_created() which is called without the config_lock held
> > by kvm_arch_prepare_memory_region(). However, given that
> > pkvm_is_hyp_created() is only testing against 0, I don't think the
> > _ONCE() accessors are doing anything useful in that case.
> >
> > The more confusing stuff is where 'kvm->arch.pkvm.handle' is read
> > directly without the 'config_lock' held. The MMU notifiers look like
> > they can do that, so I wonder if there's a theoretical race where they
> > can race with first run and issue TLB invalidation with the wrong handle?
> > That would apply equally to the upstream code, I think.
>
> Would using _ONCE accessors with the MMU notifiers be enough to avoid
> the race, or do we need to reconsider the lock protecting the handle
> and apply it to the notifiers?
I'm not entirely sure... if we used _ONCE() then we'd get either the
correct handle or zero, but we presumably need to order that against the
page-table somehow. The 'mmu_lock' looks like it gives us that, but I
don't think the notifiers are expecting an uninitialised handle in the
case where the page-table is empty (i.e. if they fire before first run).
Given that the handle is necessary for TLB invalidation, I'd be inclined
to make sure that the handle is allocated and published _before_ the
kvm_pgtable pointer checked by the notifiers is set, but that means
moving the handle allocation into kvm_arch_init_vm(). Is that do-able?
Will
More information about the linux-arm-kernel
mailing list