[PATCH v4 14/25] KVM: arm64: Add per-cpu fixmap infrastructure at EL2

Will Deacon <will@kernel.org>
Tue Oct 18 07:05:14 PDT 2022


Hi Mark,

Cheers for having a look.

On Tue, Oct 18, 2022 at 12:06:14PM +0100, Mark Rutland wrote:
> On Mon, Oct 17, 2022 at 12:51:58PM +0100, Will Deacon wrote:
> > diff --git a/arch/arm64/kvm/hyp/nvhe/mm.c b/arch/arm64/kvm/hyp/nvhe/mm.c
> > index d3a3b47181de..b77215630d5c 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/mm.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/mm.c
> > @@ -14,6 +14,7 @@
> >  #include <nvhe/early_alloc.h>
> >  #include <nvhe/gfp.h>
> >  #include <nvhe/memory.h>
> > +#include <nvhe/mem_protect.h>
> >  #include <nvhe/mm.h>
> >  #include <nvhe/spinlock.h>
> >  
> > @@ -25,6 +26,12 @@ unsigned int hyp_memblock_nr;
> >  
> >  static u64 __io_map_base;
> >  
> > +struct hyp_fixmap_slot {
> > +	u64 addr;
> > +	kvm_pte_t *ptep;
> > +};
> > +static DEFINE_PER_CPU(struct hyp_fixmap_slot, fixmap_slots);
> > +
> >  static int __pkvm_create_mappings(unsigned long start, unsigned long size,
> >  				  unsigned long phys, enum kvm_pgtable_prot prot)
> >  {
> > @@ -212,6 +219,93 @@ int hyp_map_vectors(void)
> >  	return 0;
> >  }
> >  
> > +void *hyp_fixmap_map(phys_addr_t phys)
> > +{
> > +	struct hyp_fixmap_slot *slot = this_cpu_ptr(&fixmap_slots);
> > +	kvm_pte_t pte, *ptep = slot->ptep;
> > +
> > +	pte = *ptep;
> > +	pte &= ~kvm_phys_to_pte(KVM_PHYS_INVALID);
> > +	pte |= kvm_phys_to_pte(phys) | KVM_PTE_VALID;
> > +	WRITE_ONCE(*ptep, pte);
> > +	dsb(nshst);
> > +
> > +	return (void *)slot->addr;
> > +}
> > +
> > +static void fixmap_clear_slot(struct hyp_fixmap_slot *slot)
> > +{
> > +	kvm_pte_t *ptep = slot->ptep;
> > +	u64 addr = slot->addr;
> > +
> > +	WRITE_ONCE(*ptep, *ptep & ~KVM_PTE_VALID);
> > +	dsb(nshst);
> > +	__tlbi_level(vale2, __TLBI_VADDR(addr, 0), (KVM_PGTABLE_MAX_LEVELS - 1));
> > +	dsb(nsh);
> > +	isb();
> > +}
> 
> Does each CPU have independent Stage-1 tables at EL2? i.e. each has a distinct
> root table?

No, the CPUs share the same stage-1 table at EL2.

> If the tables are shared, you need broadcast maintenance and ISH barriers here,
> or you risk the usual issues with asynchronous MMU behaviour.

Can you elaborate a bit, please? What we're trying to do is reserve a page
of VA space for each CPU, which is only ever accessed explicitly by that
CPU using a normal memory mapping. The fixmap code therefore just updates
the relevant leaf entry for the CPU on which we're running, and the TLBI
is there to ensure that the new mapping takes effect.
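
To make that concrete, a caller ends up looking something like the sketch
below. It's purely illustrative (hyp_scrub_donated_page() isn't in the
series), and it assumes a hyp_fixmap_unmap() counterpart that just calls
fixmap_clear_slot() on the local CPU's slot:

	static void hyp_scrub_donated_page(phys_addr_t phys)
	{
		/* Install the page in this CPU's private fixmap slot. */
		void *va = hyp_fixmap_map(phys);

		/* Only this CPU ever accesses the slot, so no locking. */
		memset(va, 0, PAGE_SIZE);

		/* Clear the PTE again and invalidate the local TLB entry. */
		hyp_fixmap_unmap();
	}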

If one CPU speculatively walks another CPU's fixmap slot, then I agree
that it could access that page after the slot had been cleared. Although
I can see theoretical security arguments for avoiding that situation,
there's a very real performance cost to broadcast invalidation that we
were hoping to avoid on this fast path.

Of course, in the likely event that I've purged "the usual issues" from
my head and we need broadcasting for _correctness_, then we'll just have
to suck it up!
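
FWIW, if it does turn out that we need broadcasting for correctness, I'd
expect the teardown to become something like the (untested) sketch below,
with inner-shareable barriers and TLBI VALE2IS replacing the local
nsh/VALE2 sequence from the patch:

	static void fixmap_clear_slot(struct hyp_fixmap_slot *slot)
	{
		kvm_pte_t *ptep = slot->ptep;
		u64 addr = slot->addr;

		WRITE_ONCE(*ptep, *ptep & ~KVM_PTE_VALID);
		dsb(ishst);
		__tlbi_level(vale2is, __TLBI_VADDR(addr, 0),
			     (KVM_PGTABLE_MAX_LEVELS - 1));
		dsb(ish);
		isb();
	}

but, as above, I'd rather not eat that cost on the map/unmap fast path
unless we really have to.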

Cheers,

Will


