[PATCH v2 1/3] arm64: KVM: Implement 48 VA support for KVM EL2 and Stage-2
Christoffer Dall
christoffer.dall at linaro.org
Thu Oct 9 04:01:37 PDT 2014
On Wed, Oct 08, 2014 at 10:47:04AM +0100, Catalin Marinas wrote:
> On Tue, Oct 07, 2014 at 08:39:54PM +0100, Christoffer Dall wrote:
> > I came up with the following based on your feedback, but I personally
> > don't find it a lot easier to read than what I had already. Suggestions
> > are welcome:
>
> At least PTRS_PER_S2_PGD and KVM_PREALLOC_LEVEL are clearer to me as
> formulas than the magic numbers.
>
Agreed.
> > diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> > index a030d16..7941a51 100644
> > --- a/arch/arm64/include/asm/kvm_mmu.h
> > +++ b/arch/arm64/include/asm/kvm_mmu.h
> [...]
> > +/*
> > + * In the case where PGDIR_SHIFT is larger than KVM_PHYS_SHIFT, we can address
> > + * the entire IPA input range with a single pgd entry, and we would only need
> > + * one pgd entry.
> > + */
>
> It may be worth here stating that this pgd is actually fake (covered
> below as well). Maybe something like "single (fake) pgd entry".
>
Yes.
> > +#if PGDIR_SHIFT > KVM_PHYS_SHIFT
> > +#define PTRS_PER_S2_PGD (1)
> > +#else
> > +#define PTRS_PER_S2_PGD (1 << (KVM_PHYS_SHIFT - PGDIR_SHIFT))
> > +#endif
> > +#define S2_PGD_ORDER get_order(PTRS_PER_S2_PGD * sizeof(pgd_t))
> >
> > +/*
> > + * If we are concatenating first level stage-2 page tables, we would have less
> > + * than or equal to 16 pointers in the fake PGD, because that's what the
> > + * architecture allows. In this case, (4 - CONFIG_ARM64_PGTABLE_LEVELS)
> > + * represents the first level for the host, and we add 1 to go to the next
> > + * level (which uses contatenation) for the stage-2 tables.
> > + */
> > +#if PTRS_PER_S2_PGD <= 16
> > +#define KVM_PREALLOC_LEVEL (4 - CONFIG_ARM64_PGTABLE_LEVELS + 1)
> > +#else
> > +#define KVM_PREALLOC_LEVEL (0)
> > +#endif
> > +
> > +/**
> > + * kvm_prealloc_hwpgd - allocate inital table for VTTBR
> > + * @kvm: The KVM struct pointer for the VM.
> > + * @pgd: The kernel pseudo pgd
> > + *
> > + * When the kernel uses more levels of page tables than the guest, we allocate
> > + * a fake PGD and pre-populate it to point to the next-level page table, which
> > + * will be the real initial page table pointed to by the VTTBR.
> > + *
> > + * When KVM_PREALLOC_LEVEL==2, we allocate a single page for the PMD and
> > + * the kernel will use folded pud. When KVM_PREALLOC_LEVEL==1, we
> > + * allocate 2 consecutive PUD pages.
> > + */
>
> I don't have a strong preference here, if you find the code easier to
> read as separate kvm_prealloc_hwpgd() functions, use those, as per your
> original patch. My point was to no longer define the functions based on
> #if 64K && 3-levels etc. but only on KVM_PREALLOC_LEVEL.
>
> Anyway, I think the code below looks ok, with some fixes.
>
I think it's nicer too once I got used to it.
> > +static inline int kvm_prealloc_hwpgd(struct kvm *kvm, pgd_t *pgd)
> > +{
> > + pud_t *pud;
> > + pmd_t *pmd;
> > + unsigned int order, i;
> > + unsigned long hwpgd;
> > +
> > + if (KVM_PREALLOC_LEVEL == 0)
> > + return 0;
> > +
> > + order = get_order(PTRS_PER_S2_PGD);
>
> Isn't order always 0 here? Based on our IRC discussion, PTRS_PER_S2_PGD
> is 16 or less and the order should not be used.
>
no, if the kernel has 4K pages and 4 levels, then PGDIR_SHIFT is 39, and
KVM_PHYS_SHIFT stays 40, so that means PTRS_PER_S2_PGD becomes 2, which
means we concatenate two first level stage-2 page tables, which means we
need to allocate two consecutive pages, giving us an order of 1, not 0.
That's exactly why we use get_order(PTRS_PER_S2_PGD) instead of
S2_PGD_ORDER, which is only used when we're not doing the fake PGD trick
(see my response to Marc's mail).
> > + hwpgd = __get_free_pages(GFP_KERNEL | __GFP_ZERO, order);
>
> I assume you need __get_free_pages() for alignment.
>
yes, would you prefer a comment to that fact?
> > + if (!hwpgd)
> > + return -ENOMEM;
> > +
> > + if (KVM_PREALLOC_LEVEL == 1) {
> > + pud = (pud_t *)hwpgd;
> > + for (i = 0; i < PTRS_PER_S2_PGD; i++)
> > + pgd_populate(NULL, pgd + i, pud + i * PTRS_PER_PUD);
> > + } else if (KVM_PREALLOC_LEVEL == 2) {
> > + pud = pud_offset(pgd, 0);
> > + pmd = (pmd_t *)hwpgd;
> > + for (i = 0; i < PTRS_PER_S2_PGD; i++)
> > + pud_populate(NULL, pud + i, pmd + i * PTRS_PER_PMD);
> > + }
>
> It could be slightly shorter as (I can't guarantee clearer ;)):
>
> for (i = 0; i < PTRS_PER_S2_PGD; i++) {
> if (KVM_PREALLOC_LEVEL == 1)
> pgd_populate(NULL, pgd + i,
> (pud_t *)hwpgd + i * PTRS_PER_PUD);
> else if (KVM_PREALLOC_LEVEL == 2)
> pud_populate(NULL, pud_offset(pgd, 0) + i,
> (pmd_t *)hwpgd + i * PTRS_PER_PMD)
> }
>
> Or you could write a kvm_populate_swpgd() to handle the ifs and casting.
>
I actually quite like this, let's see how it looks in the next revision
and if people really dislike it, we can look at factoring it out
further.
> > +
> > + return 0;
> > +}
> > +
> > +static inline void *kvm_get_hwpgd(struct kvm *kvm)
> > +{
> > + pgd_t *pgd = kvm->arch.pgd;
> > + pud_t *pud;
> > + pmd_t *pmd;
> > +
> > + switch (KVM_PREALLOC_LEVEL) {
> > + case 0:
> > + return pgd;
> > + case 1:
> > + pud = pud_offset(pgd, 0);
> > + return pud;
> > + case 2:
> > + pud = pud_offset(pgd, 0);
> > + pmd = pmd_offset(pud, 0);
> > + return pmd;
> > + default:
> > + BUG();
> > + return NULL;
> > + }
>
> /* not needed? Use BUG_ON or BUILD_BUG_ON */
> if (KVM_PREALLOC_LEVEL == 0)
> return pgd;
>
> pud = pud_offset(pgd, 0);
> if (KVM_PREALLOC_LEVEL == 1)
> return pud;
>
> return pmd_offset(pud, 0);
I like this, but...
>
> You don't need KVM_PREALLOC_LEVEL == 0 case since this function wouldn't
> be called. So you could do with some (BUILD_)BUG_ON and 4 lines after.
>
It is needed and it is called from arch/arm/kvm/arm.c in update_vttbr().
Thanks!
-Christoffer
More information about the linux-arm-kernel
mailing list