[PATCH v3 02/21] KVM: arm64: Add stand-alone page-table walker infrastructure

Will Deacon will at kernel.org
Wed Sep 2 07:02:33 EDT 2020


Hi Gavin,

On Wed, Sep 02, 2020 at 04:31:32PM +1000, Gavin Shan wrote:
> On 8/25/20 7:39 PM, Will Deacon wrote:
> > The KVM page-table code is intricately tied into the kernel page-table
> > code and re-uses the pte/pmd/pud/p4d/pgd macros directly in an attempt
> > to reduce code duplication. Unfortunately, the reality is that there is
> > an awful lot of code required to make this work, and at the end of the
> > day you're limited to creating page-tables with the same configuration
> > as the host kernel. Furthermore, lifting the page-table code to run
> > directly at EL2 on a non-VHE system (as we plan to to do in future
> > patches) is practically impossible due to the number of dependencies it
> > has on the core kernel.
> > 
> > Introduce a framework for walking Armv8 page-tables configured
> > independently from the host kernel.
> > 
> > Cc: Marc Zyngier <maz at kernel.org>
> > Cc: Quentin Perret <qperret at google.com>
> > Signed-off-by: Will Deacon <will at kernel.org>
> > ---
> >   arch/arm64/include/asm/kvm_pgtable.h | 101 ++++++++++
> >   arch/arm64/kvm/hyp/Makefile          |   2 +-
> >   arch/arm64/kvm/hyp/pgtable.c         | 290 +++++++++++++++++++++++++++
> >   3 files changed, 392 insertions(+), 1 deletion(-)
> >   create mode 100644 arch/arm64/include/asm/kvm_pgtable.h
> >   create mode 100644 arch/arm64/kvm/hyp/pgtable.c

[...]

> > +struct kvm_pgtable_walk_data {
> > +	struct kvm_pgtable		*pgt;
> > +	struct kvm_pgtable_walker	*walker;
> > +
> > +	u64				addr;
> > +	u64				end;
> > +};
> > +
> 
> Some of the following function might be worthy to be inlined, considering
> their complexity :)

I'll leave that for the compiler to figure out :)

> > +static u32 kvm_pgd_pages(u32 ia_bits, u32 start_level)
> > +{
> > +	struct kvm_pgtable pgt = {
> > +		.ia_bits	= ia_bits,
> > +		.start_level	= start_level,
> > +	};
> > +
> > +	return __kvm_pgd_page_idx(&pgt, -1ULL) + 1;
> > +}
> > +
> 
> It seems @pgt.start_level is assigned with wrong value here.
> For example, @start_level is 2 when @ia_bits and PAGE_SIZE
> are 40 and 64KB separately. In this case, __kvm_pgd_page_idx()
> always return zero. However, the extra page covers up the
> issue. I think something like below might be needed:
> 
> 	struct kvm_pgtable pgt = {
> 		.ia_bits	= ia_bits,
> 		.start_level	= KVM_PGTABLE_MAX_LEVELS - start_level + 1,
> 	};

Hmm, we're pulling the start_level right out of the vtcr, so I don't see
how it can be wrong. In your example, a start_level of 2 seems correct to
me, as we'll translate 13 bits there, then 13 bits at level 3 which covers
the 24 bits you need (with a 16-bit offset within the page).

Your suggestion would give us a start_level of 1, which has a redundant
level of translation. Maybe you're looking at the levels upside-down? The
top level is level 0 and each time you walk to a new level, that number
increases.

But perhaps I'm missing something. Please could you elaborate if you think
there's a problem here?

> > +static int _kvm_pgtable_walk(struct kvm_pgtable_walk_data *data)
> > +{
> > +	u32 idx;
> > +	int ret = 0;
> > +	struct kvm_pgtable *pgt = data->pgt;
> > +	u64 limit = BIT(pgt->ia_bits);
> > +
> > +	if (data->addr > limit || data->end > limit)
> > +		return -ERANGE;
> > +
> > +	if (!pgt->pgd)
> > +		return -EINVAL;
> > +
> > +	for (idx = kvm_pgd_page_idx(data); data->addr < data->end; ++idx) {
> > +		kvm_pte_t *ptep = &pgt->pgd[idx * PTRS_PER_PTE];
> > +
> > +		ret = __kvm_pgtable_walk(data, ptep, pgt->start_level);
> > +		if (ret)
> > +			break;
> > +	}
> > +
> > +	return ret;
> > +}
> > +
> 
> I guess we need bail on the following condition:
> 
>         if (data->addr >= limit || data->end >= limit)
>             return -ERANGE;

What's wrong with the existing check? In particular, I think we _want_
to support data->end == limit (it's exclusive). If data->addr == limit,
then we'll have a size of zero and the loop won't run.

Will



More information about the linux-arm-kernel mailing list