[PATCH v3 06/21] KVM: arm64: Add support for stage-2 map()/unmap() in generic page-table

Will Deacon will at kernel.org
Wed Sep 2 07:46:17 EDT 2020


On Tue, Sep 01, 2020 at 05:24:58PM +0100, Alexandru Elisei wrote:
> On 8/25/20 10:39 AM, Will Deacon wrote:
> > Add stage-2 map() and unmap() operations to the generic page-table code.
> >
> > Cc: Marc Zyngier <maz at kernel.org>
> > Cc: Quentin Perret <qperret at google.com>
> > Signed-off-by: Will Deacon <will at kernel.org>
> > ---
> >  arch/arm64/include/asm/kvm_pgtable.h |  39 ++++
> >  arch/arm64/kvm/hyp/pgtable.c         | 262 +++++++++++++++++++++++++++
> >  2 files changed, 301 insertions(+)

[...]

> > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> > index b8550ccaef4d..41ee8f3c0369 100644
> > --- a/arch/arm64/kvm/hyp/pgtable.c
> > +++ b/arch/arm64/kvm/hyp/pgtable.c
> > @@ -32,10 +32,19 @@
> >  #define KVM_PTE_LEAF_ATTR_LO_S1_SH_IS	3
> >  #define KVM_PTE_LEAF_ATTR_LO_S1_AF	BIT(10)
> >  
> > +#define KVM_PTE_LEAF_ATTR_LO_S2_MEMATTR	GENMASK(5, 2)
> > +#define KVM_PTE_LEAF_ATTR_LO_S2_S2AP_R	BIT(6)
> > +#define KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W	BIT(7)
> > +#define KVM_PTE_LEAF_ATTR_LO_S2_SH	GENMASK(9, 8)
> > +#define KVM_PTE_LEAF_ATTR_LO_S2_SH_IS	3
> > +#define KVM_PTE_LEAF_ATTR_LO_S2_AF	BIT(10)
> > +
> >  #define KVM_PTE_LEAF_ATTR_HI		GENMASK(63, 51)
> >  
> >  #define KVM_PTE_LEAF_ATTR_HI_S1_XN	BIT(54)
> >  
> > +#define KVM_PTE_LEAF_ATTR_HI_S2_XN	BIT(54)
> 
> Checked the bitfields against ARM DDI 0487F.b, they match.

Phew! ;)

> > +static bool stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
> > +				       kvm_pte_t *ptep,
> > +				       struct stage2_map_data *data)
> > +{
> > +	u64 granule = kvm_granule_size(level), phys = data->phys;
> > +
> > +	if (!kvm_block_mapping_supported(addr, end, phys, level))
> > +		return false;
> > +
> > +	if (kvm_set_valid_leaf_pte(ptep, phys, data->attr, level))
> > +		goto out;
> > +
> > +	kvm_set_invalid_pte(ptep);
> > +	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, level);
> > +	kvm_set_valid_leaf_pte(ptep, phys, data->attr, level);
> 
> One has to read the kvm_set_valid_leaf_pte code very carefully to understand why
> we're doing the above (found an old, valid entry in the stage 2 code, the page
> tables are in use so we're doing break-before-make to replace it with the new
> one), especially since we don't this with the hyp tables. Perhaps a comment
> explaining what's happening would be useful.

Sure, I can add something here, but it sounds like you figured it out.

> > +static int stage2_map_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> > +			     enum kvm_pgtable_walk_flags flag, void * const arg)
> > +{
> > +	struct stage2_map_data *data = arg;
> > +
> > +	switch (flag) {
> > +	case KVM_PGTABLE_WALK_TABLE_PRE:
> > +		return stage2_map_walk_table_pre(addr, end, level, ptep, data);
> > +	case KVM_PGTABLE_WALK_LEAF:
> > +		return stage2_map_walk_leaf(addr, end, level, ptep, data);
> > +	case KVM_PGTABLE_WALK_TABLE_POST:
> > +		return stage2_map_walk_table_post(addr, end, level, ptep, data);
> > +	}
> > +
> > +	return -EINVAL;
> > +}
> 
> As I understood the algorithm, each of the pre, leaf and post function do two
> different things: 1. free/invalidate the tables/leaf entries if we can create a
> block mapping at a previously visited level (stage2_map_data->anchor != NULL); and
> create an entry for the range at the correct level. To be honest, to me this
> hasn't been obvious from the code and I think some comments to the functions and
> especially to the anchor field of stage2_map_data would go a long way to making it
> easier for others to understand the code.

I can also add something here as the anchor thing is quite unusual. We
basically use it to mark an existing table entry which we want to replace
with a block entry, but before we can do that we have to descend into the
page table under that table entry freeing everything as we go. Then we'll
see the marked entry on the way back up and install the block entry then.

I had a few goes at implementing this with only LEAF and TABLE_POST but
it was really ugly, and actually it turns out TABLE_PRE is really useful
for debugging if you just want to print out the page-table.

> With that in mind, the functions look solid to me: every get_page has a
> corresponding put_page in stage2_map_walk_leaf or in the unmap walker, and the
> algorithm looks sound. I still want to re-read the functions a few times (probably
> in the next iteration) because they're definitely not trivial and I don't want to
> miss something.

Thanks. I'll post a v4 with some comments, so maybe that will help.

> > +	/*
> > +	 * This is similar to the map() path in that we unmap the entire
> > +	 * block entry and rely on the remaining portions being faulted
> > +	 * back lazily.
> > +	 */
> > +	kvm_set_invalid_pte(ptep);
> > +	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, addr, level);
> > +	put_page(virt_to_page(ptep));
> > +
> > +	if (need_flush) {
> > +		stage2_flush_dcache(kvm_pte_follow(pte),
> > +				    kvm_granule_size(level));
> > +	}
> 
> The curly braces are unnecessary; I'm only mentioning it because you don't use
> them in this function for the rest of the one line if statements.

Hmm, but this is a two-line statement so I think it reads better.

Will



More information about the linux-arm-kernel mailing list