[PATCH v4 02/21] KVM: arm64: Add stand-alone page-table walker infrastructure

Andrew Scull ascull at google.com
Fri Sep 11 07:22:16 EDT 2020


On Fri, Sep 11, 2020 at 11:15:04AM +0100, Will Deacon wrote:
> On Thu, Sep 10, 2020 at 03:21:59PM +0100, Andrew Scull wrote:
> > On Thu, Sep 10, 2020 at 01:37:13PM +0100, Will Deacon wrote:
> > > On Wed, Sep 09, 2020 at 04:29:26PM +0100, Alexandru Elisei wrote:
> > > > On 9/7/20 4:23 PM, Will Deacon wrote:
> > > > > [..]
> > > > > +
> > > > > +int kvm_pgtable_walk(struct kvm_pgtable *pgt, u64 addr, u64 size,
> > > > > +		     struct kvm_pgtable_walker *walker)
> > > > > +{
> > > > > +	struct kvm_pgtable_walk_data walk_data = {
> > > > > +		.pgt	= pgt,
> > > > > +		.addr	= ALIGN_DOWN(addr, PAGE_SIZE),
> > > > > +		.end	= PAGE_ALIGN(walk_data.addr + size),
> > > > > +		.walker	= walker,
> > > > > +	};
> > > > 
> > > > If the caller wants to walk [0x500, 0x1500), for PAGE_SIZE = 0x1000 (4K), the
> > > > function walks the range [0x0, 0x1000). Is that intentional?
> > > 
> > > Yes, although the caller specifies a base and a *size*, rather than an end
> > > address. As a concrete example, much of the hypervisor stage-1 mapping
> > > is created using PAGE_SIZE mappings of random ELF symbols, which correspond
> > > to arbitrary addresses. In these cases, we really do want to round-down the
> > > address and perform a PAGE_SIZE mapping.
> > 
> > I think Alexandru has a point here. Turning his example into something
> > equivalent that maps a random ELF symbol:
> > 
> >     struct some_hyp_state s = { ... };
> >     // &s == 0x500
> >     // sizeof(s) == PAGE_SIZE
> >     kvm_pgtable_walk(pgt, (u64)&s, sizeof(s), walker);
> > 
> > Given `s` straddles the two pages, the part in the second page won't be
> > mapped.
> > 
> > Should the end address instead be calculated as:
> > 
> >    .end = PAGE_ALIGN(addr + size),
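[The difference between the two end-address computations can be checked in isolation. A minimal userspace sketch, with the kernel's ALIGN_DOWN/PAGE_ALIGN reimplemented as local stand-in macros and 4K pages assumed:]

```c
#include <stdint.h>

/* Userspace stand-ins for the kernel macros, assuming 4K pages. */
#define MY_PAGE_SIZE	0x1000ULL
#define MY_ALIGN_DOWN(x)	((x) & ~(MY_PAGE_SIZE - 1))
#define MY_PAGE_ALIGN(x)	MY_ALIGN_DOWN((x) + MY_PAGE_SIZE - 1)

/* End address as computed in the patch: round addr down first. */
static uint64_t end_patch(uint64_t addr, uint64_t size)
{
	return MY_PAGE_ALIGN(MY_ALIGN_DOWN(addr) + size);
}

/* End address as suggested above: align addr + size directly. */
static uint64_t end_suggested(uint64_t addr, uint64_t size)
{
	return MY_PAGE_ALIGN(addr + size);
}
```

[For addr = 0x500, size = 0x1000, the patch's formula gives end = 0x1000 (one page walked, so the tail of `s` is missed), while the suggested formula gives end = 0x2000 (two pages walked, i.e. more pages than `size` implied).]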
> 
> Cheers for the example, and I see what you mean about structures that
> straddle a page boundary. However, I think it's important here that the size
> parameter accurately reflects the number of pages mapped: if the caller
> passes PAGE_SIZE, we better not map more than a page, since the mmu cache
> might not have enough pre-allocated pages for that.
> 
> So the API is just that the virtual address bits corresponding to the offset
> within a page are ignored. Looking back at the code, that works out for the
> hyp mappings (it's actually the _physical_ address that is unaligned there,
> and we can just round that down), but I think I have a potential bug in the
> ioremap code if you place the GIC (v2) somewhere funky on a machine using
> 64k pages. In this case, the ioctl() handler only enforces 4k alignment,
> and so we could end up truncating the mapping in a similar case to what you
> describe above. For example, trying to place it from 60k - 68k would result
> in only the first page getting mapped.
> 
> I've fixed that in the ioremap code (diff below), and I'll update the
> kerneldoc to say that the bottom bits of the VA are ignored.

You've described kvm_pgtable_walk as taking a start page and a page
count, so an alternative to updating the kerneldoc comment would be
defining the function to reflect that, e.g.

    int kvm_pgtable_walk(struct kvm_pgtable *pgt, gfn_t start,
    			 size_t count, struct kvm_pgtable_walker *walker);

Up to you.
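[A gfn/count interface makes misalignment unrepresentable: callers can only name whole pages, so no rounding policy is needed at all. A simplified userspace sketch, with `gfn_t` and the range computation modelled locally and 4K pages assumed:]

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified model, assuming 4K pages; gfn_t is a stand-in for the
 * kernel definition. */
#define MY_PAGE_SHIFT	12
typedef uint64_t gfn_t;

struct walk_range {
	uint64_t addr;	/* inclusive start, page aligned by construction */
	uint64_t end;	/* exclusive end */
};

/* Convert a (start page, page count) pair into the byte range the
 * walker would visit; both bounds are page aligned by construction. */
static struct walk_range range_from_gfn(gfn_t start, size_t count)
{
	struct walk_range r = {
		.addr	= (uint64_t)start << MY_PAGE_SHIFT,
		.end	= (uint64_t)(start + count) << MY_PAGE_SHIFT,
	};
	return r;
}
```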

> 
> Cheers,
> 
> Will
> 
> --->8
> 
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 1041be1fafe4..21b70abf65a7 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -505,6 +505,9 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
>                                      KVM_PGTABLE_PROT_R |
>                                      (writable ? KVM_PGTABLE_PROT_W : 0);
>  
> +       size += offset_in_page(guest_ipa);
> +       guest_ipa &= PAGE_MASK;
> +
>         for (addr = guest_ipa; addr < guest_ipa + size; addr += PAGE_SIZE) {
>                 ret = kvm_mmu_topup_memory_cache(&cache,
>                                                  kvm_mmu_cache_min_pages(kvm));
> 
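[The effect of the diff can be sanity-checked on the 64K-page GICv2 example. A minimal userspace model of the fixed loop, with offset_in_page/PAGE_MASK reimplemented as local stand-ins:]

```c
#include <stdint.h>

/* Userspace model of the ioremap fix above, assuming 64K pages. */
#define MY_PAGE_SIZE	0x10000ULL
#define MY_PAGE_MASK	(~(MY_PAGE_SIZE - 1))
#define my_offset_in_page(x)	((x) & (MY_PAGE_SIZE - 1))

/* Number of loop iterations (pages mapped) after folding the base's
 * in-page offset into the size and rounding the base down, as the
 * diff does. Without that adjustment, a 60K..68K region would run
 * only one iteration and truncate the mapping. */
static unsigned int pages_mapped(uint64_t guest_ipa, uint64_t size)
{
	unsigned int n = 0;
	uint64_t addr;

	size += my_offset_in_page(guest_ipa);
	guest_ipa &= MY_PAGE_MASK;

	for (addr = guest_ipa; addr < guest_ipa + size; addr += MY_PAGE_SIZE)
		n++;
	return n;
}
```

[Placing an 8K GICv2 region at 60K straddles two 64K pages, and the adjusted loop does map both.]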
