[PATCH v4 02/21] KVM: arm64: Add stand-alone page-table walker infrastructure

Will Deacon <will at kernel.org>
Fri Sep 11 06:15:04 EDT 2020


On Thu, Sep 10, 2020 at 03:21:59PM +0100, Andrew Scull wrote:
> On Thu, Sep 10, 2020 at 01:37:13PM +0100, Will Deacon wrote:
> > On Wed, Sep 09, 2020 at 04:29:26PM +0100, Alexandru Elisei wrote:
> > > On 9/7/20 4:23 PM, Will Deacon wrote:
> > > > [..]
> > > > +
> > > > +int kvm_pgtable_walk(struct kvm_pgtable *pgt, u64 addr, u64 size,
> > > > +		     struct kvm_pgtable_walker *walker)
> > > > +{
> > > > +	struct kvm_pgtable_walk_data walk_data = {
> > > > +		.pgt	= pgt,
> > > > +		.addr	= ALIGN_DOWN(addr, PAGE_SIZE),
> > > > +		.end	= PAGE_ALIGN(walk_data.addr + size),
> > > > +		.walker	= walker,
> > > > +	};
> > > 
> > > If the caller wants to walk [0x500, 0x1500), for PAGE_SIZE = 0x1000 (4K), the
> > > function walks the range [0x0, 0x1000). Is that intentional?
> > 
> > Yes, although the caller specifies a base and a *size*, rather than an end
> > address. As a concrete example, much of the hypervisor stage-1 mapping
> > is created using PAGE_SIZE mappings of random ELF symbols, which correspond
> > to arbitrary addresses. In these cases, we really do want to round-down the
> > address and perform a PAGE_SIZE mapping.
> 
> I think Alexandru has a point here. Turning his example into something
> equivalent that maps a random ELF symbol:
> 
>     struct some_hyp_state s = { ... };
>     // &s == 0x500
>     // sizeof(s) == PAGE_SIZE
>     kvm_pgtable_walk(pgt, (u64)&s, sizeof(s), walker);
> 
> Given `s` straddles the two pages, the part in the second page won't be
> mapped.
> 
> Should the end address instead be calculated as:
> 
>    .end = PAGE_ALIGN(addr + size),

Cheers for the example, and I see what you mean about structures that
straddle a page boundary. However, I think it's important here that the size
parameter accurately reflects the number of pages mapped: if the caller
passes PAGE_SIZE, we'd better not map more than a page, since the mmu cache
might not have enough pre-allocated pages for that.
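
To put it in terms of the initialiser quoted above (a sketch only, with
illustrative local names; since walk_data.addr has already been rounded
down, the two lines below are equivalent to that initialiser):

	walk_addr = ALIGN_DOWN(addr, PAGE_SIZE);	/* offset bits of 'addr' are dropped */
	walk_end  = walk_addr + PAGE_ALIGN(size);	/* number of pages comes from 'size' */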

So the API is just that the virtual address bits corresponding to the offset
within a page are ignored. Looking back at the code, that works out for the
hyp mappings (it's actually the _physical_ address that is unaligned there,
and we can just round that down), but I think I have a potential bug in the
ioremap code if you place the GIC (v2) somewhere funky on a machine using
64k pages. In this case, the ioctl() handler only enforces 4k alignment,
and so we could end up truncating the mapping in a similar case to what you
describe above. For example, trying to place it from 60k to 68k would result
in only the first page getting mapped.
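
To make the arithmetic concrete (illustrative values and names only,
assuming PAGE_SIZE == SZ_64K and the rounding from the walker quoted above):

	guest_ipa = 60 * SZ_1K;				/* GICV placed at 60k  */
	size	  = 8 * SZ_1K;				/* window is 60k - 68k */

	walk_addr = ALIGN_DOWN(guest_ipa, PAGE_SIZE);	/* 0   */
	walk_end  = PAGE_ALIGN(walk_addr + size);	/* 64k */

	/* Only [0, 64k) is walked; the page backing 64k - 68k is never mapped. */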

I've fixed that in the ioremap code (diff below), and I'll update the
kerneldoc to say that the bottom bits of the VA are ignored.

Cheers,

Will

--->8

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 1041be1fafe4..21b70abf65a7 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -505,6 +505,9 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
                                     KVM_PGTABLE_PROT_R |
                                     (writable ? KVM_PGTABLE_PROT_W : 0);
 
+       size += offset_in_page(guest_ipa);
+       guest_ipa &= PAGE_MASK;
+
        for (addr = guest_ipa; addr < guest_ipa + size; addr += PAGE_SIZE) {
                ret = kvm_mmu_topup_memory_cache(&cache,
                                                 kvm_mmu_cache_min_pages(kvm));
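
With the change above, the same example works out as follows (illustrative
arithmetic only, again assuming 64k pages):

	guest_ipa = 60 * SZ_1K;
	size	  = 8 * SZ_1K;

	size	 += offset_in_page(guest_ipa);	/* 8k + 60k = 68k */
	guest_ipa &= PAGE_MASK;			/* 0		  */

	/*
	 * The loop now covers [0, 68k) in PAGE_SIZE steps, i.e. two
	 * iterations (addr == 0 and addr == 64k), so both pages backing
	 * the 60k - 68k window get mapped.
	 */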
