[PATCH 00/19] mm: Support huge pfnmaps
Oliver Upton
oliver.upton at linux.dev
Wed Aug 14 16:36:03 PDT 2024
On Wed, Aug 14, 2024 at 07:10:31PM -0300, Jason Gunthorpe wrote:
[...]
> > Nope. KVM ARM does (see get_vma_page_shift()) but I strongly suspect that's only
> > a win in very select use cases, and is overall a non-trivial loss.
>
> Ah that ARM behavior was probably what was being mentioned then! So
> take my original remark as applying to this :)
>
> > > I don't quite understand your safety argument: if the VMA has 1G of
> > > contiguous physical memory described with 4K mappings, it is definitely
> > > safe for KVM to reassemble that same memory and represent it as 1G.
> >
> > That would require taking mmap_lock to get the VMA, which would be a net negative,
> > especially for workloads that are latency sensitive.
>
> You can aggregate if the read and aggregation logic are protected by
> mmu notifiers, I think. An invalidation would still have enough
> information to clear the aggregate shadow entry. If you get a sequence
> number collision, then you'd throw away the aggregation.
>
> But yes, I also think it would be slow to have aggregation logic in
> KVM. Doing it in the main mmu is much better.
+1.
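
For anyone following along, the sequence-count dance being described
looks roughly like the below. This is only a sketch of the idea, not
actual KVM code: pfnmap_probe_contig() and install_mapping() are made-up
helpers, while mmu_invalidate_seq / mmu_invalidate_retry() are the
existing primitives.

static int pfnmap_try_huge_map(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn)
{
	unsigned long mmu_seq;
	int level, ret;

	/* Snapshot the invalidation sequence before probing the range. */
	mmu_seq = kvm->mmu_invalidate_seq;
	smp_rmb();

	/* See how far the neighbouring PFNs are contiguous and aligned. */
	level = pfnmap_probe_contig(kvm, gfn, pfn);

	write_lock(&kvm->mmu_lock);
	if (mmu_invalidate_retry(kvm, mmu_seq)) {
		/*
		 * An invalidation raced with the probe: throw away the
		 * aggregation and fall back to a PTE-level mapping.
		 */
		level = 0;
	}
	ret = install_mapping(kvm, gfn, pfn, level);
	write_unlock(&kvm->mmu_lock);

	return ret;
}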
For KVM/arm64 I'm quite hesitant to fall back to PTE mappings in this
situation (i.e. dump get_vma_page_shift()), as I'm fairly certain that
would cause a performance regression for someone's workload. But once we
can derive huge PFNMAP mappings from the primary MMU, we should just
normalize on that.
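
For reference, the gist of what get_vma_page_shift() does for a
VM_PFNMAP VMA today is roughly the following (paraphrased, not verbatim,
with the helper renamed here for clarity): pick the largest block size
for which the hva and pa share alignment and which fits entirely inside
the VMA.

static int pfnmap_page_shift(struct vm_area_struct *vma, unsigned long hva)
{
	/* Physical address backing the fault, from the linear pgoff. */
	unsigned long pa = (vma->vm_pgoff << PAGE_SHIFT) +
			   (hva - vma->vm_start);

	if ((hva & (PUD_SIZE - 1)) == (pa & (PUD_SIZE - 1)) &&
	    ALIGN_DOWN(hva, PUD_SIZE) >= vma->vm_start &&
	    ALIGN(hva, PUD_SIZE) <= vma->vm_end)
		return PUD_SHIFT;

	if ((hva & (PMD_SIZE - 1)) == (pa & (PMD_SIZE - 1)) &&
	    ALIGN_DOWN(hva, PMD_SIZE) >= vma->vm_start &&
	    ALIGN(hva, PMD_SIZE) <= vma->vm_end)
		return PMD_SHIFT;

	return PAGE_SHIFT;
}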
--
Thanks,
Oliver