tlbi va, vaa vs. val, vaal

Mon Mar 2 08:23:37 PST 2015

On Fri, Feb 27, 2015 at 01:15:57PM -0800, Mario Smarduch wrote:
> On 02/27/2015 02:24 AM, Will Deacon wrote:
> > On Fri, Feb 27, 2015 at 12:12:32AM +0000, Mario Smarduch wrote:
> >> I noticed kernel tlbflush.h use tlbi va*, vaa* variants instead of
> >> val, vaal ones. Reading the manual D.5.7.2 it appears that
> >> va*, vaa* versions invalidate intermediate caching of
> >> translation structures.
> >>
> >> With stage2 enabled that may result in 20+ memory lookups
> >> for a 4 level page table walk. That's assuming that intermediate
> >> caching structures cache mappings from stage1 table entry to
> >> host page.
> > 
> > Yeah, Catalin and I discussed improving the kernel support for this,
> > but it requires some changes to the generic mmu_gather code so that we
> > can distinguish the leaf cases. I'd also like to see that done in a way
> > that takes into account different granule sizes (we currently iterate
> > over huge pages in 4k chunks). Last time I touched that, I entered a
> > world of pain and don't plan to return there immediately :)
> > 
> > Catalin -- feeling brave?
> > 
> > FWIW: the new IOMMU page-table stuff I just got merged *does* make use
> > of leaf-invalidation for the SMMU.
> 
>   thanks for the background. I'm guessing how much of PTWalk
> is cached is implementation dependent. One old paper quotes upto 40%
> improvement for some industry benchmarks that cache all stage1/2 PTWalk
> entries.

Is it caching in the TLB or in the level 1 CPU cache?

I would indeed expect some improvement without many drawbacks. The only
thing we need in Linux is to distinguish between a leaf-only TLBI and a
TLBI issued when tearing down page tables. It's not complicated, it just
needs some testing (strangely enough, I tried replacing all user TLBIs
with the L variants on a Juno board and saw no signs of any crashes).
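
To make the leaf vs. tear-down distinction concrete, here is a rough
sketch of the selection logic. It is not the kernel's implementation:
the enum and the tlbi_op() helper are names made up for illustration,
and the function only returns the mnemonic rather than issuing the
instruction, so it builds on any host. The four operation names
themselves (vae1is, vale1is, vaae1is, vaale1is) are the architectural
EL1 inner-shareable TLBI by-VA operations.

```c
#include <stdbool.h>

/* Hypothetical: why the invalidation is being issued. */
enum flush_reason {
	FLUSH_LEAF_ONLY,	/* only a last-level entry changed */
	FLUSH_TABLE_TEARDOWN,	/* table pages are being freed */
};

/*
 * Hypothetical helper: pick the TLBI by-VA operation.
 *
 * The L (last level) variants only need to invalidate cached leaf
 * entries, so cached intermediate walk entries may survive. That is
 * only safe when the table pages themselves stay in place, which is
 * why tear-down has to keep using the non-L variants.
 *
 * all_asids selects the "vaa" forms, which ignore the ASID.
 */
static const char *tlbi_op(enum flush_reason why, bool all_asids)
{
	if (why == FLUSH_LEAF_ONLY)
		return all_asids ? "vaale1is" : "vale1is";

	return all_asids ? "vaae1is" : "vae1is";
}
```

With something like this, changing a single user PTE would map to
tlbi_op(FLUSH_LEAF_ONLY, false), i.e. vale1is, while freeing page
tables on unmap would still get vae1is. The hard part is not this
choice but getting the information about which case we are in down
to the flush call.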

-- 
Catalin