[PATCH 0/4] Generic IOMMU page table framework

Laurent Pinchart laurent.pinchart at ideasonboard.com
Tue Dec 2 05:47:41 PST 2014


Hi Will,

On Monday 01 December 2014 12:05:34 Will Deacon wrote:
> On Sun, Nov 30, 2014 at 10:03:08PM +0000, Laurent Pinchart wrote:
> > On Thursday 27 November 2014 11:51:14 Will Deacon wrote:
> > > Hi all,
> > > 
> > > This series introduces a generic IOMMU page table allocation framework,
> > > implements support for ARM long-descriptors and then ports the arm-smmu
> > > driver over to the new code.
> > > 
> > > There are a few reasons for doing this:
> > >   - Page table code is hard, and I don't enjoy shopping
> > >
> > >   - A number of IOMMUs actually use the same table format, but currently
> > >     duplicate the code
> > >
> > >   - It provides a CPU (and architecture) independent allocator, which
> > >     may be useful for some systems where the CPU is using a different
> > >     table format for its own mappings
> > >
> > > As illustrated in the final patch, an IOMMU driver interacts with the
> > > allocator by passing in a configuration structure describing the
> > > input and output address ranges, the supported page sizes and a set of
> > > ops for performing various TLB invalidation and PTE flushing routines.
> > > 
> > > The LPAE code implements support for 4k/2M/1G, 16k/32M and 64k/512M
> > > mappings, but I decided not to implement the contiguous bit in the
> > > interest of trying to keep the code semi-readable. This could always be
> > > added later, if needed.
> > 
> > Do you have any idea how much the contiguous bit can improve performance
> > in real use cases?
> 
> It depends on the TLB, really. Given that the contiguous sizes map directly
> onto block sizes using different granules, I didn't see that the complexity
> was worth it.
> 
> For example:
> 
>    4k granule : 16 contiguous entries => {64k, 32M, 16G}
>   16k granule : 128 contiguous lvl3 entries => 2M
>                 32 contiguous lvl2 entries => 1G
>   64k granule : 32 contiguous entries => {2M, 16G}
> 
> If we use block mappings, then we get:
> 
>    4k granule : 2M @ lvl2, 1G @ lvl1
>   16k granule : 32M @ lvl2
>   64k granule : 512M @ lvl2
> 
> so really, we only miss the ability to create 16G mappings.

In the general case maybe, but as far as I know my IOMMU only supports a 4kB
granule. Without support for the contiguous bit I lose the ability to create
64kB mappings, which I believe (but haven't tested yet) will have a noticeable
impact.
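
To make the arithmetic concrete, here is a rough Python sketch (not kernel
code; all names are made up for illustration) that derives the contiguous-hint
sizes from the entry counts in the table quoted above and lists which sizes
are unreachable without the contiguous bit, first for a single granule and
then across all three granules.

```python
KIB, MIB, GIB = 1 << 10, 1 << 20, 1 << 30

# Page/block size -> number of entries covered by one contiguous hint,
# copied from the table quoted above. Illustrative data only.
CONT_ENTRIES = {
    "4k":  {4 * KIB: 16, 2 * MIB: 16, 1 * GIB: 16},
    "16k": {16 * KIB: 128, 32 * MIB: 32},
    "64k": {64 * KIB: 32, 512 * MIB: 32},
}


def contiguous_sizes(granule):
    """Mapping sizes obtained by setting the contiguous bit for a granule."""
    return sorted(size * n for size, n in CONT_ENTRIES[granule].items())


def lost_without_contiguous(granules):
    """Contiguous sizes that no plain page or block mapping can provide
    when the hardware supports only the given granules."""
    plain = {size for g in granules for size in CONT_ENTRIES[g]}
    cont = {size for g in granules for size in contiguous_sizes(g)}
    return sorted(cont - plain)


# A 4kB-only IOMMU loses 64kB, 32MB and 16GB mappings.
print(lost_without_contiguous(["4k"]))
# With all three granules available, only 16GB is out of reach.
print(lost_without_contiguous(["4k", "16k", "64k"]))
```

With all granules available only 16GB is missing, which matches your point,
but with 4kB alone the 64kB and 32MB sizes go away as well.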

> I doubt that hardware even implements that size in the TLB (the contiguous
> bit is only a hint).
>
> On top of that, the contiguous bit leads to additional expense on unmap,
> since you have extra TLB invalidation splitting the thing into non-
> contiguous pages before you can do anything.

That will only be required when doing partial unmaps, which shouldn't be that 
frequent. When unmapping a 64kB block there's no need to split the mapping 
beforehand.
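
As a sketch of the condition I have in mind (Python again, with a hypothetical
helper name, and assuming the whole region was mapped as 64kB contiguous groups
of 4kB pages), a split is only needed when the unmap range starts or ends in
the middle of a group:

```python
CONT_SIZE = 64 * 1024  # 16 contiguous 4kB entries (assumed 4kB granule)


def needs_split(iova, size, cont_size=CONT_SIZE):
    """Return True if unmapping [iova, iova + size) would cover only part
    of some contiguous group, so that the group has to be split first.

    Assumes size > 0 and that the region is mapped entirely with
    contiguous groups aligned to cont_size.
    """
    end = iova + size
    return iova % cont_size != 0 or end % cont_size != 0


# Unmapping a whole 64kB group: no split required.
print(needs_split(0x10000, 0x10000))
# Unmapping a single 4kB page inside a group: split required.
print(needs_split(0x10000, 0x1000))
```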

-- 
Regards,

Laurent Pinchart



