[PATCH 0/4] Generic IOMMU page table framework

Laurent Pinchart laurent.pinchart at ideasonboard.com
Tue Dec 2 14:29:26 PST 2014


Hi Will,

On Tuesday 02 December 2014 13:53:56 Will Deacon wrote:
> On Tue, Dec 02, 2014 at 01:47:41PM +0000, Laurent Pinchart wrote:
> > On Monday 01 December 2014 12:05:34 Will Deacon wrote:
> >> On Sun, Nov 30, 2014 at 10:03:08PM +0000, Laurent Pinchart wrote:
> >>> On Thursday 27 November 2014 11:51:14 Will Deacon wrote:
> >>>> The LPAE code implements support for 4k/2M/1G, 16k/32M and 64k/512M
> >>>> mappings, but I decided not to implement the contiguous bit in the
> >>>> interest of trying to keep the code semi-readable. This could always
> >>>> be added later, if needed.
> >>> 
> >>> Do you have any idea how much the contiguous bit can improve
> >>> performance in real use cases?
> >> 
> >> It depends on the TLB, really. Given that the contiguous sizes map
> >> directly onto block sizes using different granules, I didn't see that
> >> the complexity was worth it.
> >> 
> >> For example:
> >>    4k granule : 16 contiguous entries => {64k, 32M, 16G}
> >>   16k granule : 128 contiguous lvl3 entries => 2M
> >>                 32 contiguous lvl2 entries => 1G
> >>   64k granule : 32 contiguous entries => {2M, 16G}
> >> 
> >> If we use block mappings, then we get:
> >>    4k granule : 2M @ lvl2, 1G @ lvl1
> >>   16k granule : 32M @ lvl2
> >>   64k granule : 512M @ lvl2
> >> 
> >> so really, we only miss the ability to create 16G mappings.
> >
> > In the general case maybe, but as far as I know my IOMMU only supports 4kB
> > granule. Without support for the contiguous bit I lose the ability to
> > create 64kB mappings, which I believe (but haven't tested yet) will have a
> > noticeable impact.
> 
> It would be good if you could confirm that. I'd have thought that you'd end
> up using 2MB mappings most of the time for DMA buffers.

I'll try to gather statistics as soon as I can get TLB flushing working
reliably. Without it, turning the IOMMU on kills the system pretty fast :-)
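
For reference, the sizes quoted earlier in the thread can be reproduced from
the granule and the descriptor size alone. This is only a rough sketch,
assuming 8-byte long-descriptor entries and a last lookup level numbered 3;
entry_size() and contig_size() are names made up for this illustration and
are not functions from the patch series.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: long-descriptor (LPAE) entries are 8 bytes each. */
#define DESC_SIZE 8ULL

/*
 * Size mapped by a single entry at the given lookup level.
 * Level 3 is the last level (one page); each level above it
 * multiplies the size by the number of entries per table.
 */
static uint64_t entry_size(uint64_t granule, int level)
{
	uint64_t entries_per_table = granule / DESC_SIZE;
	uint64_t size = granule;
	int l;

	assert(level >= 1 && level <= 3);

	for (l = 3; l > level; l--)
		size *= entries_per_table;

	return size;
}

/* Size covered by nr_contig adjacent entries with the contiguous hint set. */
static uint64_t contig_size(uint64_t granule, int level, unsigned int nr_contig)
{
	return entry_size(granule, level) * nr_contig;
}
```

With a 4kB granule, entry_size() gives 2MB at level 2 and 1GB at level 1,
while contig_size() with 16 entries gives 64kB at level 3, which is the
mapping size I would lose without the contiguous bit.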

> >> I doubt that hardware even implements that size in the TLB (the
> >> contiguous bit is only a hint).
> >> 
> >> On top of that, the contiguous bit leads to additional expense on unmap,
> >> since you have extra TLB invalidation splitting the thing into non-
> >> contiguous pages before you can do anything.
> > 
> > That will only be required when doing partial unmaps, which shouldn't be
> > that frequent. When unmapping a 64kB block there's no need to split the
> > mapping beforehand.
> 
> Sure. I'm not against having support for the contiguous bit, I just don't
> plan to implement it myself :)
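
To illustrate the partial unmap point, here is a hedged sketch of the check I
have in mind, assuming a 4kB granule with 16-entry contiguous groups and
assuming the whole range was mapped with the contiguous hint. The names
(CONT_SZ, unmap_needs_split) are hypothetical and do not come from the
series or from the IOMMU API.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SZ      0x1000ULL                  /* assumed 4kB granule */
#define CONT_ENTRIES 16ULL                      /* entries per contiguous group */
#define CONT_SZ      (PAGE_SZ * CONT_ENTRIES)   /* 64kB */

/*
 * Returns true when [iova, iova + size) starts or ends in the middle of
 * a 64kB contiguous group, in which case that group would have to be
 * split into individual pages (with the associated TLB invalidation)
 * before the unmap can proceed. A range made of whole groups needs no
 * split.
 */
static bool unmap_needs_split(uint64_t iova, uint64_t size)
{
	uint64_t end = iova + size;

	return ((iova | end) & (CONT_SZ - 1)) != 0;
}
```

Unmapping a full, aligned 64kB block falls in the "no split" case; only a
request that cuts through a group pays the extra cost.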

-- 
Regards,

Laurent Pinchart
