[PATCH v2 2/2] treewide: Add the __GFP_PACKED flag to several non-DMA kmalloc() allocations

Catalin Marinas catalin.marinas at arm.com
Sun Oct 30 09:43:21 PDT 2022


On Sun, Oct 30, 2022 at 10:13:49AM +0100, Christoph Hellwig wrote:
> On Sun, Oct 30, 2022 at 10:02:53AM +0100, Greg Kroah-Hartman wrote:
> > Ah, my fault, sorry, you are right.  Is there a sparse tag that just
> > means "enforce this void * type?"  I guess we could do that with something
> > like:
> > 	typedef void *dmaptr;
> > 
> > but that feels icky.
> 
> That's because it is :)  I find the concept of the DMA pointers pretty
> strange to be honest.

I've been trying to come up with what the semantics of __dma should be
but couldn't reach any clear conclusion. We can drop the noderef part
as it doesn't make sense here, but the annotation still implies a
different address space. Once I made the ptr argument of
dma_map_single() a 'void __dma *', sparse complained not just about the
callers of this function but also about callees like is_vmalloc_addr().
So I have a suspicion we'd end up with __dma annotations in lots of
places unrelated to DMA or drivers. Then we have structures like 'catc'
with a 'ctrl_dr' member that ends up in the DMA API (only as TO_DEVICE
so far, so not that problematic).
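
The annotation I experimented with looked roughly like the sketch below
(the address space number is arbitrary and the prototype is simplified,
the real dma_map_single() is a macro):

  #ifdef __CHECKER__
  # define __dma  __attribute__((address_space(4)))  /* no noderef */
  #else
  # define __dma
  #endif

  dma_addr_t dma_map_single(struct device *dev, void __dma *ptr,
                            size_t size, enum dma_data_direction dir);

Sparse then wants a __force cast (or a matching __dma annotation)
wherever such a pointer is produced or passed on, which is how the
annotations spread into generic helpers like is_vmalloc_addr().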

The alternative is for dma_map_*() to verify (dynamically) at the point
of DMA setup rather than at the point of allocation, and bounce if
needed.

> It only affects a subset of dma (when devices are not attached in a
> DMA coherent way).  So count me in as someone who would be much more
> happy about figuring out a way to simplify bounce buffer for
> non-coherent DMA if the start and length are not aligned to the cache
> line size over any kind of special annotation all over the kernel.

That's definitely less daunting if we find the right solution. The
problem with checking the alignment of both start and length is that
there are valid cases where the start is aligned but the length isn't.
They don't need bouncing since they come from a sufficiently aligned
slab. We have a few options here:

a) Require that the dma_map_*() functions (including sg) get a size
   multiple of cache_line_size() or the static ARCH_DMA_MINALIGN.

b) Always bounce small sizes as they may have come from a small kmalloc
   cache. The alignment of both ptr and length is ignored. This assumes
   that in-struct alignments (crypto) use ARCH_DMA_MINALIGN and we can
   reduce ARCH_KMALLOC_MINALIGN independently.

c) Similar to (b) but in addition check whether the object comes from a
   small kmalloc cache, e.g. using ksize().

(a) has the advantage that it works by default even if drivers don't
use the correct alignment; we can optimise them afterwards. However, it
feels wrong to ask a driver to align the length to cache_line_size()
when it doesn't control the coherency of the bus (and things have
always worked fine on x86).
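
As a rough illustration of (a), with the check done in the mapping path
(the helper name and placement are made up, not actual code):

  static bool dma_map_len_ok(struct device *dev, size_t size)
  {
          /* coherent devices don't care about cache line alignment */
          if (dev_is_dma_coherent(dev))
                  return true;

          return IS_ALIGNED(size, cache_line_size());
  }

A driver would then typically round the length it passes to
dma_map_single() up to a cache_line_size() multiple.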

(b) is what I attempted on one of my branches (until I got stuck on
bouncing for the iommu case). It adds a slight overhead to dma_map_*()
to check the length, and we may keep a check on the start alignment
(though not on the length).
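
Roughly what I have in mind for (b), as a sketch only (the helper name
is invented and the real check would live in the direct and iommu
mapping paths):

  static bool dma_kmalloc_may_bounce(struct device *dev, phys_addr_t phys,
                                     size_t size)
  {
          if (dev_is_dma_coherent(dev))
                  return false;

          /*
           * May have come from a small kmalloc cache; the non-power-of-two
           * 96/192 caches would need extra care here.
           */
          if (size < ARCH_DMA_MINALIGN)
                  return true;

          /* larger buffers: only check the start alignment */
          return !IS_ALIGNED(phys, cache_line_size());
  }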

With (c) we can avoid some bouncing if the driver uses the right
kmalloc cache (maybe via a newly introduced dma_kmalloc()), but such a
check would add a bit of overhead (and I'm not sure it works with
slob). The advantage is that we can avoid a bounce buffer if the
drivers in question are adapted to use dma_kmalloc().
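
Such a dma_kmalloc() could be as simple as the sketch below, assuming
power-of-two kmalloc() sizes are naturally aligned (so rounding the
size up avoids both the small and the odd 96/192 caches):

  static inline void *dma_kmalloc(size_t size, gfp_t flags)
  {
          /* never land in a kmalloc cache smaller than ARCH_DMA_MINALIGN */
          return kmalloc(ALIGN(size, ARCH_DMA_MINALIGN), flags);
  }

dma_map_*() would still need the ksize()-style check for buffers not
allocated this way, which is where the extra overhead comes from.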

-- 
Catalin


