[PATCH v2 1/2] mm: slab: Introduce __GFP_PACKED for smaller kmalloc() alignments

Catalin Marinas catalin.marinas at arm.com
Wed Oct 26 02:58:18 PDT 2022


On Wed, Oct 26, 2022 at 11:49:05AM +0200, Greg Kroah-Hartman wrote:
> On Wed, Oct 26, 2022 at 09:39:32AM +0100, Catalin Marinas wrote:
> > On Wed, Oct 26, 2022 at 08:39:04AM +0200, Greg Kroah-Hartman wrote:
> > > On Tue, Oct 25, 2022 at 09:52:46PM +0100, Catalin Marinas wrote:
> > > > By default kmalloc() returns objects aligned to ARCH_KMALLOC_MINALIGN.
> > > > This can be somewhat large on architectures defining ARCH_DMA_MINALIGN
> > > > (e.g. 128 on arm64) and significant memory is wasted through small
> > > > kmalloc() allocations.
> > > > 
> > > > Reduce the minimum alignment for kmalloc() to the default
> > > > KMALLOC_MIN_SIZE (8 for slub, 32 for slab) but align the
> > > > requested size to the bigger ARCH_KMALLOC_MINALIGN unless a newly added
> > > > __GFP_PACKED flag is passed. With this gfp flag, the alignment is
> > > > reduced to KMALLOC_PACKED_ALIGN, at least sizeof(unsigned long long).
> > > 
> > > Can memory allocated with __GFP_PACKED be sent to DMA controllers?
> > > 
> > > If not, you should say that somewhere here or I'm going to get a bunch
> > > of patches trying to add this flag to tiny USB urb allocations (where we
> > > allocate 8 or 16 bytes) that is then going to fail on some hardware.
> > 
> > Good point, I'll add a comment.
> > 
> > We can also add a check to the DMA API when debugging is enabled,
> > something like WARN_ON_ONCE(ksize(ptr) < cache_line_size()) for
> > non-coherent devices.
> 
> It's not the size of the object that matters, it's the alignment, right?
> So shouldn't the check be simpler to just look at the alignment of the
> pointer which should be almost "free"?

It's the alignment of the start (easily checked) but also of the end. For
small urb allocations, we need to know that they came from a slab page
where nothing else lives in the adjacent bytes up to the next cache line
boundary. It would have been nice if the DMA API were always called with
both start and size aligned, but that's not the case. I think such a check
would fail even for larger buffers like network packets.
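
To illustrate why checking only the start pointer is not enough, here is a
minimal userspace sketch (not kernel code, and not a proposed DMA API
change). The helper names are hypothetical, and the cache line size is
passed in as a parameter; 128 would match the arm64 ARCH_DMA_MINALIGN
mentioned earlier in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if x is a multiple of the power-of-two `line`. */
static bool is_line_aligned(uintptr_t x, size_t line)
{
	/* The mask trick below only works for power-of-two line sizes. */
	assert(line != 0 && (line & (line - 1)) == 0);
	return (x & (line - 1)) == 0;
}

/*
 * Hypothetical check: the buffer [start, start + size) shares no cache
 * line with neighbouring data only if both its start and its end fall on
 * cache line boundaries.
 */
static bool dma_buf_owns_its_lines(uintptr_t start, size_t size, size_t line)
{
	return is_line_aligned(start, line) &&
	       is_line_aligned(start + size, line);
}
```

With a 128-byte line, a 16-byte buffer at an aligned address passes a
start-only check but fails this one, and so does an aligned 1500-byte
buffer, which is why a strict check of this kind would also trip on
buffers such as network packets.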

-- 
Catalin


