[Lsf-pc] [LSF/MM/BPF TOPIC] Memory fragmentation with large block sizes

Tue Jun 9 02:37:54 PDT 2026

On 6/9/26 10:39, Hannes Reinecke wrote:
> On 6/9/26 09:28, Christoph Hellwig wrote:
>> Hannes,
>> 
>> can you share your results on the mailing list?
>> 
> I sure can.
> 
> We ran a simple test case with one fio job on an LBS-enabled device,
> and another job continuously allocating and freeing arrays of pages
> of varying lengths.
> 
> We then took snapshots of /proc/buddyinfo to track memory pressure
> over time.
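For anyone who wants to reproduce the measurement, here is a minimal userspace sketch that turns one /proc/buddyinfo line into per-order counts. It assumes the usual "Node N, zone NAME count0 count1 ..." layout, where column n is the number of free blocks of order n. The struct, the function names and the BUDDY_MAX_ORDERS bound are hypothetical and are not part of any kernel API.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical upper bound on order columns; the real count depends on
 * the kernel configuration. */
#define BUDDY_MAX_ORDERS 16

struct buddy_zone {
	int node;
	char zone[16];
	int norders;                          /* columns actually parsed */
	unsigned long free[BUDDY_MAX_ORDERS]; /* free blocks per order */
};

/*
 * Parse one line of /proc/buddyinfo, e.g.
 *   "Node 0, zone   Normal    512    256    128 ..."
 * Returns the number of order columns parsed, or -1 on a malformed line.
 */
static int parse_buddyinfo_line(const char *line, struct buddy_zone *z)
{
	int consumed = 0;

	assert(line != NULL && z != NULL);
	if (sscanf(line, "Node %d, zone %15s%n", &z->node, z->zone, &consumed) != 2)
		return -1;

	const char *p = line + consumed;
	z->norders = 0;
	while (z->norders < BUDDY_MAX_ORDERS) {
		char *end;
		unsigned long v = strtoul(p, &end, 10);
		if (end == p)
			break; /* no more numbers on this line */
		z->free[z->norders++] = v;
		p = end;
	}
	return z->norders;
}

/* Free memory (KiB) held in blocks of 'order', given the base page size in KiB. */
static unsigned long free_kib_at_order(const struct buddy_zone *z, int order,
				       unsigned long page_kib)
{
	if (order < 0 || order >= z->norders)
		return 0;
	return z->free[order] * (page_kib << order);
}
```

Taking such a snapshot periodically and plotting free[] per order over time should give the kind of per-order view described here.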
> 
> Results are visualized in the attached plot.
> 
> With 4k block sizes we saw a high number of order-0 and order-1 pages,
> followed by the expected decline towards higher orders.
> 
> With 8k and 16k block sizes, a noticeable 'bump' in free pages
> developed at orders 2 and 3, which we think is due to
> compaction trying to merge pages together.
> The number of order-0 pages increased slightly, but only to half of
> the peak number of pages in the 'bump'.
> 
> With 32k block sizes the picture changed completely: the 'bump'
> vanished, and there was only a pronounced spike in order-0 pages
> (about four times the size of the spike with 4k block sizes).
> 
> This led me to assume that compaction broke down at 32k block sizes;
> this assumption was confirmed by Vlastimil Babka, who pointed out that
> there is a maximum order up to which page compaction is attempted:

Yep, but after the LSF/MM session I realized I had made an off-by-one error,
thanks to the misleading name of the define.

> include/linux/mmzone.h:
> /*
>   * PAGE_ALLOC_COSTLY_ORDER is the order at which allocations are deemed
>   * costly to service.  That is between allocation orders which should
>   * coalesce naturally under reasonable reclaim pressure and those which
>   * will not.
>   */
> #define PAGE_ALLOC_COSTLY_ORDER 3

Which is 32k (order 3 with 4k base pages).

> and its main usage is 'order > PAGE_ALLOC_COSTLY_ORDER'.

And indeed, this means any changes in behavior due to this should only
happen at 64k (order 4) or more. The value of 3 is in fact something like
"PAGE_ALLOC_MAX_CHEAP_ORDER". I'd send a rename patch (Kiryl fixed a similar
off-by-one gotcha with MAX_ORDER), but I suspect we'll be making more
involved changes here so I'd wait for that first.
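The off-by-one is easy to see with a little arithmetic. This is a minimal sketch that assumes 4k base pages. size_to_order() and order_is_costly() are hypothetical helpers written for illustration; the kernel has its own get_order(), and this sketch does not claim to reproduce it exactly.

```c
#include <assert.h>
#include <stdbool.h>

/* Value quoted above from include/linux/mmzone.h. */
#define PAGE_ALLOC_COSTLY_ORDER 3

/* Assumption: 4k base pages, as on x86_64. */
#define BASE_PAGE_SIZE 4096UL

/*
 * Smallest order whose block covers 'bytes', i.e. ceil(log2(pages)).
 * Returns -1 for a size of 0.
 */
static int size_to_order(unsigned long bytes)
{
	/* Round up to whole pages without risking overflow near ULONG_MAX. */
	unsigned long pages = bytes / BASE_PAGE_SIZE + (bytes % BASE_PAGE_SIZE != 0);
	int order = 0;

	if (bytes == 0)
		return -1;
	while ((1UL << order) < pages)
		order++;
	return order;
}

/* Mirrors the 'order > PAGE_ALLOC_COSTLY_ORDER' comparison quoted above. */
static bool order_is_costly(int order)
{
	assert(order >= 0);
	return order > PAGE_ALLOC_COSTLY_ORDER;
}
```

With these definitions, 16k (order 2) and 32k (order 3) are still on the "cheap" side of the comparison. 64k (order 4) is the first block size that crosses the threshold.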

> Which ties in directly with what we're seeing.

So it's probably not that straightforward. Perhaps we should investigate further first?

> It will probably make sense to align this value with the maximum block
> size we currently support (i.e. 64k) to ensure that compaction
> works with larger block sizes. Or maybe even the other way round:
> tie the maximum block size we support to PAGE_ALLOC_COSTLY_ORDER.
> But that would mean restricting the block size to 16k, whereas xfs
> works happily with 32k. So we might want to raise PAGE_ALLOC_COSTLY_ORDER.
> 
> The question, though, is how we could measure the impact.
> This particular value has been in place since 2007 (commit 5ad333eb66ff1
> 'lumpy reclaim V4'), and it might well be that the original
> reasoning no longer applies.
> 
> At the same time, this value is tied to a _LOT_ of things
> (not to mention the page allocator itself), so increasing it
> to '4' has an extremely high chance of impacting mm performance.
> 
> I'll probably run mmtests and see what I get.
> 
> Cheers,
> 
> Hannes
> 
> 
> fragmentation.png