[LSF/MM/BPF TOPIC] Memory fragmentation with large block sizes
Matthew Wilcox
willy at infradead.org
Fri May 1 07:33:46 PDT 2026
On Thu, Feb 19, 2026 at 10:54:48AM +0100, Hannes Reinecke wrote:
> I (together with the Czech Technical University) did some experiments trying
> to measure memory fragmentation with large block sizes.
>
> Doing so raised some challenges:
>
> - How do you _generate_ memory fragmentation? The MM subsystem is
> precisely geared up to avoid it, so you would need to come up
> with some idea of how to defeat it. With help from Willy I managed
> to come up with something, but I would really like to discuss
> what the best option here would be.
> - What is acceptable memory fragmentation? Are we good enough if the
> measured fragmentation does not grow during the test runs?
> - Do we have better visibility into memory fragmentation than
> just reading /proc/buddyinfo?
>
> And, of course, I would like to present (and discuss) the results
> of the test runs done with 4k, 8k, and 16k block sizes.
I think that Rik's recent work is going to affect discussion of this
topic (summary: with a "small amount" of work, reliable allocation of
1GB folios is possible):
https://lore.kernel.org/linux-mm/20260430202233.111010-1-riel@surriel.com/
but another aspect to it is the recent performance problem reported by
Amazon (summary: compaction takes too long):
https://lore.kernel.org/linux-mm/20260428150240.3009-1-dipiets@amazon.it/
Anyway, I'm putting you on notice that I may hijack this session to talk
about how GFP flags suck. I may even have a proposal for a replacement,
depending on how inspired I am over the next few days.
I still think this discussion is useful because we wouldn't want an
attacker to be able to make Linux unreliable. So it's useful to think
about how userspace can make memory unreclaimable and whether large
folios make the problem worse in any meaningful way.
More information about the Linux-nvme
mailing list