[LSF/MM/BPF TOPIC] Memory fragmentation with large block sizes
Bart Van Assche
bvanassche at acm.org
Mon Mar 16 16:26:13 PDT 2026
On 2/19/26 6:53 AM, Bart Van Assche wrote:
> On 2/19/26 1:54 AM, Hannes Reinecke wrote:
>> I (together with the Czech Technical University) did some experiments
>> trying to measure memory fragmentation with large block sizes.
>> The testbed was an NVMe initiator talking to an nvmet target over
>> the network.
>>
>> Doing so raised some challenges:
>>
>> - How do you _generate_ memory fragmentation? The MM subsystem is
>> geared precisely towards avoiding it, so you need to come up with
>> some way to defeat it. With help from Willy I managed to come up
>> with something, but I would really like to discuss what the best
>> option is here.
>> - What counts as acceptable memory fragmentation? Is it good enough
>> if the measured fragmentation does not grow during the test runs?
>> - Do we have better visibility into memory fragmentation other than
>> just reading /proc/buddyinfo?
>
> The larger the block size, the higher the write amplification factor
> (WAF), isn't it? Why increase the block size when there is a solution
> available that doesn't increase WAF, namely zoned storage?
(replying to my own email)
The following paper shows that it is possible to achieve great
performance with filesystems like ext4 on ZNS SSDs by implementing
an FTL in software (ZTL). This could be a more interesting approach
than optimizing host software for large indirection units. See also
Sass, Jan, André Brinkmann, Matias Bjørling, Xubin He, and Reza
Salkhordeh. "ZTL: A block layer ZNS driver." Journal of Systems
Architecture (2026): 103757.
(https://www.sciencedirect.com/science/article/pii/S1383762126000755).
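To make the WAF concern concrete: a host write smaller than the device's indirection unit (IU) forces a read-modify-write of the whole IU, so an isolated 4 KiB write against a 16 KiB IU costs roughly 4x. A back-of-the-envelope sketch (hypothetical helper with made-up numbers, not taken from the paper):

```python
import math

def worst_case_waf(host_write_bytes, iu_bytes):
    """Worst-case write amplification for an isolated write: the
    device must program whole indirection units, so the ratio is
    bytes written to flash / bytes written by the host."""
    ius_touched = math.ceil(host_write_bytes / iu_bytes)
    return ius_touched * iu_bytes / host_write_bytes
```

For example, worst_case_waf(4096, 16384) yields 4.0, while an IU-aligned, IU-sized write yields 1.0, which is the regime that large block sizes (or a zone-append-style sequential layout) aim for.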
Bart.
More information about the Linux-nvme mailing list