[LSF/MM/BPF TOPIC] Large block for I/O
Matthew Wilcox
willy at infradead.org
Fri Dec 22 05:29:17 PST 2023
On Fri, Dec 22, 2023 at 01:29:18PM +0100, Hannes Reinecke wrote:
> And that is actually a very valid point; memory fragmentation will become an
> issue with larger block sizes.
>
> Theoretically it should be quite easy to solve: just switch the memory
> subsystem to use the largest block size in the system, and run every
> smaller memory allocation via SLUB (or whatever the allocator-of-the-day
> currently is :-). Then trivially the system will never be fragmented,
> and I/O can always use large folios.
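(Concretely, I read this as something like the sketch below: raise the
page allocator's minimum allocation to the largest block size and carve
everything smaller out of slab caches. The cache name and helpers here
are invented for illustration, with 16kB standing in for "largest block
size in the system".)

	#include <linux/init.h>
	#include <linux/sizes.h>
	#include <linux/slab.h>

	static struct kmem_cache *sub_block_cache;

	/* Invented example: 4 KiB objects carved out of larger slabs,
	 * so sub-block-size users stop calling alloc_page() directly. */
	static int __init sub_block_init(void)
	{
		sub_block_cache = kmem_cache_create("sub-block-4k",
						    SZ_4K, SZ_4K, 0, NULL);
		return sub_block_cache ? 0 : -ENOMEM;
	}

	/* A caller that used alloc_page() for one 4 KiB buffer would
	 * instead do: */
	static void *sub_block_alloc(gfp_t gfp)
	{
		return kmem_cache_alloc(sub_block_cache, gfp);
	}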
>
> However, that means doing away with alloc_page(), which is still in
> widespread use throughout the kernel. I would actually be in favour of
> it, but it might be that mm people have a different view.
>
> Matthew, worth a new topic?
> Handling memory fragmentation on large block I/O systems?
I think if we're going to do that as a topic (and I'm not opposed!),
we need data. Various workloads, various block sizes, etc. Right now
people discuss this topic with "feelings" and "intuition" and I think
we need more than vibes to have a productive discussion.
My laptop (rebooted last night due to an unfortunate upgrade that left
anything accessing the sound device hanging ...):
MemTotal:       16006344 kB
MemFree:         2353108 kB
Cached:          7957552 kB
AnonPages:       4271088 kB
Slab:             654896 kB
so ~50% of my 16GB of memory is in the page cache and ~25% is anon memory.
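As a sanity check on those percentages, a trivial userspace program
reading the same /proc/meminfo fields as above:

	#include <stdio.h>

	int main(void)
	{
		char line[128];
		unsigned long total = 0, cached = 0, anon = 0;
		FILE *f = fopen("/proc/meminfo", "r");

		if (!f)
			return 1;
		while (fgets(line, sizeof(line), f)) {
			sscanf(line, "MemTotal: %lu", &total);
			sscanf(line, "Cached: %lu", &cached);
			sscanf(line, "AnonPages: %lu", &anon);
		}
		fclose(f);
		if (total) {
			/* With the numbers above: ~49.7% cached,
			 * ~26.7% anon. */
			printf("page cache: %.1f%%\n",
			       100.0 * cached / total);
			printf("anon:       %.1f%%\n",
			       100.0 * anon / total);
		}
		return 0;
	}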
If the page cache is all in 16kB chunks and we need to allocate order-2
folios to read from a file, we can find that memory easily by reclaiming
other order-2 folios from the page cache. We don't need to resort to
heroics like eliminating the use of alloc_page().
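For concreteness, here's the order arithmetic (the wrapper function is
invented for illustration; filemap_alloc_folio() is the real page cache
allocation helper):

	#include <linux/log2.h>
	#include <linux/pagemap.h>

	/* Back one filesystem block with a single folio, assuming 4kB
	 * base pages: a 16kB block needs ilog2(16384 >> 12) = order 2. */
	static struct folio *alloc_block_folio(unsigned long block_size,
					       gfp_t gfp)
	{
		unsigned int order = ilog2(block_size >> PAGE_SHIFT);

		return filemap_alloc_folio(gfp, order);
	}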
We should eliminate the use of alloc_page() across most of the kernel,
but that's a different topic, and one without much relevance to LSF/MM
since it's drivers that need to change, not the MM ;-)
Now, other people "feel" differently. And that's cool, but we're not
going to have a productive discussion without data that shows whose
feelings represent reality and for which kinds of workloads.