[PATCH v5 00/27] Memory Folios

Matthew Wilcox willy at infradead.org
Thu Apr 1 13:07:02 BST 2021


On Thu, Apr 01, 2021 at 05:05:37AM +0000, Al Viro wrote:
> On Tue, Mar 30, 2021 at 10:09:29PM +0100, Matthew Wilcox wrote:
> 
> > That's a very Intel-centric way of looking at it.  Other architectures
> > support a multitude of page sizes, from the insane ia64 (4k, 8k, 16k, then
> > every power of four up to 4GB) to more reasonable options like (4k, 32k,
> > 256k, 2M, 16M, 128M).  But we (in software) shouldn't constrain ourselves
> > to thinking in terms of what the hardware currently supports.  Google
> > have data showing that for their workloads, 32kB is the goldilocks size.
> > I'm sure for some workloads, it's much higher and for others it's lower.
> > But for almost no workload is 4kB the right choice any more, and probably
> > hasn't been since the late 90s.
> 
> Out of curiosity I looked at the distribution of file sizes in the
> kernel tree:
> 71455 files total
> 0--4Kb		36702
> 4--8Kb		11820
> 8--16Kb		10066
> 16--32Kb	6984
> 32--64Kb	3804
> 64--128Kb	1498
> 128--256Kb	393
> 256--512Kb	108
> 512Kb--1Mb	35
> 1--2Mb		25
> 2--4Mb		5
> 4--6Mb		7
> 6--8Mb		4
> 12Mb		2 
> 14Mb		1
> 16Mb		1
> 
> ... incidentally, everything bigger than 1.2Mb lives^Wshambles under
> drivers/gpu/drm/amd/include/asic_reg/
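
A histogram like the one quoted above could be gathered with something along these lines. This is only a sketch and not the script that produced those numbers: the helper names (`size_bucket`, `histogram`, `file_sizes`) are made up for illustration, it uses plain power-of-two buckets throughout (the quoted table switches to irregular buckets above 4Mb), it assumes a file of exactly 4096 bytes belongs in the first bucket, and it assumes `.git` should be skipped.

```python
import os
from collections import Counter

def size_bucket(size_bytes):
    """Upper bound (in KiB) of the power-of-two bucket holding size_bytes.

    The smallest bucket is 0-4 KiB; boundaries are inclusive at the top,
    which is an assumption, not something the quoted table states.
    """
    upper = 4 * 1024
    while size_bytes > upper:
        upper *= 2
    return upper // 1024

def histogram(sizes):
    """Count how many sizes fall into each bucket."""
    return Counter(size_bucket(s) for s in sizes)

def file_sizes(root):
    """Yield the size in bytes of every regular file under root."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip repository metadata (assumed not to be part of "the tree").
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                yield os.path.getsize(path)
```

Usage would be `histogram(file_sizes("/path/to/linux"))`, with the path being whatever checkout you have at hand.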

I'm just going to edit this table to add a column indicating ratio
to previous size:

> Page size	Footprint	Ratio to previous
> 4Kb		1128Mb
> 8Kb		1324Mb		1.17
> 16Kb		1764Mb		1.33
> 32Kb		2739Mb		1.55
> 64Kb		4832Mb		1.76
> 128Kb		9191Mb		1.90
> 256Kb		18062Mb		1.96
> 512Kb		35883Mb		1.98
> 1Mb		71570Mb		1.994
> 2Mb		142958Mb	1.997
> 
> So for kernel builds (as well as grep over the tree, etc.) uniform 2Mb pages
> would be... interesting.
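
For reference, a footprint table of this shape can be computed by rounding every file up to a whole number of pages and summing. The sketch below assumes that is how the numbers were obtained (the mail does not say), and `footprint` and `ratios` are illustrative names. It also assumes an empty file occupies no pages.

```python
def footprint(sizes, page_size):
    """Total bytes occupied if every file is rounded up to whole pages."""
    total = 0
    for s in sizes:
        pages = -(-s // page_size)  # ceiling division; 0 for an empty file
        total += pages * page_size
    return total

def ratios(sizes, page_sizes):
    """Return (page_size, footprint, ratio_to_previous) for each page size.

    The ratio is None for the first row and whenever the previous
    footprint is zero.
    """
    rows = []
    prev = None
    for ps in page_sizes:
        fp = footprint(sizes, ps)
        rows.append((ps, fp, fp / prev if prev else None))
        prev = fp
    return rows
```

Once the page size exceeds nearly every file in the tree, each file occupies exactly one page, so doubling the page size doubles the footprint. That is why the ratio column creeps towards 2.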

Yep, that's why I opted for a "start out slowly and let readahead tell me
when to increase the page size" approach.
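
As a toy model of that idea (not the policy in the patch set; the step size, the cap and the names `next_order` and `simulate` are all assumptions made for illustration), the page size starts at the base page and only grows when readahead reports a hit:

```python
BASE_PAGE = 4096
MAX_ORDER = 9  # hypothetical cap: 4 KiB << 9 == 2 MiB

def next_order(order, readahead_hit, step=1, max_order=MAX_ORDER):
    """Grow the allocation order on a readahead hit, otherwise keep it."""
    if readahead_hit:
        return min(order + step, max_order)
    return order

def simulate(hits, start_order=0):
    """Return the page size in bytes chosen after each access."""
    order = start_order
    trace = []
    for hit in hits:
        order = next_order(order, hit)
        trace.append(BASE_PAGE << order)
    return trace
```

A small file that is read once never triggers many hits, so it stays on small pages; a long sequential read ramps up over time.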

I think Johannes' real problem is that slab and page cache / anon pages
are getting intermingled.  We could solve this by having slab allocate
2MB pages from the page allocator and then split them up internally
(so not all of that 2MB necessarily goes to a single slab cache, but all
of that 2MB goes to some slab cache).
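
To make the shape of that proposal concrete, here is a toy model in Python. It is not kernel code and nothing like the real slab allocator; `ChunkPool` and its fields are invented for illustration. The only point it shows is that slab caches draw 4KB pages from shared 2MB chunks, so a chunk may be split between several caches but never holds non-slab pages.

```python
from collections import deque

CHUNK = 2 * 1024 * 1024  # size requested from the "page allocator"
PAGE = 4096              # unit handed out to individual slab caches

class ChunkPool:
    """Hands out 4KB pages to slab caches from shared 2MB chunks."""

    def __init__(self):
        self.chunks = 0            # number of 2MB chunks obtained so far
        self.free_pages = deque()  # (chunk_id, page_index) not yet handed out
        self.owner = {}            # (chunk_id, page_index) -> cache name

    def alloc_page(self, cache_name):
        if not self.free_pages:
            # Out of pages: take another whole 2MB chunk and split it up.
            chunk_id = self.chunks
            self.chunks += 1
            self.free_pages.extend(
                (chunk_id, i) for i in range(CHUNK // PAGE)
            )
        page = self.free_pages.popleft()
        self.owner[page] = cache_name
        return page
```

For example, `pool.alloc_page("dentry")` followed by `pool.alloc_page("inode")` returns two pages from the same chunk, owned by different caches.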


