[LSF/MM/BPF TOPIC] State Of The Page

Matthew Wilcox willy at infradead.org
Sun Jan 21 15:14:19 PST 2024


On Sun, Jan 21, 2024 at 01:00:40PM -0800, David Rientjes wrote:
> On Fri, 19 Jan 2024, Matthew Wilcox wrote:
> > It's probably worth doing another roundup of where we are on our journey
> > to separating folios, slabs, pages, etc.  Something suitable for people
> > who aren't MM experts, and don't care about the details of how page
> > allocation works.  I can talk for hours about whatever people want to
> > hear about but some ideas from me:
> > 
> >  - Overview of how the conversion is going
> >  - Convenience functions for filesystem writers
> >  - What's next?
> >  - What's the difference between &folio->page and folio_page(folio, 0)?
> >  - What are we going to do about bio_vecs?
> >  - How does all of this work with kmap()?
> > 
> > I'm sure people would like to suggest other questions they have that
> > aren't adequately answered already and might be of interest to a wider
> > audience.
> > 
> 
> Thanks for proposing this again, Matthew, I'd definitely like to be 
> involved in the discussion as I think a couple of my colleagues, cc'd, 
> would as well.  Memory efficiency is a top priority for 2024 and, thus, 
> getting on a pathway toward reducing the overhead of struct page is very 
> important for our hosts that are not using large amounts of 1GB hugetlb.
> 
> I've seen your other thread regarding how the page allocator can be 
> enlightened for memdesc, so I'm hoping that can either be covered in this 
> topic or a separate topic.

I'd like to keep this topic relevant to as many people as possible.
I can add a proposal for a topic on both the PCP and Buddy allocators
(I have a series of Thoughts on how the PCP allocator works in a memdesc
world that I haven't written down & sent out yet).

Or we can cover the page allocators in your biweekly meetings.  Maybe both
since not everybody can attend either the phone call or the conference.

> Especially important for us would be the division of work so that we can 
> parallelize development as much as possible for things like memdesc.  If 
> there are any areas that just haven't been investigated yet but we *know* 
> we'll need to address to get to the new world of memdesc, I think we'd 
> love to discuss that.

There's so much work to be done!  And it's mostly parallelisable and almost
trivial.  It's just largely on the filesystem-page cache interaction, so
it's not terribly interesting.  See, for example, the ext2, ext4, gfs2,
nilfs2, ufs and ubifs patchsets I've done over the past few releases.
I have about half of an ntfs3 patchset ready to send.

There's a bunch of work to be done in DRM to switch from pages to folios
due to their use of shmem.  You can also grep for 'page->mapping' (because
fortunately we aren't too imaginative when it comes to naming variables)
and find 270 places that need to be changed.  Some are comments, but
those still need to be updated!

Anything using lock_page(), get_page(), set_page_dirty(), &folio->page,
or any of the functions in mm/folio-compat.c needs auditing.
We can make the first three of those work, but they're good indicators
that the code needs to be looked at.
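
As a rough illustration of why those calls are worth flagging, here is a
userspace toy model, not kernel code.  Every toy_ name, the back-pointer
in toy_page and the union layout are invented for this sketch.  The one
thing it assumes about the real code is that a page-based wrapper such as
lock_page() first resolves the page to its folio with page_folio() and
then operates on the folio.

```c
/* Userspace toy model only; none of these toy_ names exist in the kernel. */
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_NR_PAGES	4
#define TOY_LOCKED	1UL

struct toy_folio;

struct toy_page {
	unsigned long flags;
	struct toy_folio *folio;	/* invented back-pointer for the sketch */
};

struct toy_folio {
	union {
		struct toy_page page;			/* first page: &folio->page */
		struct toy_page pages[TOY_NR_PAGES];	/* every page in the folio */
	};
};

/* Counts how often a page had to be resolved back to its folio. */
static int toy_lookups;

static inline void toy_folio_init(struct toy_folio *folio)
{
	memset(folio, 0, sizeof(*folio));
	for (int i = 0; i < TOY_NR_PAGES; i++)
		folio->pages[i].folio = folio;
}

/* Stand-in for page_folio(): any page in, containing folio out. */
static inline struct toy_folio *toy_page_folio(struct toy_page *page)
{
	toy_lookups++;
	return page->folio;
}

/* Stand-in for folio_page(): folio and index in, nth page out. */
static inline struct toy_page *toy_folio_page(struct toy_folio *folio, int n)
{
	return &folio->pages[n];
}

/* Folio-native operation: the lock state lives on the folio's first page. */
static inline void toy_folio_lock(struct toy_folio *folio)
{
	folio->page.flags |= TOY_LOCKED;
}

static inline bool toy_folio_test_locked(struct toy_folio *folio)
{
	return (folio->page.flags & TOY_LOCKED) != 0;
}

/* Compat-style wrapper: takes a page, pays for a lookup, locks the folio. */
static inline void toy_lock_page(struct toy_page *page)
{
	toy_folio_lock(toy_page_folio(page));
}
```

In this model, calling toy_lock_page() on page 2 locks the whole folio
and costs one lookup, while a caller that already holds the folio can
call toy_folio_lock() directly and skip it.  A caller passing
&folio->page into a page-based wrapper is converting folio to page and
straight back again, which is the pattern the audit is looking for.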

There is some interesting work to be done, and one of the things I'm
thinking hard about right now is how we're doing folio conversions
that make sense with today's code, and stop making sense when we get
to memdescs.  That doesn't apply to anything interacting with the page
cache (because those are folios now and in the future), but it does apply
to one spot in ext4 where it allocates memory from slab and attaches a
buffer_head to it ...


