[LSF/MM/BPF TOPIC] State Of The Page

Matthew Wilcox willy at infradead.org
Sun Jan 21 15:54:19 PST 2024


On Sun, Jan 21, 2024 at 06:31:48PM -0500, Pasha Tatashin wrote:
> On Sun, Jan 21, 2024 at 6:14 PM Matthew Wilcox <willy at infradead.org> wrote:
> > I can add a proposal for a topic on both the PCP and Buddy allocators
> > (I have a series of Thoughts on how the PCP allocator works in a memdesc
> > world that I haven't written down & sent out yet).
> 
> Interesting. Given that pcp are mostly allocated by kmalloc and use
> vmalloc for large allocations, how can memdesc be different for them
> compared to regular kmalloc allocations, given that they are sub-page?

Oh!  I don't mean the mm/percpu.c allocator.  I mean the pcp allocator
in mm/page_alloc.c.

I don't have any Thoughts on mm/percpu.c at this time.  I'm vaguely
aware that it exists ;-)

> > There's so much work to be done!  And it's mostly parallelisable and almost
> > trivial.  It's just largely on the filesystem-page cache interaction, so
> > it's not terribly interesting.  See, for example, the ext2, ext4, gfs2,
> > nilfs2, ufs and ubifs patchsets I've done over the past few releases.
> > I have about half of an ntfs3 patchset ready to send.
> >
> > There's a bunch of work to be done in DRM to switch from pages to folios
> > due to their use of shmem.  You can also grep for 'page->mapping' (because
> > fortunately we aren't too imaginative when it comes to naming variables)
> > and find 270 places that need to be changed.  Some are comments, but
> > those still need to be updated!
> >
> > Anything using lock_page(), get_page(), set_page_dirty(), using
> > &folio->page, any of the functions in mm/folio-compat.c needs auditing.
> > We can make the first three of those work, but they're good indicators
> > that the code needs to be looked at.
> >
> > There is some interesting work to be done, and one of the things I'm
> > thinking hard about right now is how we're doing folio conversions
> > that make sense with today's code, and stop making sense when we get
> > to memdescs.  That doesn't apply to anything interacting with the page
> > cache (because those are folios now and in the future), but it does apply
> > to one spot in ext4 where it allocates memory from slab and attaches a
> > buffer_head to it ...
> 
> There are many more drivers that would need the conversion. For
> example, IOMMU page tables can occupy gigabytes of space, have
> different implementations for AMD, X86, and several ARMs. Conversion
> to memdesc and unifying the IO page table management implementation
> for these platforms would be beneficial.

Understood; there's a lot of code that can benefit from larger
allocations.  I was listing the impediments to shrinking struct page
rather than the places which would most benefit from switching to larger
allocations.  They're complementary to a large extent; you can switch
to compound allocations today and get the benefit later.  And unifying
implementations is always a worthy project.


